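What is PyQuery:
A powerful and flexible HTML parsing library. If you find regular expressions too tedious to write (I can't write them myself), if BeautifulSoup's syntax is too hard to remember, and if you already know jQuery's syntax, then PyQuery is your best choice.
Installing PyQuery: run pip3 install pyquery.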
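Basic usage of PyQuery:
Initialization:
Initializing from a string:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    from pyquery import PyQuery as pq

    # Sample document (tag structure and attribute values are assumed)
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"></a>
    <a class="sister" id="link2">Lacie</a> and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    print(doc('a'))

Running it prints every a element in the document.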
Initializing from a URL:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Initialize from a URL
    from pyquery import PyQuery as pq

    doc = pq(url='http://www.baidu.com')
    print(doc('input'))

Running it fetches the page and prints its input elements.
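Initializing from a file:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Initialize from a file (baidu.html must exist in the working directory)
    from pyquery import PyQuery as pq

    doc = pq(filename='baidu.html')
    print(doc('title'))

Running it prints the title element of the file.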
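Selection works the same way as in jQuery: by id, by tag name, by class, and with most other jQuery selector forms.
Basic CSS selectors:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # CSS selectors
    from pyquery import PyQuery as pq

    # Sample document (tag structure and attribute values are assumed)
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"></a>
    <a class="sister" id="link2">Lacie</a> and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    print(doc('.title'))

Running it prints the elements whose class is title.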
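Finding elements:
Descendant elements:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Descendant elements
    from pyquery import PyQuery as pq

    # Sample document (tag structure and attribute values are assumed)
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"></a>
    <a class="sister" id="link2">Lacie</a> and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('.title')
    print(type(items))
    print(items)
    b = items.find('b')
    print(type(b))
    print(b)

This code selects the elements whose class is title. There are two of them: a p tag and an a tag. It then calls find to pick out the b tag nested inside the p tag. Both the selection and the result of find are PyQuery objects.
Note that find searches inside every selected element, through all of its descendants.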
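children:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Child elements
    from pyquery import PyQuery as pq

    # Sample document (tag structure and attribute values are assumed)
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"></a>
    <a class="sister" id="link2">Lacie</a> and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('.title')
    print(items.children())

Running it prints the direct children of the selected elements.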
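You can also pass a selector to children() to filter the result:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Child elements filtered by a selector
    from pyquery import PyQuery as pq

    # Sample document (tag structure and attribute values are assumed)
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"></a>
    <a class="sister" id="link2">Lacie</a> and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('.title')
    print(items.children('b'))

The output is the same as above.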
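Parent element:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Parent element
    from pyquery import PyQuery as pq

    # Sample document (tag structure and attribute values are assumed)
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"></a>
    <a class="sister" id="link2">Lacie</a> and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('#link1')
    print(items)
    print(items.parent())

Running it prints the element with id link1 and then its direct parent.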
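parent() returns only the direct parent. The parents() method returns all ancestor elements, and it accepts an optional selector to filter them:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Ancestor elements
    from pyquery import PyQuery as pq

    # Sample document (tag structure and attribute values are assumed)
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><span>Elsie</span></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="sister" id="link3">Title</a>
    <a class="sister" id="link4">Title</a>
    </p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('#link1')
    print(items)
    print(items.parents('body'))

Running it prints the element with id link1 and then its body ancestor.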
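Sibling elements:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Sibling elements
    from pyquery import PyQuery as pq

    # Sample document (tag structure and attribute values are assumed)
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><span>Elsie</span></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="sister" id="link3">Title</a>
    <a class="sister" id="link4">Title</a>
    </p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('#link1')
    print(items)
    print(items.siblings('#link2'))

Running it prints the element with id link1 and then its sibling with id link2.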
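That covers the methods for finding elements. Next, let's look at how to iterate over them.
Traversal

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Traversal
    from pyquery import PyQuery as pq

    # Sample document (tag structure and attribute values are assumed)
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><span>Elsie</span></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="sister" id="link3">Title</a>
    <a class="sister" id="link4">Title</a>
    </p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('a')
    for k, v in enumerate(items.items()):
        print(k, v)

Running it prints each a element together with its index.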
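Getting information:
Getting attributes:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Getting attributes
    from pyquery import PyQuery as pq

    # Sample document (tag structure and attribute values, including the
    # example.com URLs, are assumed)
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1"><span>Elsie</span></a>
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
    and
    <a href="http://example.com/title1" class="sister" id="link3">Title</a>
    <a href="http://example.com/title2" class="sister" id="link4">Title</a>
    </p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('a')
    print(items)
    print(items.attr('href'))
    print(items.attr.href)

Both forms print the href of the first matched a element.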
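Getting text:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Getting text
    from pyquery import PyQuery as pq

    # Sample document (tag structure and attribute values are assumed)
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><span>Elsie</span></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="sister" id="link3">Title</a>
    <a class="sister" id="link4">Title</a>
    </p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('a')
    print(items)
    print(items.text())
    print(type(items.text()))

text() returns the text of all matched elements as a single str.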
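Getting HTML:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Getting HTML
    from pyquery import PyQuery as pq

    # Sample document (tag structure and attribute values are assumed)
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><span>Elsie</span></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="sister" id="link3">Title</a>
    <a class="sister" id="link4">Title</a>
    </p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('a')
    print(items.html())

html() returns the inner HTML of the first matched element.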
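DOM manipulation:
addClass, removeClass

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # DOM manipulation: addClass, removeClass
    from pyquery import PyQuery as pq

    # Sample document (tag structure and attribute values are assumed)
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><span>Elsie</span></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="sister" id="link3">Title</a>
    <a class="sister" id="link4">Title</a>
    </p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('#link2')
    print(items)
    items.addClass('addStyle')    # same as add_class
    print(items)
    items.remove_class('sister')  # same as removeClass
    print(items)

Each print shows the element after the preceding class change.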
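attr, css:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # DOM manipulation: attr, css
    from pyquery import PyQuery as pq

    # Sample document (tag structure and attribute values are assumed)
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><span>Elsie</span></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="sister" id="link3">Title</a>
    <a class="sister" id="link4">Title</a>
    </p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('#link2')
    items.attr('name', 'addname')
    print(items)
    items.css('width', '100px')
    print(items)

attr can add a new attribute. If the element already has that attribute, the old value is overwritten.
Each print shows the element after the preceding change.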
remove:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # DOM manipulation: remove
    from pyquery import PyQuery as pq

    html = """
    <div class="wrap">
        Hello World
        <p>This is a paragraph.</p>
    </div>
    """

    doc = pq(html)
    wrap = doc('.wrap')
    print(wrap.text())
    wrap.find('p').remove()
    print("Data after remove")
    print(wrap)

After remove(), the p element is gone from wrap.
There are many other DOM methods. To learn more, read the official documentation: https://pyquery.readthedocs.io/en/latest/api.html
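Pseudo-class selectors:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Pseudo-class selectors
    from pyquery import PyQuery as pq

    # Sample document (tag structure and attribute values are assumed)
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><span>Elsie</span></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="sister" id="link3">Title</a>
    <a class="sister" id="link4">Title</a>
    </p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    # print(doc)
    wrap = doc('a:first-child')      # a elements that are the first child of their parent
    print(wrap)
    wrap = doc('a:last-child')       # a elements that are the last child of their parent
    print(wrap)
    wrap = doc('a:nth-child(2)')     # a elements that are the second child of their parent
    print(wrap)
    wrap = doc('a:gt(2)')            # a elements with index greater than 2 (indices start at 0, not 1)
    print(wrap)
    wrap = doc('a:nth-child(2n)')    # a elements at even child positions
    print(wrap)
    wrap = doc('a:contains(Lacie)')  # a elements containing the text Lacie
    print(wrap)

I won't list every selector here. To learn more about CSS selectors, see the W3School CSS tutorial: http://www.w3school.com.cn/css/index.asp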
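That roughly covers how to use PyQuery. For more detail, read the official documentation: https://pyquery.readthedocs.io/en/latest/
The code above is available at: https://gitee.com/dwyui/pyQuery.git
Thank you for reading. If anything here is incorrect, corrections are welcome.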