The Idle One -- nothing but tech, and more tech, you know
Category: Python/Ruby
2013-09-05 16:24:13
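Earlier we covered BeautifulSoup, Tag, name, attributes, and NavigableString. Now let's look at NavigableString in more detail.
Here we explore navigating the tree. First, a word on what we are dealing with: at bottom it is a string, but because it may itself contain tags, treating it as nothing more than a string is not enough. Let's take a look.
The example used in this article: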
The Dormouse's story
Once upon a time there were three little sisters; and their names were
...
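For anyone who wants to run the snippets, here is a minimal sketch. It assumes the example is the "three sisters" sample from the Beautiful Soup documentation, which matches the text above; the markup and the `title_text` name are assumptions, not taken from the post. It also shows the point about strings: a text node is a NavigableString, which is a real str but still remembers its place in the tree.

```python
from bs4 import BeautifulSoup, NavigableString

# Assumption: the article's example is the "three sisters" sample
# from the Beautiful Soup documentation.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

# "html.parser" is the parser bundled with Python, so nothing extra is needed.
soup = BeautifulSoup(html_doc, "html.parser")
title_text = soup.title.string

print(type(title_text))              # the NavigableString class
print(isinstance(title_text, str))   # it subclasses str ...
print(title_text.parent.name)        # ... but also knows which tag holds it
```

So the text behaves like an ordinary string in comparisons and slicing, while attributes such as `.parent` let us walk back into the tree.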
The Dormouse's story
from bs4 import BeautifulSoup

# html_doc is the example document shown above
soup = BeautifulSoup(html_doc, "html.parser")
bshead = soup.head
bscontents = bshead.contents  # .contents is a list of the direct children
print('1' * 20)
print(soup.head, type(soup.head))
print('2' * 20)
print(bscontents, type(bscontents), len(bscontents))
print('3' * 20)
for i, child in enumerate(bscontents, 1):
    print(i, child, type(child))
print('4' * 20)
bsheadchild = bshead.children  # .children iterates over the same children
print(bsheadchild, type(bsheadchild))
for i, child in enumerate(bsheadchild, 1):
    print(i, child, type(child))
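The practical difference between `.contents` and `.children` is easy to miss, so here is a small sketch of it. The cut-down document and the variable names are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Assumption: a cut-down document, just enough to have a <head> with one child.
html_doc = "<html><head><title>The Dormouse's story</title></head><body></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")
head = soup.head

contents = head.contents      # a plain list: can be indexed and read again
children = head.children      # an iterator over the same nodes

first_pass = list(children)
second_pass = list(children)  # the iterator is already exhausted here

print(len(contents), contents[0].name)
print(len(first_pass), len(second_pass))
```

If we need to index into the children or loop over them more than once, `.contents` is the one to reach for; `.children` is enough for a single pass.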
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie,
Lacie and
Tillie;
and they lived at the bottom of a well.
...
Tillie
;
and they lived at the bottom of a well.
...
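Next we discuss the approach we use most often: searching the parse tree.
The main methods are find and find_all.
The main kinds of argument they accept are: a tag name, a regular expression, a list, a function, a CSS class (via the class_ keyword; full CSS selectors go through select() instead), and text.
To cap the size of the result set, pass limit, for example limit=2.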
The Dormouse's story
,Once upon a time there were three little sisters; and their names were
Elsie,
Lacie and
Tillie;
and they lived at the bottom of a well.
...
]
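To make the filter types listed above concrete, here is a minimal sketch that tries each one. It again assumes the "three sisters" sample from the Beautiful Soup documentation; the variable names are made up for illustration. Text matching uses the `string` keyword, which Beautiful Soup 4.4.0 and later use in place of the older `text`.

```python
import re
from bs4 import BeautifulSoup

# Assumption: the "three sisters" sample from the Beautiful Soup documentation.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""
soup = BeautifulSoup(html_doc, "html.parser")

by_name = soup.find_all("a")                              # tag name
by_regex = soup.find_all(re.compile("^b"))                # names starting with "b"
by_list = soup.find_all(["title", "b"])                   # any name in the list
by_func = soup.find_all(lambda tag: tag.has_attr("id"))   # custom predicate
by_class = soup.find_all("a", class_="sister")            # CSS class
by_text = soup.find_all(string="Elsie")                   # exact text match
limited = soup.find_all("a", limit=2)                     # stop after two matches
first = soup.find("a")                                    # first match, or None

print([tag.name for tag in by_regex])
print([tag["id"] for tag in limited])
print(first["id"])
```

find returns a single element (or None), while find_all always returns a list, even when limit=1.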
Because find_all() is the most popular method in the Beautiful Soup search API, you can use a shortcut for it. If you treat the BeautifulSoup object or a Tag object as though it were a function, then it’s the same as calling find_all() on that object. These two lines of code are equivalent:
soup.find_all("a")
soup("a")
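A quick sketch to check the equivalence for ourselves. The tiny document and variable names are assumptions for illustration; the call shortcut also works on a Tag and forwards keyword arguments such as limit.

```python
from bs4 import BeautifulSoup

# Assumption: a tiny stand-in document with two links.
html_doc = '<p><a id="link1">Elsie</a> and <a id="link2">Lacie</a></p>'
soup = BeautifulSoup(html_doc, "html.parser")

via_method = soup.find_all("a")        # explicit call
via_call = soup("a")                   # calling the object does the same search
via_tag_call = soup.p("a", limit=1)    # same shortcut on a Tag, with limit

print(via_method == via_call)
print([tag["id"] for tag in via_tag_call])
```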