Scraping the Douban Top 250 with Python and BeautifulSoup - chinaboywg - ChinaUnix Blog

chinaboy小宝 | chinaboy007.blog.chinaunix.net

Scraping the Douban Top 250 with Python and BeautifulSoup

Category: Python/Ruby

2013-07-07 19:35:10

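# coding: utf-8

# import pyquery
import re

from bs4 import BeautifulSoup

try:
    import urllib2  # Python 2, as in the original script
except ImportError:
    import urllib.request as urllib2  # Python 3 offers the same urlopen() call

"""
Structure analysis: the text of one entry on the list page

                            盗梦空间
                                    / Inception
                                / 潜行凶间(港) / 全面启动(台)

                            导演: 克里斯托弗·诺兰 Christopher Nolan   主演: 莱昂纳多·迪卡普里奥 Le...

                            2010 / 美国英国 / 动作科幻悬疑冒险

                            9.2
                            451323人评价

诺兰给了我们一场无法盗取的梦。

"""


def crawl(url):
    """Print the rank, title and link of every movie on the page at url."""
    page = urllib2.urlopen(url)
    contents = page.read()
    # Name the parser explicitly so bs4 does not have to guess one.
    soup = BeautifulSoup(contents, 'html.parser')
    # Header text: "Douban Movie Top 250", then the columns "No. / Title / Link".
    print(u'               豆瓣电影TOP250:\n 序号 \t影片名\t 链接 ')
    # Each movie sits in its own <div class="item">.
    for tag in soup.find_all('div', class_='item'):
        m_order = tag.em.get_text()   # rank: text of the first <em>
        # print m_order
        m_name = tag.span.get_text()  # title: text of the first <span>
        # print m_name
        # Unfinished attempts to read the rating score:
        # m_rating_score=tag.find_all('div',class_="star").find(text=re.compile("span"))
        # m_rating_score=soup.find(text=re.compile("^0-9"))
        # print m_rating_score
        m_url = tag.find('a')['href']  # link: href of the first <a>
        # print m_url
        print("%s %s %s" % (m_order, m_name, m_url))


if __name__ == '__main__':
    # The page URL is empty in the source; pass the address of the list page here.
    crawl('')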
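
The commented-out rating lines in the script call `.find` on the result of `find_all`, which is a list-like result set rather than a single tag, so they cannot work as written. Below is a minimal sketch of one way to read the score, run offline against a hand-written sample. The markup in `SAMPLE_HTML`, the `example.com` link, the `rating_num` class and the helper name `parse_items` are all assumptions made for illustration; the real page may be structured differently.

```python
# coding: utf-8
from bs4 import BeautifulSoup

# Hypothetical markup for one entry; the live page may differ.
SAMPLE_HTML = u"""
<div class="item">
  <div class="pic">
    <em>5</em>
    <a href="https://example.com/subject/5/"><img alt="Inception"></a>
  </div>
  <div class="info">
    <span class="title">盗梦空间</span>
    <span class="title">/ Inception</span>
    <div class="star">
      <span class="rating_num">9.2</span>
      <span>451323人评价</span>
    </div>
  </div>
</div>
"""


def parse_items(html):
    """Return one dict per <div class="item"> found in html."""
    soup = BeautifulSoup(html, 'html.parser')
    movies = []
    for tag in soup.find_all('div', class_='item'):
        # find() returns a single tag (or None), unlike find_all().
        star = tag.find('div', class_='star')
        movies.append({
            'order': tag.em.get_text(strip=True),
            'name': tag.span.get_text(strip=True),
            'url': tag.a['href'],
            # In this sample the first <span> inside the star block is the score.
            'score': star.span.get_text(strip=True) if star else None,
        })
    return movies


if __name__ == '__main__':
    for movie in parse_items(SAMPLE_HTML):
        print(movie)
```

Returning dicts instead of printing inside the loop keeps the parsing step testable without a network connection.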

Output:
               豆瓣电影TOP250:
序号影片名   链接
1 肖申克的救赎
2 这个杀手不太冷
3 阿甘正传
4 霸王别姬
5 盗梦空间
6 海上钢琴师
7 美丽人生
8 三傻大闹宝莱坞
9 辛德勒的名单
10 放牛班的春天
11 龙猫
12 搏击俱乐部
13 泰坦尼克号
14 教父
15 天堂电影院
16 忠犬八公的故事
17 千与千寻
18 罗马假日
19 乱世佳人
20 大话西游之大圣娶亲
21 天使爱美丽
22 当幸福来敲门
23 楚门的世界
24 怦然心动
25 两杆大烟枪
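
The output stops at entry 25, which suggests the script reads only one page of the 250-entry list. A rough sketch of how the remaining pages might be addressed follows. It assumes the site accepts an offset query parameter named `start` and serves 25 entries per page; neither is confirmed by the post. The `example.com` address and the helpers `page_urls` and `build_request` are placeholders for illustration. Since some sites reject the default urllib User-Agent, the sketch also shows how to attach a browser-like header.

```python
try:
    from urllib.request import Request  # Python 3
except ImportError:
    from urllib2 import Request  # Python 2


def page_urls(base_url, total=250, page_size=25):
    """Yield one URL per result page, using an assumed `start` offset parameter."""
    for start in range(0, total, page_size):
        yield '%s?start=%d' % (base_url, start)


def build_request(url, user_agent='Mozilla/5.0'):
    """Wrap url in a Request that carries a browser-like User-Agent header."""
    return Request(url, headers={'User-Agent': user_agent})


if __name__ == '__main__':
    # Placeholder address; substitute the list page you want to crawl.
    for url in page_urls('https://example.com/top250'):
        print(build_request(url).get_full_url())
```

Each `Request` can be passed to `urlopen` in place of a plain URL string, so a function like `crawl` would only need to loop over the generated pages.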
