Python Web Crawler - jackywgw - ChinaUnix Blog

Python Web Crawler

Category: Python/Ruby

2015-09-14 19:00:36


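import re
import urllib.request


def getHtml(url):
    # Fetch the page and return its body as text (UTF-8 is assumed here).
    page = urllib.request.urlopen(url)
    html = page.read().decode("utf-8", errors="replace")
    return html


def getImg(html):
    # Collect every fragment that starts at id="p-ad" followed by a space
    # and runs to the end of that line.
    reg = r'id="p-ad" .*'
    imgre = re.compile(reg)
    imglist = imgre.findall(html)
    return imglist


print("begin ...")
# The URL is incomplete in the post; supply the page to crawl.
html = getHtml("http:// ")
print(getImg(html))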
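
To see what the pattern in getImg picks up without touching the network, here is a minimal sketch that runs the same regular expression against a made-up HTML snippet. The sample markup and the names SAMPLE_HTML, P_AD_RE and find_p_ad are assumptions for illustration only; they do not come from the original post.

```python
import re

# Hypothetical markup, invented for illustration only.
SAMPLE_HTML = (
    '<div id="p-ad" class="banner"><img src="a.jpg"></div>\n'
    '<div id="main">content</div>\n'
    '<span id="p-ad" >second</span>\n'
)

# Same pattern as in getImg: id="p-ad", a space, then the rest of the line.
P_AD_RE = re.compile(r'id="p-ad" .*')


def find_p_ad(html):
    """Return every match of the id="p-ad" pattern in html."""
    return P_AD_RE.findall(html)


if __name__ == "__main__":
    for match in find_p_ad(SAMPLE_HTML):
        print(match)
```

Because `.` does not match a newline by default, each match stops at the end of its line. On a page delivered as one long line, the greedy `.*` would swallow everything after the first id="p-ad", so a narrower pattern or an HTML parser may be a better fit there.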
