Category: Python/Ruby

2009-04-17 16:43:07

Python: downloading the images from a given web page with urllib
 
#!/usr/bin/python
# getimg.py: download every image referenced by an <img> tag on a page (Python 2)
import os
import urllib
import urlparse
from sgmllib import SGMLParser


class URLLister(SGMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def reset(self):
        SGMLParser.reset(self)
        self.urls = []

    def start_img(self, attrs):
        # attrs is a list of (name, value) pairs for the tag
        src = [v for k, v in attrs if k == 'src']
        if src:
            self.urls.extend(src)


imgdir = "/home/jim/pic/"


def ImgDownload(inputurl, img):
    # Resolve img against the page URL; a src that already starts with
    # 'http://' or 'https://' is returned unchanged by urljoin
    imgurl = urlparse.urljoin(inputurl, img)
    imgname = imgurl.split('/')[-1]
    imgpath = os.path.join(imgdir, imgname)
    try:
        if os.path.exists(imgpath):
            print imgpath + " already exists, no need to download"
        else:
            urllib.urlretrieve(imgurl, imgpath)
            print imgname + " has been saved to: " + imgpath
    except Exception:
        print "Picture(" + imgname + ") from " + inputurl + " could not be saved"


if __name__ == "__main__":
    while True:
        inputurl = raw_input("\nInput URL: ")
        if inputurl == 'quit':
            break
        # Add a scheme when the user typed a bare host name
        ret = inputurl.find('http')
        if ret == -1:
            # Assumed reconstruction: the source line is cut off after the opening quote
            inputurl = "http://" + inputurl
        usock = urllib.urlopen(inputurl)
        parser = URLLister()
        parser.feed(usock.read())
        usock.close()
        parser.close()
        if not parser.urls:
            print "This page has no pictures"
        else:
            for img in parser.urls:
                ImgDownload(inputurl, img)
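
The listing above is Python 2 code; the sgmllib module it relies on was removed in Python 3. The following is a minimal sketch, assuming Python 3, of the two steps that do not need network access: collecting the src of every <img> tag and turning each one into an absolute URL plus a local file name. The names ImgSrcCollector and image_targets, the sample HTML, and the example.com URLs are illustrative and do not come from the original post.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        if tag == "img":
            self.urls.extend(v for k, v in attrs if k == "src" and v)


def image_targets(page_url, html):
    """Return (absolute_url, file_name) pairs for the images found in html."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    targets = []
    for src in parser.urls:
        # urljoin handles relative, root-relative, and absolute src values
        absolute = urljoin(page_url, src)
        # Take the file name from the URL path only, ignoring any query string
        name = posixpath.basename(urlsplit(absolute).path)
        targets.append((absolute, name))
    return targets


sample_html = (
    '<p><img src="a.png"><img src="/img/b.jpg?v=2">'
    '<img src="http://cdn.example.org/c.gif"></p>'
)
for url, name in image_targets("http://example.com/blog/post.html", sample_html):
    print(url, name)
```

Each pair could then be passed to urllib.request.urlretrieve(url, os.path.join(imgdir, name)) to perform the actual download, mirroring what ImgDownload does in the script.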

chinaunix user, 2009-04-23 15:22:53

Very good, thanks for the guidance!