Category: Python/Ruby

2010-09-21 00:39:11

[python] Scraping music from songtaste
2010-03-06 19:10
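#encoding=cp936
# Python 2 script: given a songtaste song page URL, find the media URL and download the file.
from ast import literal_eval
from urllib import urlencode, urlopen, urlretrieve
from urllib2 import build_opener, HTTPCookieProcessor
from cookielib import CookieJar
from re import findall

class crawler():
    def __init__(self):
        # Create the cookie-aware opener here so getWebcode() also works when post() was never called.
        self.opener = build_opener(HTTPCookieProcessor(CookieJar()))
        self.data = ""

    def post(self, url, headers, body):
        # POST a form with custom headers; cookies are kept on self.opener.
        self.opener.addheaders = headers
        return self.opener.open(url, urlencode(body)).read()

    def getWebcode(self, url):
        # Fetch a page through the cookie-aware opener (for pages that need a session).
        self.data = self.opener.open(url).read()

    def getwebcode(self, url):
        # Fetch a page with plain urlopen (no cookies).
        self.data = urlopen(url).read()

    def fetchall(self, patt):
        # Return every regex match in the last page fetched.
        return findall(patt, self.data)

    def webcode(self):
        return self.data

if __name__ == "__main__":
    url = raw_input("url:")
    c = crawler()
    c.getwebcode(url)
    # Argument list of the page's playmedia1(...) JavaScript call.
    data = c.fetchall(r"""playmedia1\(([^<>;]+)\);""")[0]
    # literal_eval parses the quoted arguments without executing scraped text as code.
    xxx = literal_eval("[" + data + "]")
    # xxx[5] is a hash that stands for the file extension.
    types = {
        "7d99bb4c7bd4602c342e2bb826ee8777": ".wma",
        "25e4f07f5123910814d9b8f3958385ba": ".Wma",
        "51bbd020689d1ce1c845a484995c0cce": ".WMA",
        "b3a7a4e64bcd8aabe4cabe0e55b57af5": ".mp3",
        "d82029f73bcaf052be8930f6f4247184": ".MP3",
        "5fd91d90d9618feca4740ac1f2e7948f": ".Mp3",
    }
    if xxx[5] not in types:
        # Previously type_str was left undefined here and the script died with a NameError.
        raise SystemExit("unknown type hash: " + xxx[5])
    type_str = types[xxx[5]]
    # Media URL = base URL (xxx[6]) + file id (xxx[2]) + extension.
    url = xxx[6] + xxx[2] + type_str
    print url
    # First text run without angle brackets; the part before " --" becomes the file name.
    # As published, the pattern has no surrounding tags (they may have been lost when the post was rendered).
    fn = unicode(c.fetchall("""([^<>]+)""")[0], "gb2312").encode("gb2312").split(" --")[0]
    urlretrieve(url, fn + type_str)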
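
The parsing step of the script can also be tried on its own, without any network access. The following is a minimal sketch in Python 3, not the author's code: the function name `parse_playmedia` and the `SAMPLE` page fragment are hypothetical and made up for illustration, and the sketch assumes the `playmedia1(...)` arguments are plain quoted strings or numbers. The hash table is copied from the script. It uses `ast.literal_eval` rather than `eval`, so text scraped from the page is never executed as code.

```python
import ast
import re

# Hash-to-extension table copied from the script.
TYPE_BY_HASH = {
    "7d99bb4c7bd4602c342e2bb826ee8777": ".wma",
    "25e4f07f5123910814d9b8f3958385ba": ".Wma",
    "51bbd020689d1ce1c845a484995c0cce": ".WMA",
    "b3a7a4e64bcd8aabe4cabe0e55b57af5": ".mp3",
    "d82029f73bcaf052be8930f6f4247184": ".MP3",
    "5fd91d90d9618feca4740ac1f2e7948f": ".Mp3",
}

# Same pattern as the script: capture the argument list of playmedia1(...).
PLAYMEDIA_RE = re.compile(r"playmedia1\(([^<>;]+)\);")


def parse_playmedia(html):
    """Return (media_url, extension) for a page containing playmedia1(...).

    Hypothetical helper; raises ValueError when the call or the hash is unknown.
    """
    match = PLAYMEDIA_RE.search(html)
    if match is None:
        raise ValueError("no playmedia1(...) call found")
    # Parse the arguments as Python literals instead of evaluating them.
    args = ast.literal_eval("[" + match.group(1) + "]")
    ext = TYPE_BY_HASH.get(args[5])
    if ext is None:
        raise ValueError("unknown type hash: %r" % (args[5],))
    # Media URL = base URL (args[6]) + file id (args[2]) + extension.
    return args[6] + args[2] + ext, ext


# Made-up page fragment; the real page layout is an assumption.
SAMPLE = (
    "<script>playmedia1('a', 'b', 'song123', 'c', 'd', "
    "'b3a7a4e64bcd8aabe4cabe0e55b57af5', 'http://example.invalid/media/');</script>"
)

if __name__ == "__main__":
    print(parse_playmedia(SAMPLE))
```

Keeping the parsing in a function that takes the page text as a string makes it possible to check the hash lookup against saved pages before downloading anything.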