
2012-12-22 10:38:43

http://blog.b999.net/post/141/


#-*- coding: UTF-8 -*-
'''
Created on 2012-3-8

@author: tiantian

Modify: 2012-4-15
Fixed saving files correctly on Windows
'''
import urllib
import re
import platform
import os

top500 = ''
#top500 = ''

songs = []

# Create the output directory on first run
if not os.path.exists('songs'):
    os.mkdir('songs')

def main():

    # Pattern matching each block of the chart page
    divr = '.*?.*?'
    mf = urllib.urlopen(top500)
    content = mf.read()
    content = content.decode('gbk')

    content = re.sub('\n+',' ',content)
    alldiv = re.findall(divr,content)
    i =0
    for div in alldiv:
        ulr = ''
        allul = re.findall(ulr,div)

        for ul in allul:
            lir = ''
            allli = re.findall(lir,ul)

            for li in allli:
                # Skip the first 245 entries
                i = i+1
                if i <= 245:
                    continue
                # Patterns capturing the song title and the artist
                songName = '.*?(.*?).*?'
                name = re.findall(songName,li)
                songAuthor = '.*?(.*?).*?'
                author = re.findall(songAuthor,li)

                songs.append([name[0],author[0]])

                songUrl = getSongUrl(name[0],author[0])

                sysstr = platform.system()
                # Always define filename so the print below cannot fail
                filename = 'songs/'+name[0]+'-'+author[0]+'.mp3'
                if sysstr == "Windows":
                    # Encode the name as GBK so the file is saved correctly on Windows
                    filename = filename.encode('gbk')
                elif sysstr != "Linux":
                    print ("Other System tasks")
                print filename

                try:
                    urllib.urlretrieve(songUrl,filename)
                    # The absence of an exception does not prove the download succeeded; further checks are needed
                    print i,name[0],author[0],'downloaded'

                except Exception :
                    print i,name[0],author[0],'download failed'


def getSongUrl(songName,authorName):
    '''The song title and artist name may be incomplete, in which case no URL can be obtained.'''
    songUrl = '%s$$%s$$$$&url=&listenreelect=0&.r=0.1696378872729838' % (urllib.quote(songName.encode('gbk')),urllib.quote(authorName.encode('gbk')))
    f = urllib.urlopen(songUrl)
    c = f.read()
    url1 = re.findall('.*?CDATA\[(.*?)\]].*?',c)
    url2 = re.findall('.*?CDATA\[(.*?)\]].*?',c)
    if len(url1) <1:
        return ''

    try:
        return url1[0][:url1[0].rindex('/')+1] + url2[0]
    except Exception:
        return url1[0]

if __name__ == '__main__':
    main()
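
The comment in the download loop points out that a missing exception does not prove the download worked. One possible follow-up check is sketched below. It is not part of the original script: `looks_like_mp3` and the `min_bytes` threshold are hypothetical, and the test is only a heuristic (an ID3v2 tag or an MPEG frame sync at the start of the file), assuming the server returns an HTML error page or an empty body on failure.

```python
import os


def looks_like_mp3(path, min_bytes=1024):
    """Heuristically check that `path` holds MP3 data and not an error page.

    `min_bytes` is an arbitrary illustrative threshold, not a value taken
    from the original script.
    """
    if not os.path.isfile(path) or os.path.getsize(path) < min_bytes:
        return False
    with open(path, 'rb') as f:
        head = bytearray(f.read(3))
    if len(head) < 3:
        return False
    # An ID3v2 tag begins with the ASCII bytes "ID3".
    if bytes(head) == b'ID3':
        return True
    # Otherwise expect an MPEG audio frame sync: the first 11 bits are set.
    return head[0] == 0xFF and (head[1] & 0xE0) == 0xE0
```

In the loop, such a check could run right after the retrieve call, and the success message would be printed only when it returns True.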


The downloaded MP3 files are saved in the newly created songs directory.
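
Because the file names are built directly from the scraped title and artist, a title containing a character such as `/` or `?` would produce a path that cannot be created, particularly on Windows. The sketch below shows one way to guard against that. It assumes Python 3, and `song_filename` and `_INVALID` are hypothetical names that do not appear in the original script.

```python
import os
import re

# Characters that Windows does not allow in file names.
_INVALID = re.compile(r'[\\/:*?"<>|]')


def song_filename(name, author, directory='songs'):
    """Build 'directory/name-author.mp3', replacing unsafe characters with '_'."""
    safe = _INVALID.sub('_', '%s-%s' % (name.strip(), author.strip()))
    return os.path.join(directory, safe + '.mp3')
```

For example, `song_filename('AC/DC: Live', 'Band?')` yields a file named `AC_DC_ Live-Band_.mp3` inside `songs`.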

