Python Baidu Space Backup, Improved Version 1 - yexin218 - ChinaUnix Blog

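Backing up a Baidu blog with Python: this version makes some code optimizations to that functionality; performance still has room for...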

    '''
    Created on Apr 23, 2010

    @author: Leyond
    '''
    import urllib
    from BeautifulSoup import BeautifulSoup
    import re

    def saveToFile(dir, htmlContent, title, url=""):
        nFail = 0
        dir += "/%s" % (url)
        #print dir
        while nFail < 1:
            try:
                myfile = open(dir, 'w')
                # the string literals around title and htmlContent are empty in this copy of the post
                myfile.write("" + str(title) + "" + str(htmlContent) + " ")
                myfile.close()
                return
            except:
                nFail += 1
        print "%s download Fail." % (title)

    def findNextBlogHtml(user, htmlContent):
        # find the script line that holds the link to the previous post
        urls = re.findall(r"var.*pre.*?/blog/item/.*?html", htmlContent, re.I)
        if(len(urls) == 1):
            blogUrl = re.findall(r"/blog/item/\w*.html", urls[0], re.I)
            print blogUrl[0]
            if(len(blogUrl[0]) > 17):
                # drop the leading "/blog/item/" (11 characters)
                htmlAddr = blogUrl[0][11:]
                #print htmlAddr
            else:
                htmlAddr = "None"
        else:
            htmlAddr = "None"
        return htmlAddr

    def getBlogContentAndTitle(user, htmlUrl):
        # the URL prefix string is empty in this copy of the post
        blogUrl = "" + user + "/blog/item/" + htmlUrl
        sock = urllib.urlopen(blogUrl)
        blogHtmlContent = sock.read()
        sock.close()
        htmlContent = unicode(blogHtmlContent, 'gb2312', 'ignore').encode('utf-8', 'ignore')
        # parse the html content
        htmlsoup = BeautifulSoup(htmlContent)
        blogContentBlock = htmlsoup.findAll("div", {"id": "m_blog"})
        blogContentBlockZero = blogContentBlock[0].findAll("table", {"style": "table-layout:fixed;width:100%"})
        # get the title
        blogTitleZero = blogContentBlock[0].findAll("div", {"class": "tit"})
        blogTitle = blogTitleZero[0].string
        # get blog publish date
        blogPublishDate = blogContentBlock[0].findAll("div", {"class": "date"})
        blogDate = blogPublishDate[0].string
        blogData = str("" + blogDate + "") + str(blogContentBlockZero[0])
        return blogData, blogTitle, htmlContent

    def backUpBlog(user, firstBlogUrl):
        # first read first blog
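The listing above is Python 2. For anyone trying the idea on Python 3, here is a minimal sketch of the previous-post lookup that findNextBlogHtml performs. It is not the author's code: the name find_previous_post and the sample page fragment are made up for illustration, the dot before "html" is escaped, the length check is dropped, and it returns None rather than the string "None".

```python
import re

# Same two-step idea as findNextBlogHtml: first find the script line that
# mentions the previous post, then pull the item file name out of it.
PREV_LINK_LINE = re.compile(r"var.*pre.*?/blog/item/.*?html", re.I)
ITEM_PATH = re.compile(r"/blog/item/(\w+\.html)", re.I)


def find_previous_post(html_content):
    """Return the file name of the previous post, or None if there is none."""
    candidates = PREV_LINK_LINE.findall(html_content)
    if len(candidates) != 1:
        return None
    match = ITEM_PATH.search(candidates[0])
    return match.group(1) if match else None


# Hypothetical page fragment, invented for this example.
sample = 'var pre = ["/blog/item/abc123def456.html", "older post"];'
print(find_previous_post(sample))
```

Returning None instead of a sentinel string lets the caller write a plain `while url:` loop.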

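Usage is the same as in the first post: before running the script, create a new directory in the folder that contains the script file, named after the blog (for my blog, that is codedeveloper). To use this program, you need to change two parameters: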

Here user is your username, and firstBlogUrl is the address of your most recent blog post.
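To show how the two parameters might drive the backup, here is a rough Python 3 sketch of the loop. The body of backUpBlog is not shown in this post, so this is an assumption about its shape and not the author's implementation. The names back_up_blog, fetch_post, out_dir and the fake pages are all hypothetical; fetch_post stands in for the download and parsing steps so the loop can be tried without network access.

```python
import os
import tempfile


def back_up_blog(user, first_blog_url, fetch_post, out_dir=None):
    """Follow previous-post links from the newest post and save each page.

    fetch_post(user, url) is a hypothetical callable that returns
    (html_text, previous_url_or_None).
    """
    out_dir = out_dir or user            # the post uses a directory named after the blog
    os.makedirs(out_dir, exist_ok=True)  # create it if it does not exist yet
    saved = []
    seen = set()
    url = first_blog_url
    while url and url not in seen:       # stop at the oldest post, or on a link cycle
        seen.add(url)
        html_text, previous_url = fetch_post(user, url)
        with open(os.path.join(out_dir, url), "w", encoding="utf-8") as f:
            f.write(html_text)
        saved.append(url)
        url = previous_url
    return saved


# Fake three-post blog, invented for this example.
FAKE_PAGES = {
    "c.html": ("<p>newest</p>", "b.html"),
    "b.html": ("<p>middle</p>", "a.html"),
    "a.html": ("<p>oldest</p>", None),
}


def fake_fetch(user, url):
    return FAKE_PAGES[url]


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "codedeveloper")
    print(back_up_blog("codedeveloper", "c.html", fake_fetch, out_dir=target))
```

Passing the fetch step in as a function keeps the loop separate from the Baidu-specific parsing, which is the part most likely to break when the page layout changes.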

One open question: how can Chinese directory names be supported?
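One possible direction, sketched under the assumption that the script is run on Python 3 and not the Python 2 it was written for: Python 3 path strings are Unicode, so a Chinese directory name can usually be passed as-is, as long as the filesystem and locale accept it. The helper name save_post and the directory name below are made up for illustration.

```python
import tempfile
from pathlib import Path


def save_post(base_dir, dir_name, file_name, html_text):
    """Write html_text as UTF-8 under base_dir/dir_name and return the path."""
    target_dir = Path(base_dir) / dir_name   # dir_name may contain Chinese characters
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / file_name
    target.write_text(html_text, encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    path = save_post(tmp, "我的博客", "first.html", "<p>你好</p>")
    print(path.parent.name, path.read_text(encoding="utf-8"))
```

Writing the file with an explicit encoding="utf-8" matches the UTF-8 conversion the original script applies to the downloaded page.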
