Category: Python/Ruby

2009-04-01 19:10:17

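Baidu Space has been quite fast lately and the service is decent, so I wanted to move my backup blog over to Baidu. However, the blog-migration tool Baidu used to offer no longer exists. I'm a programmer, so what is there to be afraid of? I decided to handle it myself with HTTP POST requests. I wrote a Python script, starting with the part that posts articles automatically, but ran into a problem right at the login step. Here is the code first: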

#!/usr/bin/python
# Python 2 script: POST the login form and print the server's response
import urllib,urllib2,cookielib
import pdb
def test():
        # Cookie jar that keeps the session cookies returned by the server
        cj = cookielib.CookieJar()
        #pdb.set_trace()
        url_login = ""
        #body = (('username','xxx'),('password','xxx'),("verifycode","YRAE"))
        body = (('username','xxx'),('password','xxx'))
        # Opener that stores cookies and resends them on later requests
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        #opener.addheaders = [('User-agent','Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)')]
        opener.addheaders = [('User-agent', 'Opera/9.23')]
        urllib2.install_opener(opener)
        # Supplying a data argument makes urllib2 send a POST instead of a GET
        req = urllib2.Request(url_login,urllib.urlencode(body))
        u = urllib2.urlopen(req)
        print u.read()

test()


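In the end I captured the response and saw that it says "请输入验证码" ("Please enter the verification code"). I never saw such a prompt on the web page (in Firefox), but the HTML source shows that a verification code really is required. Because the verification code is an image, I ran out of patience and did not finish the job. Two approaches are suggested online: one is to run image analysis on the captcha; the other is semi-automatic, where the program opens the image and the user types the code in. The second one, though, defeats the original purpose.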
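The semi-automatic approach can be sketched roughly as follows. This is only an illustration under stated assumptions: it is written for Python 3's `urllib.parse` rather than the Python 2 `urllib`/`urllib2` used in this post, the helper names `ask_for_captcha` and `build_login_body` are hypothetical, and the `verifycode` field name is taken from the commented-out line in the script above. How to download the captcha image is left out because the post does not give its URL.

```python
import os
import tempfile
import urllib.parse


def ask_for_captcha(image_bytes, path=None, prompt=input):
    """Save the captcha image to disk and return what the user types."""
    if path is None:
        # Hypothetical default location; any writable path works.
        path = os.path.join(tempfile.gettempdir(), "captcha.png")
    with open(path, "wb") as handle:
        handle.write(image_bytes)
    # The user opens the saved file in an image viewer and types the code.
    return prompt("Enter the characters shown in %s: " % path).strip()


def build_login_body(username, password, verifycode=None):
    """URL-encode the login form, adding the captcha field when given."""
    fields = [("username", username), ("password", password)]
    if verifycode:
        # Field name assumed from the commented-out line in the script above.
        fields.append(("verifycode", verifycode))
    return urllib.parse.urlencode(fields).encode("ascii")
```

The captcha image would presumably have to be fetched through the same cookie-aware opener as the login request, so that the server can match the typed code to the session.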
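Below is a program I found for posting articles to Baidu Space. It is very practical, but it still gets stuck on the login verification code.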

#!/usr/bin/env python
#coding:utf8
import urllib,urllib2,httplib,cookielib,os,re,sys,time,md5
import pdb
def login(userdata,posturl):
        # POST the login form; return an opener that carries the session cookies
        cookie=cookielib.CookieJar()
        cj=urllib2.HTTPCookieProcessor(cookie)
        request=urllib2.Request(posturl)
        opener=urllib2.build_opener(cj)
        c = opener.open(request,urllib.urlencode(userdata))
        bincontent= c.read()
        print bincontent
        return opener

def postdata (opener,data,posturl):
        # POST a form through the logged-in opener so its cookies are reused
        loginrequest=urllib2.Request(posturl)
        print data
        c = opener.open(loginrequest,urllib.urlencode(data))
        bincontent = c.read()
        #print bincontent
def postbaidu (data):
        userdata = {'password':'iloveyou365','username':'riverbird2005','tpl':'sp','tpl_reg':'sp','u':'','Submit':' 登录 '}
        opener = login(userdata,'')
        #baidudata = {'cm':1,'ct':1,'spBlogCatName':unicode('分类',"utf-8").encode("gb2312"),'spBlogPower':0,'spBlogText':unicode(data['content'],"utf-8").encode("gb2312"),'spBlogTitle':unicode(data['title'],"utf-8").encode("gb2312"),'spIsCmtAllow':1,'spRefURL':'riverbird/creat/blog/','spVcode':'','spVerifyKey':'','tj':''}
        baidudata = {'cm':1,
                     'ct':1,
                     'spBlogCatName':unicode('分类',"utf-8").encode("gb2312"),
                     'spBlogPower':0,
                     'spBlogText':unicode(data['content'],"utf-8").encode("gb2312"),
                     'spBlogTitle':unicode(data['title'],"utf-8").encode("gb2312"),
                     'spIsCmtAllow':1,
                     'spRefURL':'riverbird/creat/blog/',
                     'spVcode':'',
                     'spVerifyKey':'',
                     'tj':''}
        postdata(opener,baidudata,'riverbird/commit')

"""
def postsina (data):
        userdata = {'loginname':'username','passwd':'password'}
        opener = login(userdata,'http://my.blog.sina.com.cn/login.php?index=index&type=new')
        sinadata = {'album':'','blog_body':data['content'],'blog_class':2,'blog_id':'','blog_title':data['title'],'is2bbs':1,'is_album':0,'is_media':'','join_circle':1,'sina_sort_id':'134','stag':'','tag':data['tag'],'time':'','x_cms_flag':1}
        postdata(opener,sinadata,'http://control.blog.sina.com.cn/admin/article/article_post.php')

def postsohu (data):
        m = md5.md5('password')
        userdata = {'appid':'1019','b':'1','password':m.hexdigest(),'persistentcookie':0,'pwdtype':'1','s':'1213527861109','userid':'username@sohu.com','w':'1280'}
        opener = login(userdata,'')
        postpage=urllib2.Request('http://blog.sohu.com/manage/entry.do?m=add&t=shortcut')#
        c = opener.open(postpage)
        bincontent= c.read()
        p=re.compile(r'''\s+" name="aid" value="(.*)">\s+''',re.M)
        c = p.findall(bincontent)
        print c
        if len(c)>0:
                sohudata = {'aid':c[0],'allowComment':2,'categoryId':0,'contrCataId':'','contrChId':'','entrycontent':unicode(data['content'],"utf-8").encode("gbk"),'entrytitle':unicode(data['title'],"utf-8").encode("gbk"),'excerpt':'','keywords':unicode(data['tag'],"utf-8").encode("gbk"),'m':'save','newGategory':'','oper':'art_ok','perm':'0','save':'-','shortcutFlag':'true'}
                postdata(opener,sohudata,'http://blog.sohu.com/manage/entry.do')
"""

def main ():
        pdata = {'title':'title','content':'content','tag':"tag1 tag2"}
        #postsohu(pdata)
        #postsina(pdata)
        postbaidu(pdata)

if __name__ == '__main__':
        main()
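The script above converts every UTF-8 string to GB2312 by hand before URL-encoding it. For readers on Python 3, a rough equivalent is to pass the `encoding` argument of `urllib.parse.urlencode`. This is a sketch, not the original author's method; the helper name `encode_form_gb2312` is hypothetical, and only two of the form fields are shown.

```python
import urllib.parse


def encode_form_gb2312(fields):
    """URL-encode a form so that text values are sent as GB2312 bytes."""
    # urlencode percent-encodes each str value using the given encoding;
    # the result is pure ASCII, so it can be turned into a POST body.
    return urllib.parse.urlencode(fields, encoding="gb2312").encode("ascii")


# Field names taken from the baidudata dictionary in the script above.
form = encode_form_gb2312({"spBlogCatName": "分类", "spBlogPower": 0})
```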


sharenyuwuxing 2009-04-10 12:07:30

Nice, nice. I'll study this.

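shujuhuifu802 2009-04-09 10:02:19

Baidu takes the time interval into account: if you post too frequently, it starts asking for a verification code. Just set a longer interval.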
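A minimal sketch of that suggestion: pause between consecutive posts. The function name `post_with_interval` and the 60-second default are assumptions for illustration; the comment does not say how long the interval needs to be. `post_one` stands for whatever function actually submits one article.

```python
import time


def post_with_interval(articles, post_one, min_interval=60.0, sleep=time.sleep):
    """Call post_one on each article, pausing between consecutive posts."""
    results = []
    for index, article in enumerate(articles):
        if index > 0:
            # Wait before every post except the first one.
            sleep(min_interval)
        results.append(post_one(article))
    return results
```

The `sleep` parameter is injectable only so the pause can be replaced in tests; in normal use the default `time.sleep` is enough.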

chinaunix user 2009-04-06 12:36:28

Log in normally in the browser first, then have the program send the same headers as the browser, heh.
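One way this idea might look, sketched for Python 3's `urllib.request`. Everything concrete here is a placeholder: the URL, the header values and the helper name `build_browser_like_request` are hypothetical, and the real `User-Agent` and `Cookie` values would have to be copied from the logged-in browser session.

```python
import urllib.parse
import urllib.request


def build_browser_like_request(url, form, browser_headers):
    """Build a POST request that carries headers copied from a browser."""
    data = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(url, data=data, headers=browser_headers)


# Placeholder values; copy the real ones from the logged-in browser.
headers = {
    "User-Agent": "Mozilla/5.0 (placeholder copied from the browser)",
    "Cookie": "NAME=value-copied-from-the-logged-in-browser",
}
request = build_browser_like_request(
    "http://example.com/commit", {"title": "title"}, headers
)
# urllib.request.urlopen(request) would send it; that call is left out
# so the sketch has no network side effects.
```

Whether the copied session cookie stays valid long enough, and whether the site checks anything beyond these headers, is not something the comment or the post establishes.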