2011-4-26 磁针石
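#Available for software automation implementation and training. gtalk: ouyangchongwu#gmail.com qq: 37391319 blog: oychw.cublog.cn
#All rights reserved. Please contact the author in writing before reposting or publishing.
#python qq group: Shenzhen automated testing python group: 113938272
#Wugang Shenzhen qq group: 66250781
#References: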
Opening the CSDN site with urllib in Python returns HTTP 403, but mechanize can access it easily.
The code is as follows:
# -*- coding: gbk -*-
import mechanize
try:
    import cookielib  # Python 2
except ImportError:
    import http.cookiejar as cookielib  # Python 3
# Browser
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
#br.set_handle_robots(False)
# Follow refresh 0, but do not hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# Want debugging messages?
#br.set_debug_http(True)
#br.set_debug_redirects(True)
#br.set_debug_responses(True)
# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/4.0.0')]
br.open("http://blog.csdn.net/oychw/archive/2011/04/22/6342146.aspx")
print(br.response().read().decode("utf-8"))
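One possible reason for the 403 (an assumption, the post does not confirm it) is that urllib identifies itself as Python-urllib/x.y by default, while the script above sends a browser-like User-Agent. Under that assumption, a minimal sketch with only the standard library could look like the following. The names BROWSER_UA, build_request and fetch are hypothetical helpers, not part of urllib or mechanize.

```python
# -*- coding: utf-8 -*-
# Sketch: send a browser-like User-Agent using only the standard library.
try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

# Same User-Agent string as in the mechanize script above.
BROWSER_UA = ('Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) '
              'Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/4.0.0')


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries the given User-Agent header."""
    return Request(url, headers={'User-Agent': user_agent})


def fetch(url, timeout=10):
    """Open the URL and return the body decoded as UTF-8 (an assumption)."""
    response = urlopen(build_request(url), timeout=timeout)
    try:
        return response.read().decode('utf-8')
    finally:
        response.close()
```

Calling fetch("http://blog.csdn.net/oychw/archive/2011/04/22/6342146.aspx") would then request the same page. Whether the server accepts it depends on what it actually checks; mechanize additionally handles cookies, redirects and referers, which this sketch does not.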