This method still needs improvement: the early steps have to be done by hand for now. Updates to follow.
1. Get the cookie for the site you want to log in to from the browser
For example, in Chrome, the steps are as follows:
Open Settings.
Click Advanced.
Open Content settings.
Open "All cookies and site data".
Search for the Sina login cookie and its URL.
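2. Use Chrome's Developer Tools (F12) to inspect your Weibo homepage after a successful login, then look through the recorded requests to find the cookie used during the session
Open your Sina homepage and press F12 to enter debug mode.
(1) Copy the URL from the address bar
(2) Then copy the cookie sent during the session from the request headers of a recorded request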
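Before pasting the copied cookie into a script, it can help to check that the string was copied whole. The following is a minimal sketch, assuming the copied value has the usual `name=value; name=value` form of a Cookie request header. The function name `parse_cookie_header` and the sample names and values are hypothetical, not taken from a real Weibo session.

```python
def parse_cookie_header(raw):
    """Split a copied Cookie header value into a name -> value dict."""
    cookies = {}
    for part in raw.split(";"):
        part = part.strip()
        # Skip empty fragments and fragments without a name=value shape
        if not part or "=" not in part:
            continue
        # Split on the first "=" only, since values may contain "=" themselves
        name, value = part.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


# Placeholder string; a real one comes from your own logged-in session
raw_cookie = "name1=value1; name2=value2"
for name, value in parse_cookie_header(raw_cookie).items():
    print(name, "=", value)
```

If a cookie you saw in the browser is missing from the printed list, the string was probably truncated while copying.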
3. Use Python's urllib2 together with the cookie you just found to access your Weibo homepage
import urllib2

# Replace with the URL copied from the address bar
url = 'your url'
# Replace with the cookie string copied from Developer Tools
headers = {'cookie': 'your cookie'}

# Send the request with the cookie header attached
req = urllib2.Request(url, headers=headers)
r = urllib2.urlopen(req)
htmlcont = r.read()
print htmlcont

# Save the returned HTML to a local file
with open('htmlcode', 'w') as f:
    f.write(htmlcont)
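The script above is written for Python 2, where urllib2 is available. On Python 3 the same calls live in `urllib.request`. The following is a rough sketch of an equivalent, not a tested Weibo client. The helper names `build_request` and `fetch_and_save` are made up for illustration, and the URL and cookie remain placeholders you have to fill in yourself.

```python
import urllib.request


def build_request(url, cookie):
    """Build a GET request that carries the cookie copied from the browser."""
    return urllib.request.Request(url, headers={"Cookie": cookie})


def fetch_and_save(url, cookie, path="htmlcode"):
    """Fetch the page with the cookie attached and save the raw bytes to path."""
    req = build_request(url, cookie)
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    # Write bytes as received, so no encoding has to be guessed here
    with open(path, "wb") as f:
        f.write(body)
    return body


# Usage (placeholders, as in the original script):
# fetch_and_save("your url", "your cookie")
```

Saving the response as bytes avoids guessing the page encoding. Decode it later only if you need to process the text.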
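The steps above give you the HTML of the page as it appears after a successful login. You can open it in a browser, or use a toolkit such as selenium to parse the dynamic page and extract its content.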
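For a quick look at the saved file without a browser driver, the standard library's `html.parser` may be enough. Note that this is static parsing, not the selenium approach mentioned above: anything the page builds with JavaScript will not show up. The sketch below collects link targets from an HTML string. The class name `LinkCollector`, the function `extract_links`, and the sample markup are illustrative assumptions.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Made-up sample markup; in practice, read the text from the saved 'htmlcode' file
sample = '<html><body><a href="/home">Home</a><a name="x">no link</a></body></html>'
print(extract_links(sample))  # prints ['/home']
```

To run it on the saved page, read the file as text first, for example with `open('htmlcode', encoding='utf-8', errors='replace')`, assuming the page is UTF-8 encoded.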