Chinaunix首页 | 论坛 | 博客
  • 博客访问: 206321
  • 博文数量: 48
  • 博客积分: 1935
  • 博客等级: 上尉
  • 技术积分: 491
  • 用 户 组: 普通用户
  • 注册时间: 2010-07-29 00:59
文章分类

全部博文(48)

文章存档

2011年(1)

2010年(47)

我的朋友

分类: Python/Ruby

2010-09-21 01:10:45

自动提交和抓取网页python

时间:2009-08-06 23:38来源:未知 作者:yzhxiang 点击:
import urllib import urllib2 import urlparse import lxml.html def url_with_query(url, values): parts = urlparse.urlparse(url) rest, (query, frag) = parts[:-2], parts[-2:] return urlparse.urlunparse(rest + (urllib.urlencode(values), None)) d

import urllib
import urllib2
import urlparse
import lxml.html
def url_with_query(url, values):
parts = urlparse.urlparse(url)
rest, (query, frag) = parts[:-2], parts[-2:]
return urlparse.urlunparse(rest + (urllib.urlencode(values), None))
def make_open_http():
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
opener.addheaders = [] # pretend we're a human -- don't do this
def open_http(method, url, values={}):
if method == "POST":
return opener.open(url, urllib.urlencode(values))
else:
return opener.open(url_with_query(url, values))
return open_http
open_http = make_open_http()
tree = lxml.html.fromstring(open_http("GET", "").read())
form = tree.forms[0]
form.fields["q"] = "eplussoft"
form.action="/search"
response = lxml.html.submit_form(form,open_http=open_http)
html = response.read()
doc = lxml.html.fromstring(html)
lxml.html.open_in_browser(doc)

阅读(1475) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~