Curl in Python - yueming - ChinaUnix Blog


Curl in Python

Category: Python/Ruby

2013-10-10 14:28:27

1.

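When I was looking up how to use pycurl, I found it hard going as a first-time user, so I wrote a simple demo for anyone who wants to get started:

import pycurl
from io import BytesIO

url = ''  # target URL
buf = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, url)
c.setopt(c.WRITEFUNCTION, buf.write)  # append each received chunk to the buffer
c.setopt(c.FOLLOWLOCATION, 1)         # follow HTTP redirects
c.setopt(c.HEADER, True)              # include the response headers in the output
c.perform()                           # blocks until the transfer finishes
html = buf.getvalue().decode('utf-8', errors='replace')
print(html)
buf.close()
c.close()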
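Because HEADER is set to True in the demo above, the buffer holds the response headers and the body together, and with FOLLOWLOCATION every redirect hop adds its own header block. A minimal sketch for separating them again, assuming each header block starts with an `HTTP/` status line, ends with a blank `\r\n\r\n` line, and that followed redirects contribute only headers; `split_headers_and_body` is a hypothetical helper, not part of pycurl:

```python
def split_headers_and_body(raw):
    """Split bytes captured with HEADER=True into (header_blocks, body).

    Hypothetical helper: each header block becomes a list of decoded lines;
    whatever follows the last header block is returned as the body.
    """
    header_blocks = []
    rest = raw
    # Each block begins with a status line such as b"HTTP/1.1 200 OK"
    while rest.startswith(b"HTTP/"):
        head, sep, remainder = rest.partition(b"\r\n\r\n")
        if not sep:
            break  # no terminating blank line: leave the bytes as the body
        header_blocks.append([line.decode("iso-8859-1") for line in head.split(b"\r\n")])
        rest = remainder
    return header_blocks, rest
```

Usage: `blocks, body = split_headers_and_body(buf.getvalue())` before closing the buffer. If you do not need the headers mixed into the output, setting a separate HEADERFUNCTION callback instead of HEADER is usually cleaner.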
=========================================================
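def test(debug_type, debug_msg):
    # DEBUGFUNCTION callback: debug_type is libcurl's info type code;
    # debug_msg may arrive as bytes, so decode it before printing
    if isinstance(debug_msg, bytes):
        debug_msg = debug_msg.decode('utf-8', errors='replace')
    print("debug(%d): %s" % (debug_type, debug_msg))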
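The debug_type passed to the callback is one of libcurl's curl_infotype values (0 = text, 1/2 = incoming/outgoing headers, 3/4 = incoming/outgoing data, 5/6 = TLS data). A small sketch that names these codes and prints only the text and header lines; the helper names and the filtering choice are illustrative assumptions:

```python
# libcurl's curl_infotype values, as received in debug_type
INFOTYPE_NAMES = {
    0: "TEXT",
    1: "HEADER_IN",
    2: "HEADER_OUT",
    3: "DATA_IN",
    4: "DATA_OUT",
    5: "SSL_DATA_IN",
    6: "SSL_DATA_OUT",
}


def describe_debug(debug_type, debug_msg):
    """Format one DEBUGFUNCTION call as a readable line (hypothetical helper)."""
    name = INFOTYPE_NAMES.get(debug_type, "UNKNOWN(%d)" % debug_type)
    if isinstance(debug_msg, bytes):
        debug_msg = debug_msg.decode("utf-8", errors="replace")
    return "%s: %s" % (name, debug_msg.rstrip("\r\n"))


def debug_callback(debug_type, debug_msg):
    # Skip body and TLS payloads, which are often binary and very long
    if debug_type in (0, 1, 2):
        print(describe_debug(debug_type, debug_msg))
```

Usage: `c.setopt(c.VERBOSE, 1)` together with `c.setopt(c.DEBUGFUNCTION, debug_callback)`.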

Some other setopt calls you will often need with curl:
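# c is a pycurl.Curl() handle; url, params and set_cookie come from your own code
c.setopt(c.HTTPHEADER, ["Content-Type: application/x-www-form-urlencoded",
                        "X-Requested-With: XMLHttpRequest",
                        "Cookie: " + set_cookie[0]])
c.setopt(c.REFERER, url)          # send a Referer header
c.setopt(c.POSTFIELDS, params)    # URL-encoded request body, e.g. "a=1&b=2"
c.setopt(c.VERBOSE, 1)            # produce detailed transfer information

c.setopt(c.POST, 1)               # make the request a POST
c.setopt(c.DEBUGFUNCTION, test)   # hand the verbose output to test() instead of stderr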
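POSTFIELDS expects an already URL-encoded string, and the header list is built by hand. A minimal sketch of preparing both from a dict with the standard library; `build_form_post` and its parameters are hypothetical:

```python
from urllib.parse import urlencode


def build_form_post(fields, cookie=None, ajax=False):
    """Return (headers, body) for a form-encoded POST (hypothetical helper).

    headers is a list for HTTPHEADER, body a string for POSTFIELDS.
    """
    headers = ["Content-Type: application/x-www-form-urlencoded"]
    if ajax:
        headers.append("X-Requested-With: XMLHttpRequest")
    if cookie:
        headers.append("Cookie: " + cookie)
    # urlencode percent-encodes keys and values and joins the pairs with "&"
    body = urlencode(fields)
    return headers, body
```

Usage: `headers, body = build_form_post({"q": "a b"}, cookie="sid=abc")`, then `c.setopt(c.HTTPHEADER, headers)` and `c.setopt(c.POSTFIELDS, body)`.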

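A fuller download example, which writes the body and the headers to separate files and then prints transfer statistics from getinfo():

import time
import pycurl

def progress(download_total, downloaded, upload_total, uploaded):
    # PROGRESSFUNCTION callback; returning a non-zero value aborts the transfer
    if download_total:
        print("Downloaded %d of %d bytes" % (downloaded, download_total))

url = ""

print("Starting downloading", url)
print()
f = open("body", "wb")      # response body
h = open("header", "wb")    # response headers
c = pycurl.Curl()
c.setopt(c.URL, url)
c.setopt(c.WRITEDATA, f)
c.setopt(c.NOPROGRESS, 0)   # the progress callback is only called when NOPROGRESS is 0
c.setopt(c.PROGRESSFUNCTION, progress)
c.setopt(c.FOLLOWLOCATION, 1)
c.setopt(c.MAXREDIRS, 5)
c.setopt(c.WRITEHEADER, h)
c.setopt(c.OPT_FILETIME, 1)  # ask for the remote document's modification time
c.perform()

print("HTTP-code:", c.getinfo(c.HTTP_CODE))
print("Total-time:", c.getinfo(c.TOTAL_TIME))
print("Download speed: %.2f bytes/second" % c.getinfo(c.SPEED_DOWNLOAD))
print("Document size: %d bytes" % c.getinfo(c.SIZE_DOWNLOAD))
print("Effective URL:", c.getinfo(c.EFFECTIVE_URL))
print("Content-type:", c.getinfo(c.CONTENT_TYPE))
print("Namelookup-time:", c.getinfo(c.NAMELOOKUP_TIME))
print("Redirect-time:", c.getinfo(c.REDIRECT_TIME))
print("Redirect-count:", c.getinfo(c.REDIRECT_COUNT))
epoch = c.getinfo(c.INFO_FILETIME)  # -1 when the server sent no modification time
if epoch >= 0:
    print("Filetime: %d (%s)" % (epoch, time.ctime(epoch)))
print()
print("Header is in file 'header', body is in file 'body'")

c.close()
f.close()
h.close()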
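Since a non-zero return value from the progress callback aborts the transfer, the callback can also enforce a download size limit. A sketch under that assumption; `make_progress` and the limit are illustrative, and the percentage is printed only when it changes because libcurl calls the callback very often:

```python
def make_progress(max_bytes):
    """Build a PROGRESSFUNCTION callback that aborts after max_bytes (hypothetical helper)."""
    last_pct = -1

    def progress(download_total, downloaded, upload_total, uploaded):
        nonlocal last_pct
        if downloaded > max_bytes:
            return 1  # non-zero: libcurl aborts the transfer
        if download_total > 0:
            pct = int(downloaded * 100 / download_total)
            if pct != last_pct:  # print each percentage only once
                last_pct = pct
                print("%3d%% (%d/%d bytes)" % (pct, downloaded, download_total))
        return 0

    return progress
```

Usage: `c.setopt(c.NOPROGRESS, 0)` and `c.setopt(c.PROGRESSFUNCTION, make_progress(10 * 1024 * 1024))`. When the callback aborts, `c.perform()` raises `pycurl.error`, so wrap it in `try`/`except` if you want to handle the limit gracefully.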

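The same kind of request with the response kept in memory; the commented-out lines are other options worth knowing:

import pycurl
from io import BytesIO

# print(pycurl.version_info())
url = ''
c = pycurl.Curl()
c.setopt(pycurl.URL, url)

b = BytesIO()
# A header name with nothing after the colon removes that header from the request
c.setopt(pycurl.HTTPHEADER, ["Accept:"])

c.setopt(pycurl.WRITEFUNCTION, b.write)

c.setopt(pycurl.FOLLOWLOCATION, 1)
# c.setopt(pycurl.HEADER, True)
c.setopt(pycurl.MAXREDIRS, 5)
# c.setopt(pycurl.USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)")
# c.setopt(pycurl.REFERER, "")
# c.setopt(pycurl.CONNECTTIMEOUT, 20)  # connection timeout, in seconds
# c.setopt(pycurl.TIMEOUT, 20)         # timeout for the whole transfer, in seconds
# c.setopt(pycurl.COOKIEFILE, "cookie_file_name")  # read cookies from this file
# c.setopt(pycurl.COOKIEJAR, "cookie_file_name")   # write cookies here when the handle is closed
c.perform()
html = b.getvalue().decode('utf-8', errors='replace')
print('-----------')
print(html)
c.close()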
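The `["Accept:"]` trick above follows libcurl's HTTPHEADER conventions: `"Name:"` with nothing after the colon removes a header libcurl would otherwise send, and `"Name;"` sends the header with an empty value. A small sketch that builds such a list from a dict; `header_list` is a hypothetical helper:

```python
def header_list(headers):
    """Turn a dict into an HTTPHEADER list (hypothetical helper).

    value None -> "Name:"        remove a header libcurl would send
    value ""   -> "Name;"        send the header with an empty value
    other      -> "Name: value"  normal header
    """
    result = []
    for name, value in headers.items():
        if value is None:
            result.append(name + ":")
        elif value == "":
            result.append(name + ";")
        else:
            result.append("%s: %s" % (name, value))
    return result
```

Usage: `c.setopt(pycurl.HTTPHEADER, header_list({"Accept": None, "User-Agent": "demo/1.0"}))`.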
======================== Using a proxy

import pycurl
from io import BytesIO

def getURLContent_pycurl(url):
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    b = BytesIO()
    c.setopt(pycurl.WRITEFUNCTION, b.write)
    c.setopt(pycurl.FOLLOWLOCATION, 1)
    c.setopt(pycurl.MAXREDIRS, 5)
    # Proxy settings
    # c.setopt(pycurl.PROXY, '')
    # c.setopt(pycurl.PROXYUSERPWD, 'aaa:aaa')
    c.perform()
    c.close()
    return b.getvalue()

url = 'http://blog.csdn.net'
content = getURLContent_pycurl(url)
print(content.decode('utf-8', errors='replace'))
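If the proxy is given as a single URL such as `http://user:pass@host:port`, it can be split into the two values that the PROXY and PROXYUSERPWD options above take. A sketch using only the standard library; `proxy_options` is hypothetical and does not handle IPv6 proxy addresses:

```python
from urllib.parse import urlsplit, unquote


def proxy_options(proxy_url):
    """Split 'scheme://user:pass@host:port' into PROXY / PROXYUSERPWD values (hypothetical helper)."""
    parts = urlsplit(proxy_url)
    host = parts.hostname
    if parts.port:
        host = "%s:%d" % (host, parts.port)
    opts = {"PROXY": "%s://%s" % (parts.scheme, host)}
    if parts.username is not None:
        # Credentials in the URL may be percent-encoded; PROXYUSERPWD wants them plain
        opts["PROXYUSERPWD"] = "%s:%s" % (unquote(parts.username), unquote(parts.password or ""))
    return opts
```

Usage: `for name, value in proxy_options("http://aaa:aaa@127.0.0.1:8080").items(): c.setopt(getattr(pycurl, name), value)`.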

2.

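Lately I have been using Python to read RSS feeds and save them to a database. After using urllib for a while I found it slow, and people online say pycurl is faster than urllib, so I tried it out. Here is how to use it: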

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from io import BytesIO
import pycurl

html = BytesIO()
c = pycurl.Curl()
myurl = ''
c.setopt(pycurl.URL, myurl)
# Write callback: every chunk of the response is appended to html
c.setopt(pycurl.WRITEFUNCTION, html.write)
c.setopt(pycurl.FOLLOWLOCATION, 1)
# Maximum number of redirects; guards against redirect loops
c.setopt(pycurl.MAXREDIRS, 5)
# Timeouts in seconds: 60 to connect, 300 for the whole transfer
c.setopt(pycurl.CONNECTTIMEOUT, 60)
c.setopt(pycurl.TIMEOUT, 300)
# Pretend to be a browser
c.setopt(pycurl.USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)")
# Perform the request; blocks until the transfer finishes
c.perform()
# Print the HTTP status code, e.g. 200
print(c.getinfo(pycurl.HTTP_CODE))
# Print the page content
print(html.getvalue().decode('utf-8', errors='replace'))
# Print the content type
print("Content-type:", c.getinfo(c.CONTENT_TYPE))
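For the RSS-to-database use case mentioned above, the fetched bytes still have to be parsed and stored. A minimal sketch assuming an RSS 2.0 feed and an SQLite table keyed by link; the function names and the schema are hypothetical, not the author's actual code:

```python
import sqlite3
import xml.etree.ElementTree as ET


def parse_rss_items(xml_bytes):
    """Return (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(xml_bytes)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items


def save_items(conn, items):
    """Store items, skipping links that are already in the table."""
    conn.execute("CREATE TABLE IF NOT EXISTS items (link TEXT PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT OR IGNORE INTO items (link, title) VALUES (?, ?)",
        [(link, title) for title, link in items],
    )
    conn.commit()
```

Usage: `save_items(sqlite3.connect("rss.db"), parse_rss_items(html.getvalue()))`. Making the link the primary key lets repeated polling of the same feed skip entries that were already saved.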

To install pycurl, go to http://pycurl.sourceforge.net/.
On Windows, pick the installer from http://pycurl.sourceforge.net/download/ that matches your Python version. I use Python 2.4 on Windows, so I downloaded pycurl-ssl-7.15.5.1.win32-py2.4.exe.

Fetching an HTTPS URL with certificate verification turned off:

# -*- coding: utf-8 -*-
import pycurl
from io import BytesIO

html = BytesIO()
url = r''
c = pycurl.Curl()
c.setopt(pycurl.URL, url)
# Disable certificate and hostname checks (insecure; e.g. for testing against a self-signed server)
c.setopt(pycurl.SSL_VERIFYHOST, 0)
c.setopt(pycurl.SSL_VERIFYPEER, 0)
# USERAGENT takes only the value, without a "User-Agent:" prefix
# c.setopt(pycurl.USERAGENT, r"Dalvik/1.4.0 (Linux; U; Android 2.3.7; Milestone Build/SHOLS_U2_05.26.3)")
c.setopt(pycurl.WRITEFUNCTION, html.write)
c.setopt(pycurl.FOLLOWLOCATION, 1)
c.perform()
print(c.getinfo(pycurl.HTTP_CODE), c.getinfo(pycurl.EFFECTIVE_URL))
print(html.getvalue().decode('utf-8', errors='replace'))
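Turning verification off, as in the snippet above, also accepts forged certificates. A sketch that keeps verification switchable and points libcurl at a CA bundle when one is given; `tls_options` is a hypothetical helper, and the bundle path (one common source is `certifi.where()` from the certifi package) is an assumption about your environment:

```python
def tls_options(verify=True, ca_bundle=None):
    """Return pycurl option names and values for HTTPS checks (hypothetical helper)."""
    if not verify:
        # Same effect as the snippet above: no certificate or hostname checks (insecure)
        return {"SSL_VERIFYPEER": 0, "SSL_VERIFYHOST": 0}
    # VERIFYHOST 2 checks that the certificate matches the host name
    opts = {"SSL_VERIFYPEER": 1, "SSL_VERIFYHOST": 2}
    if ca_bundle:
        opts["CAINFO"] = ca_bundle  # PEM file with the trusted CA certificates
    return opts
```

Usage: `for name, value in tls_options(ca_bundle=path).items(): c.setopt(getattr(pycurl, name), value)`.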
