Category: LINUX

2006-10-08 06:48:50

curl is a very handy tool for downloading and uploading. Whenever you have a whole series of downloads and uploads to do, curl can help.

Below are some good articles (with usage examples) that I found online. I'm posting them here so we can share and learn from each other.

========================================
Using curl

1. Basic download:

% curl -O "http://blueapple.infor.org/curl/1.txt"

After you type this command, text showing the download progress and speed appears.

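2. Sequential download:

Surely some of you have had this experience: a web page has lots of files named by number, and you have to click and save them one at a time, which is a real pain! Ten or twenty is bearable, but what about hundreds or thousands? curl can download a whole numbered sequence of files at once: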

% curl -O "http://blueapple.infor.org/curl/mac/[1-10].jpg"

3. Resuming a download:

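Isn't it frustrating when a download breaks off halfway? Don't worry, curl can resume. If you have a partially transferred file called brokenfile, just add the option -C - (uppercase C followed by a dash, which tells curl to work out the resume offset from the existing file; in current curl versions the lowercase -c option writes a cookie jar instead):

% curl -C - -o "brokenfile" "ftp://ftp.server.com/path/file"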
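If the line keeps dropping, the resume step can be repeated automatically. The following is a minimal sketch, not a curl feature: a hypothetical shell function fetch_resume that reruns curl with -C - (continue from the size of the existing file) until curl exits successfully or a retry limit is reached. The retry limit and the RETRY_DELAY variable are assumptions chosen for illustration.

```shell
# Hypothetical helper: keep resuming a download until curl succeeds.
# Usage: fetch_resume URL OUTFILE [MAX_TRIES]   (MAX_TRIES defaults to 10)
# RETRY_DELAY sets the pause between attempts in seconds (default 5).
fetch_resume() {
  local url=$1 out=$2 max=${3:-10} try=1
  while :; do
    # -C - : continue from the current size of OUTFILE (from 0 if it is missing)
    curl -C - -o "$out" "$url" && return 0
    [ "$try" -ge "$max" ] && return 1
    echo "attempt $try failed; resuming in ${RETRY_DELAY:-5}s" >&2
    try=$((try + 1))
    sleep "${RETRY_DELAY:-5}"
  done
}
```

For example, fetch_resume "ftp://ftp.server.com/path/file" brokenfile 20. It only helps if the server supports resuming (REST for FTP, Range requests for HTTP).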

4. Segmented download:

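There is a well-known PC program called FlashGet that can split a file into many pieces and download them at the same time. What's the point? Some servers limit the speed of each download, and fetching several pieces simultaneously is like having several people help you: A grabs one part, B grabs another, C grabs a third. That way your download goes faster.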

% curl -r 0-40960 -o "rose.part1" "" & \
curl -r 40961-81920 -o "rose.part2" "" & \
curl -r 81921-125068 -o "rose.part3" "" &

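Once all three background jobs have finished (the shell's wait command blocks until they have), join the downloaded parts with: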

% cat rose.part* > rose.jpg
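Two details are easy to miss when scripting this: all the background jobs must finish before cat runs, and once there are ten or more parts the glob rose.part* sorts part10 before part2. The sketch below (split_ranges and segmented_download are hypothetical helper names) computes inclusive byte ranges for -r, waits for every job and checks its exit status, then joins the pieces in numeric order. It assumes the server honours Range requests and that the total size in bytes is already known, for example from the Content-Length header.

```shell
# Print PARTS inclusive byte ranges ("start-end") covering TOTAL bytes,
# in the form curl's -r option expects.
split_ranges() {
  local total=$1 parts=$2 chunk i=0 start end
  chunk=$(( (total + parts - 1) / parts ))
  while [ "$i" -lt "$parts" ]; do
    start=$(( i * chunk ))
    end=$(( start + chunk - 1 ))
    [ "$end" -ge "$total" ] && end=$(( total - 1 ))
    [ "$start" -lt "$total" ] && printf '%s-%s\n' "$start" "$end"
    i=$(( i + 1 ))
  done
}

# Hypothetical helper: fetch URL into OUT with PARTS parallel ranged requests.
# Usage: segmented_download URL OUT TOTAL_BYTES PARTS
segmented_download() {
  local url=$1 out=$2 total=$3 parts=$4 n=0 pids="" range pid status=0 i=1
  for range in $(split_ranges "$total" "$parts"); do
    n=$(( n + 1 ))
    curl -s -r "$range" -o "$out.part$n" "$url" &
    pids="$pids $!"
  done
  # Every job must finish, and succeed, before the pieces are joined.
  for pid in $pids; do
    wait "$pid" || status=1
  done
  [ "$status" -eq 0 ] || return 1
  : > "$out"
  while [ "$i" -le "$n" ]; do   # numeric order, unlike the glob part*
    cat "$out.part$i" >> "$out" && rm -f "$out.part$i"
    i=$(( i + 1 ))
  done
}
```

With a hypothetical URL, segmented_download "http://host/rose.jpg" rose.jpg 125069 3 would request the ranges 0-41689, 41690-83379 and 83380-125068.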

5. Looking up words in a dictionary:

Come across a word you've never seen? No dictionary at hand? curl supports the DICT protocol, so you can use curl to look up words too!

% curl "dict://dict.org/d:apple"

6. Uploading files:

% curl -T "files" -u user:Password "ftp://ftp.server.com/path/filename"

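Besides downloading, curl can upload files too! Put the file to upload after -T and the user name and password (separated by a colon) after -u, and the upload is done! If the server supports it, you can also add -C - to continue an upload that was interrupted halfway.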

1)
Without further ado, let's start here! Type curl [URL] and press Enter, and the page's HTML comes pouring out onto the screen.

2)
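Hmm, to save the page we just fetched, do we have to do this?

curl [URL] > page.html

That works, of course, but there's no need for the redirect! curl has a built-in option for saving the HTTP result, -o:

curl -o page.html [URL]

You'll then see a download progress indicator on the screen; when it reaches 100%, you're done.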

3)
What? You can't get through? Your proxy probably isn't set. With curl, the -x option specifies the proxy server and port used for HTTP access:

curl -x 123.45.67.89:1080 -o page.html [URL]

4)
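Some sites are more annoying: they use cookies to keep session information. Browsers like IE or Netscape Navigator handle cookies easily, but what about our curl? Let's learn the -D option, which writes the HTTP response headers, including the Set-Cookie lines that carry the cookies, to a file of your choice:

curl -x 123.45.67.89:1080 -o page.html -D cookie0001.txt [URL]

This way, while the page is saved to page.html, the cookie information is also saved to cookie0001.txt.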

5)
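So how do we reuse the cookies from last time on the next visit? Many sites watch your cookies to judge whether you're visiting them by the rules. This time we use the -b option to add the previous cookies to the HTTP request:

curl -x 123.45.67.89:1080 -o page1.html -D cookie0002.txt -b cookie0001.txt [URL]

With this, we can imitate almost anything IE does when it visits a page!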

6)
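Hold on, I think I forgot something... Right! The browser information. Some annoying sites insist that we use a particular browser, and some go further and insist on a particular version. Darn it, who has time to hunt down these odd browsers? Luckily curl has a handy option, -A, that lets us claim to be any browser we like for the current request:

curl -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" -x 123.45.67.89:1080 -o page.html -D cookie0001.txt [URL]

When the server receives this request it will think you're IE 6.0 running on Windows 2000, heh, when you might actually be on a Mac! Likewise, "Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)" tells the other side you're running Netscape 4.73 on Linux on a PC, ha.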

7)
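Another common server-side restriction is checking the HTTP Referer. For example, you first visit the home page and then the download page it links to; the Referer of the second request is the address of the page you visited first. So if the server sees a request for the download page whose Referer is not the home page, it concludes that someone is hotlinking. Annoying! I want to hotlink anyway! Fortunately curl gives us an option to set the referer, -e:

curl -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" -x 123.45.67.89:1080 -e "mail.yahoo.com" -o page.html -D cookie0001.txt [URL]

This fools the server into believing you clicked a link on mail.yahoo.com to get there, heh.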

8)
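While writing, I realized I'd left out something important: downloading files with curl. As mentioned, -o saves a page to a file, and downloading a file works the same way, for example:

curl -o 1.jpg "~zzh/screen1.JPG"

Here's a new option: -O (capital O), used like this:

curl -O "~zzh/screen1.JPG"

This saves the file locally under its name on the server! Here's an even better one. If besides screen1.JPG there are also screen2.JPG, screen3.JPG, ..., screen10.JPG to download, do we really have to write a script? No way! In curl, just write:

curl -O "~zzh/screen[1-10].JPG"

Heh, pretty powerful, right?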

9)
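Let's keep going with downloads!

curl -O "~{zzh,nick}/[001-201].JPG"

This downloads ~zzh/001.JPG, ~zzh/002.JPG, ..., ~zzh/201.JPG and ~nick/001.JPG, ~nick/002.JPG, ..., ~nick/201.JPG. Handy, right? Ha! Wait, I celebrated too soon. Since the files under zzh and nick are all named 001, 002, ..., 201, the downloaded files end up with the same names and the later ones overwrite the earlier ones. No problem, we have an even stronger trick:

curl -o "#2_#1.jpg" "~{zzh,nick}/[001-201].JPG"

What's this... a download with custom file names? Exactly, heh! #1 is a variable that stands for the {zzh,nick} part: first zzh, then nick. #2 stands for the second variable part, [001-201], which counts from 001 up to 201. (Keep both arguments in quotes: unquoted, the shell would treat everything from # onward as a comment and would expand the braces itself.) The custom file names then come out like this:

Original: ~zzh/001.JPG ---> downloaded as: 001_zzh.jpg
Original: ~nick/001.JPG ---> downloaded as: 001_nick.jpg

No more name clashes, heh.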
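To see exactly which local name each remote file gets, the mapping can be reproduced in plain shell. The function below is an illustration only (preview_glob_names is a hypothetical name, and curl does not need it): it walks the same two patterns, {zzh,nick} and [001-201], and prints each remote path next to the name that -o "#2_#1.jpg" gives it.

```shell
# Illustration only: list the remote path -> local name pairs produced by
#   curl -o "#2_#1.jpg" ".../~{zzh,nick}/[001-201].JPG"
# #1 is the current value of the first pattern ({zzh,nick}),
# #2 the current value of the second one ([001-201]).
preview_glob_names() {
  local user n num
  for user in zzh nick; do
    n=1
    while [ "$n" -le 201 ]; do
      num=$(printf '%03d' "$n")   # [001-201] keeps the leading zeros
      printf '~%s/%s.JPG -> %s_%s.jpg\n' "$user" "$num" "$num" "$user"
      n=$(( n + 1 ))
    done
  done
}
```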

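10)
More on downloading. On Windows, tools like FlashGet download a file in parallel chunks and can resume after a dropped connection. curl is no slouch here either. Say the connection drops while we're downloading screen1.JPG; we can resume like this:

curl -C - -O "~zzh/screen1.JPG"

Of course, don't try to fool me with a file half-downloaded by FlashGet; partial files left by other download programs may not work. For chunked downloads, use the -r option. For example, to download ~zzh/zhao1.mp3 (Mr. Zhao reciting over the phone :D), we can use these commands:

curl -r 0-10240 -o "zhao.part1" "http://cgi2.tky.3web.ne.jp/~zzh/zhao1.mp3" &\
curl -r 10241-20480 -o "zhao.part2" "http://cgi2.tky.3web.ne.jp/~zzh/zhao1.mp3" &\
curl -r 20481-40960 -o "zhao.part3" "http://cgi2.tky.3web.ne.jp/~zzh/zhao1.mp3" &\
curl -r 40961- -o "zhao.part4" "http://cgi2.tky.3web.ne.jp/~zzh/zhao1.mp3"

That's a chunked download. You do have to join the pieces yourself once all four transfers have finished. On UNIX or a Mac, cat zhao.part* > zhao.mp3 does it; on Windows, use copy /b (for example copy /b zhao.part1+zhao.part2+zhao.part3+zhao.part4 zhao.mp3). Everything above uses HTTP, but FTP works just as well:

curl -u name:passwd ftp://ip:port/path/file

or the familiar form

curl ftp://name:passwd@ip:port/path/file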

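11)
Now that downloading is covered, uploading comes next. The upload option is -T. For example, to upload a file to FTP:

curl -T localfile -u name:passwd ftp://upload_site:port/path/

Uploading to an HTTP server works too, for example:

curl -T localfile ~zzh/abc.cgi

Note that in this case the protocol used is the HTTP PUT method. Speaking of PUT, that reminds me of the other methods we haven't covered yet! We mustn't forget GET and POST. HTTP forms are most commonly submitted in POST or GET mode. GET mode needs no option at all; just put the variables in the URL (in quotes, or the shell treats & as its background operator), for example:

curl "/login.cgi?user=nickwolfe&password=12345"

POST mode uses the -d option, for example:

curl -d "user=nickwolfe&password=12345" /login.cgi

This is equivalent to sending a login request to the site. Whether to use GET or POST depends on how the program on the server is written. One thing to note is file upload in POST mode. For an HTTP form with a file field named upload and a text field named nick, simulating it with curl looks like this:

curl -F upload=@localfile -F nick=go ~zzh/up_file.cgi

I've rambled on a lot, but curl has many, many more tricks and uses. For example, to use a local certificate with HTTPS:

curl -E localcert.pem [URL]

Or you can even use curl to look up words over the DICT protocol:

curl dict://dict.org/d:computer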
 
Another use of curl
Analyzing China Netcom billing:

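curl -d 'userid=YourName&password=YourPasswd&yearval=2006&monthval=07&Submit= 确定 ' [URL]

That gets you the page. (确定, "OK", is the label of the form's submit button.)

Of course, since the server side is JSP, the same parameters can also be passed in the URL:

?userid=YourName&password=YourPasswd&yearval=2006&monthval=07&Submit=确定

This gets you your page as well.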
 
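Learning curl
An analysis of a script for checking campus-network IP fees
This script was originally a post on a BBS; I'm using it as material for learning curl.

Setting up cookies, as follows: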

[jasonh@fbsd bin]$ ./curl.sh
所剩余额: 109.00

[jasonh@fbsd bin]$ cat curl.sh
curl -d 'fr=00&id_ip=YOUR_IP&pass=YOUR_PASSWD&set=%BD%F8%C8%EB' \
        -L -D hitsun.cookie \
        -b hitsun.cookie 2>&1 |
sed -n 's/^.*\(所剩余额\)\([^0-9]*\)\([0-9.]*\).*/\1: \3/p'

Command analysis:

curl

 -d/--data    HTTP POST data (H)
    --data-ascii    HTTP POST ASCII data (H)
    --data-binary   HTTP POST binary data (H)
    --negotiate     Enable HTTP Negotiate Authentication (H)
    --digest        Enable HTTP Digest Authentication (H)
    --disable-eprt  Prevent curl from using EPRT or LPRT (F)
    --disable-epsv  Prevent curl from using EPSV (F)

 -L/--location      Follow Location: hints (H)
    --location-trusted Follow Location: and send authentication even
                    to other hostnames (H)

 -D/--dump-header Write the headers to this file
    --egd-file EGD socket path for random data (SSL)
    --tcp-nodelay   Set the TCP_NODELAY option

 -b/--cookie Cookie string or file to read cookies from (H)
    --basic         Enable HTTP Basic Authentication (H)



Today I spent a long time fiddling with cookies and finally understood some of it.
curl -d 'fr=00&id_ip=YOUR_IP&pass=YOUR_PASSWD&set=%BD%F8%C8%EB' \
        -L -c hitsun.cookie

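When I then tried to use the cookie to log in, I ran into some problems.
curl -b hitsun.cookie
The result was wrong: instead of the billing page it returned the login page again, even with -L added.

In the end cliff stepped in and sorted it out. Old hands really do know best!

The billing page is not the same page as the login page; it is called profile.php, which you can see by logging in manually. Requesting that page with

curl -b hitsun.cookie

returns the billing page correctly.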
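The working sequence described above boils down to two requests sharing one cookie file: log in and save the session cookie with -c, then request profile.php and send the cookie back with -b. A minimal sketch follows; fetch_balance is a hypothetical name, the login and billing URLs are parameters because the script above does not show them, and the IP and password are assumed to need no URL-encoding. It also assumes the page uses the same character encoding as the script; for a GBK page, iconv -f GBK -t UTF-8 could be inserted before sed.

```shell
# Hypothetical helper: log in, then read the balance from the billing page.
# Usage: fetch_balance LOGIN_URL PROFILE_URL IP PASSWORD
# Assumes IP and PASSWORD contain no characters that need URL-encoding.
fetch_balance() {
  local login_url=$1 profile_url=$2 ip=$3 pass=$4 jar
  jar=$(mktemp) || return 1
  # Step 1: POST the login form; -c saves the session cookie to the jar.
  if ! curl -s -L -c "$jar" \
      -d "fr=00&id_ip=$ip&pass=$pass&set=%BD%F8%C8%EB" \
      "$login_url" > /dev/null; then
    rm -f "$jar"
    return 1
  fi
  # Step 2: fetch the billing page itself, sending the saved cookie with -b.
  curl -s -b "$jar" "$profile_url" |
    sed -n 's/^.*\(所剩余额\)\([^0-9]*\)\([0-9.]*\).*/\1: \3/p'
  rm -f "$jar"
}
```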
 
 
The curl manual
Online:  http://curl.haxx.se/docs/httpscripting.html
Date: December 9, 2004
 
The Art Of Scripting HTTP Requests Using Curl
=============================================
 
This document will assume that you're familiar with HTML and general
networking.
 
The possibility to write scripts is essential to make a good computer
system. Unix' capability to be extended by shell scripts and various tools to
run various automated commands and scripts is one reason why it has succeeded
so well.
 
The increasing amount of applications moving to the web has made "HTTP
Scripting" more frequently requested and wanted. To be able to automatically
extract information from the web, to fake users, to post or upload data to
web servers are all important tasks today.
 
Curl is a command line tool for doing all sorts of URL manipulations and
transfers, but this particular document will focus on how to use it when
doing HTTP requests for fun and profit. I'll assume that you know how to
invoke 'curl --help' or 'curl --manual' to get basic information about it.
 
Curl is not written to do everything for you. It makes the requests, it gets
the data, it sends data and it retrieves the information. You probably need
to glue everything together using some kind of script language or repeated
manual invokes.
 
1. The HTTP Protocol
 
HTTP is the protocol used to fetch data from web servers. It is a very simple
protocol that is built upon TCP/IP. The protocol also allows information to
get sent to the server from the client using a few different methods, as will
be shown here.
 
HTTP is plain ASCII text lines being sent by the client to a server to
request a particular action, and then the server replies a few text lines
before the actual requested content is sent to the client.
 
Using curl's -v option will display what kind of commands curl sends to the
server, as well as a few other informational texts. -v is the single most
useful option when it comes to debugging or even understanding the
curl<->server interaction.
 
2. URL
 
The Uniform Resource Locator format is how you specify the address of a
particular resource on the Internet. You know these; you've seen URLs like
http://curl.haxx.se a million times.
 
3. GET a page
 
The simplest and most common request/operation made using HTTP is to get a
URL. The URL could itself refer to a web page, an image or a file. The client
issues a GET request to the server and receives the document it asked for.
If you issue the command line
 
curl http://curl.haxx.se
 
you get a web page returned in your terminal window: the entire HTML
document that the URL holds.
 
All HTTP replies contain a set of headers that are normally hidden; use
curl's -i option to display them as well as the rest of the document. You can
also ask the remote server for ONLY the headers by using the -I option (which
will make curl issue a HEAD request).
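Because -I prints the headers as plain text, a script can pick out individual values, such as Content-Length to learn a file's size before a ranged download. A sketch with hypothetical helper names: header_value reads a header dump on standard input, removes the carriage return that ends each HTTP header line, matches the header name case-insensitively and keeps the last match, since with -L curl prints one header block per redirect and the final one describes the document actually received.

```shell
# Print the value of header NAME from a header dump on stdin
# (case-insensitive name match; the last occurrence wins).
header_value() {
  tr -d '\r' | awk -v want="$1" '
    BEGIN { want = tolower(want) }
    {
      i = index($0, ":")
      if (i > 0 && tolower(substr($0, 1, i - 1)) == want) {
        v = substr($0, i + 1)
        sub(/^[ \t]+/, "", v)   # drop the space after the colon
        found = 1
      }
    }
    END { if (found) print v }'
}

# Hypothetical wrapper: size in bytes of a remote file, if the server reports it.
# -s hides the progress meter, -I sends HEAD, -L follows redirects.
remote_size() {
  curl -sIL "$1" | header_value Content-Length
}
```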
 
4. Forms
 
Forms are the general way a web site can present a HTML page with fields for
the user to enter data in, and then press some kind of 'OK' or 'submit'
button to get that data sent to the server. The server then typically uses
the posted data to decide how to act. Like using the entered words to search
in a database, or to add the info in a bug track system, display the entered
address on a map or using the info as a login-prompt verifying that the user
is allowed to see what it is about to see.
 
Of course there has to be some kind of program in the server end to receive
the data you send. You cannot just invent something out of the air.
 
4.1 GET
 
A GET-form uses the method GET, as specified in HTML like:
 
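 <form method="GET" action="junk.cgi">
 <input type=text name="birthyear">
 <input type=submit name=press value="OK">
 </form>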
 
In your favorite browser, this form will appear with a text box to fill in
and a press-button labeled "OK". If you fill in '1905' and press the OK
button, your browser will then create a new URL to get for you. The URL will
get "junk.cgi?birthyear=1905&press=OK" appended to the path part of the
previous URL.
 
The browser requests junk.cgi in the same directory as the page that held the
form, with "?birthyear=1905&press=OK" appended.
 
Most search engines work this way.
 
To make curl do the GET form post for you, just enter the expected created
URL:
 
curl "[URL]/junk.cgi?birthyear=1905&press=OK"
 
4.2 POST
 
The GET method makes all input field names get displayed in the URL field of
your browser. That's generally a good thing when you want to be able to
bookmark that page with your given data, but it is an obvious disadvantage
if you entered secret information in one of the fields or if there are a
large amount of fields creating a very long and unreadable URL.
 
The HTTP protocol then offers the POST method. This way the client sends the
data separated from the URL and thus you won't see any of it in the URL
address field.
 
The form would look very similar to the previous one:
 
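 <form method="POST" action="junk.cgi">
 <input type=text name="birthyear">
 <input type=submit name=press value=" OK ">
 </form>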
 
And to use curl to post this form with the same data filled in as before, we
could do it like:
 
curl -d "birthyear=1905&press=%20OK%20" [URL]
 
This kind of POST will use the Content-Type
application/x-www-form-urlencoded and is the most widely used POST kind.
 
The data you send to the server MUST already be properly encoded; curl will
not do that for you. For example, if you want the data to contain a space,
you need to replace that space with %20, and so on. Failing to comply with
this will most likely cause your data to be received wrongly and messed up.
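Since curl sends -d data exactly as given, the encoding has to happen before curl is called. A minimal percent-encoder in plain shell is sketched below (urlencode is a hypothetical name). It encodes bytes, so it assumes the text is already in the character set the server expects; a GBK site, for example, needs the text converted with iconv first. Later curl releases also offer --data-urlencode, which does this for you.

```shell
# Percent-encode one value for application/x-www-form-urlencoded data.
# Unreserved characters (A-Z a-z 0-9 - . _ ~) are kept; every other byte,
# including each byte of a multibyte character, becomes %XX.
urlencode() {
  printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | awk '
    BEGIN { for (i = 0; i < 256; i++) code[sprintf("%02x", i)] = i }
    NF {
      n = code[$1]
      if ((n >= 48 && n <= 57) || (n >= 65 && n <= 90) || (n >= 97 && n <= 122) ||
          n == 45 || n == 46 || n == 95 || n == 126)
        printf "%c", n
      else
        printf "%%%s", toupper($1)
    }'
}
```

For example, curl -d "birthyear=1905&press=$(urlencode ' OK ')" [URL] sends press=%20OK%20, matching the command above.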
 
4.3 File Upload POST
 
Back in late 1995 they defined an additional way to post data over HTTP. It
is documented in RFC 1867, which is why this method is sometimes referred to
as RFC1867-posting.
 
This method is mainly designed to better support file uploads. A form that
allows a user to upload a file could be written like this in HTML:
 
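 <form method="POST" enctype='multipart/form-data' action="upload.cgi">
 <input type=file name=upload>
 <input type=submit name=press value="OK">
 </form>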
 
This clearly shows that the Content-Type about to be sent is
multipart/form-data.
 
To post to a form like this with curl, you enter a command line like:
 
curl -F upload=@localfilename -F press=OK [URL]
 
4.4 Hidden Fields
 
A very common way for HTML based application to pass state information
between pages is to add hidden fields to the forms. Hidden fields are
already filled in, they aren't displayed to the user and they get passed
along just as all the other fields.
 
A similar example form with one visible field, one hidden field and one
submit button could look like:
 
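 <form method="POST" action="foobar.cgi">
 <input type=text name="birthyear">
 <input type=hidden name="person" value="daniel">
 <input type=submit name="press" value="OK">
 </form>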
 
To post this with curl, you won't have to think about if the fields are
hidden or not. To curl they're all the same:
 
curl -d "birthyear=1905&press=OK&person=daniel" [URL]
 
4.5 Figure Out What A POST Looks Like
 
When you're about to fill in a form and send it to a server by using curl
instead of a browser, you're of course very interested in sending a POST
exactly the way your browser does.
 
An easy way to get to see this, is to save the HTML page with the form on
your local disk, modify the 'method' to a GET, and press the submit button
(you could also change the action URL if you want to).
 
You will then clearly see the data get appended to the URL, separated with a
'?' character, as GET forms are supposed to.
 
5. PUT
 
Perhaps the best way to upload data to a HTTP server is to use PUT. Then
again, this of course requires that someone put a program or script on the
server end that knows how to receive a HTTP PUT stream.
 
Put a file to a HTTP server with curl:
 
curl -T uploadfile [URL]
 
6. Authentication
 
Authentication is the ability to tell the server your username and password
so that it can verify that you're allowed to do the request you're doing. The
Basic authentication used in HTTP (which is the type curl uses by default) is
*plain* *text* based, which means it sends username and password only
slightly obfuscated, but still fully readable by anyone that sniffs on the
network between you and the remote server.
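To see how thin that obfuscation is: the Basic scheme sends an Authorization header whose value is just "user:password" in base64. The sketch below (basic_auth_header is a hypothetical name) builds the same header that -u name:password produces, using the standard base64 utility; the final tr removes the line wrapping base64 adds to long output.

```shell
# Build the Authorization header that Basic authentication sends for
# USER and PASSWORD. The value is only base64-encoded, not encrypted.
basic_auth_header() {
  printf 'Authorization: Basic %s\n' \
    "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}
```

Running basic_auth_header name password prints Authorization: Basic bmFtZTpwYXNzd29yZA==, and piping bmFtZTpwYXNzd29yZA== through base64 -d gives back name:password.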
 
To tell curl to use a user and password for authentication:
 
curl -u name:password [URL]
 
The site might require a different authentication method (check the headers
returned by the server), and then --ntlm, --digest, --negotiate or even
--anyauth might be options that suit you.
 
Sometimes your HTTP access is only available through the use of a HTTP
proxy. This seems to be especially common at various companies. A HTTP proxy
may require its own user and password to allow the client to get through to
the Internet. To specify those with curl, run something like:
 
curl -U proxyuser:proxypassword curl.haxx.se
 
If your proxy requires the authentication to be done using the NTLM method,
use --proxy-ntlm, if it requires Digest use --proxy-digest.
 
If you use any one of these user+password options but leave out the password
part, curl will prompt for the password interactively.
 
Do note that when a program is run, its parameters might be possible to see
when listing the running processes of the system. Thus, other users may be
able to watch your passwords if you pass them as plain command line
options. There are ways to circumvent this.
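One such way: curl can read its options from a config file with -K/--config, and "-K -" reads that file from standard input, so the credentials never appear in the argument list. A minimal sketch, with curl_with_user as a hypothetical helper name; inside double quotes a curl config file treats the backslash as an escape character, so backslashes and double quotes in the credentials are escaped first.

```shell
# Hypothetical helper: pass USER:PASSWORD to curl through a config file read
# from stdin (-K -) instead of the argument list, so ps does not show it.
# printf is a shell builtin, so it does not appear as a separate command either.
curl_with_user() {
  local user=$1 pass=$2 url=$3 cred
  # Escape \ and " because the value goes inside double quotes in the config.
  cred=$(printf '%s:%s' "$user" "$pass" | sed 's/[\\"]/\\&/g')
  printf 'user = "%s"\n' "$cred" | curl -K - "$url"
}
```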
 
7. Referer
 
A HTTP request may include a 'referer' field (yes it is misspelled), which
can be used to tell from which URL the client got to this particular
resource. Some programs/scripts check the referer field of requests to verify
that this wasn't arriving from an external site or an unknown page. While
this is a stupid way to check something so easily forged, many scripts still
do it. Using curl, you can put anything you want in the referer-field and
thus more easily be able to fool the server into serving your request.
 
Use curl to set the referer field with:
 
curl -e http://curl.haxx.se daniel.haxx.se
 
8. User Agent
 
Very similar to the referer field, all HTTP requests may set the User-Agent
field. It names the user agent (client) that is being used. Many
applications use this information to decide how to display pages. Silly web
programmers try to make different pages for users of different browsers to
make them look the best possible for their particular browsers. They usually
also do different kinds of javascript, vbscript etc.
 
At times, you will see that getting a page with curl will not return the same
page that you see when getting the page with your browser. Then you know it
is time to set the User Agent field to fool the server into thinking you're
one of those browsers.
 
To make curl look like Internet Explorer on a Windows 2000 box:
 
curl -A "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" [URL]
 
Or why not look like you're using Netscape 4.73 on a Linux (PIII) box:
 
curl -A "Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)" [URL]
 
9. Redirects
 
When a resource is requested from a server, the reply from the server may
include a hint about where the browser should go next to find this page, or a
new page keeping newly generated output. The header that tells the browser
to redirect is Location:.
 
Curl does not follow Location: headers by default, but will simply display
such pages in the same manner it displays all HTTP replies. It does however
feature an option that will make it attempt to follow the Location: pointers.
 
To tell curl to follow a Location:
curl -L [URL]
 
If you use curl to POST to a site that immediately redirects you to another
page, you can safely use -L and -d/-F together. Curl will only use POST in
the first request, and then revert to GET in the following operations.
 
10. Cookies
 
The way the web browsers do "client side state control" is by using
cookies. Cookies are just names with associated contents. The cookies are
sent to the client by the server. The server tells the client for what path
and host name it wants the cookie sent back, and it also sends an expiration
date and a few more properties.
 
When a client communicates with a server with a name and path as previously
specified in a received cookie, the client sends back the cookies and their
contents to the server, unless of course they are expired.
 
Many applications and servers use this method to connect a series of requests
into a single logical session. To be able to use curl in such occasions, we
must be able to record and send back cookies the way the web application
expects them. The same way browsers deal with them.
 
The simplest way to send a few cookies to the server when getting a page with
curl is to add them on the command line like:
 
curl -b "name=Daniel" [URL]
 
Cookies are sent as common HTTP headers. This is practical as it allows curl
to record cookies simply by recording headers. Record cookies with curl by
using the -D option like:
 
curl -D headers_and_cookies [URL]
 
(Take note that the -c option described below is a better way to store
cookies.)
 
Curl has a full blown cookie parsing engine built-in that comes to use if you
want to reconnect to a server and use cookies that were stored from a
previous connection (or hand-crafted to fool the server into
believing you had a previous connection). To use previously stored cookies,
you run curl like:
 
curl -b stored_cookies_in_file [URL]
 
Curl's "cookie engine" gets enabled when you use the -b option. If you only
want curl to understand received cookies, use -b with a file that doesn't
exist. Example, if you want to let curl understand cookies from a page and
follow a location (and thus possibly send back cookies it received), you can
invoke it like:
 
curl -b nada -L [URL]
 
Curl has the ability to read and write cookie files that use the same file
format that Netscape and Mozilla do. It is a convenient way to share cookies
between browsers and automatic scripts. The -b switch automatically detects
if a given file is such a cookie file and parses it, and by using the
-c/--cookie-jar option you'll make curl write a new cookie file at the end of
an operation:
 
curl -b cookies.txt -c newcookies.txt [URL]
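The cookie file written by -c is plain text in the Netscape format: one cookie per line with seven TAB-separated fields (domain, include-subdomains flag, path, secure flag, expiry as a Unix timestamp, name, value), and lines beginning with # are comments. Newer curl versions mark HttpOnly cookies by prefixing the domain with #HttpOnly_, so simply skipping # lines would lose them. A small sketch (jar_cookies is a hypothetical name) that lists the name=value pairs in such a file:

```shell
# List "name=value" for every cookie in a Netscape-format cookie file.
jar_cookies() {
  awk -F'\t' '
    { sub(/^#HttpOnly_/, "") }   # keep HttpOnly cookies, written with this prefix
    /^#/ || NF < 7 { next }      # skip comments, blank and malformed lines
    { print $6 "=" $7 }' "$1"
}
```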
 
11. HTTPS
 
There are a few ways to do secure HTTP transfers. The by far most common
protocol for doing this is what is generally known as HTTPS, HTTP over
SSL. SSL encrypts all the data that is sent and received over the network and
thus makes it harder for attackers to spy on sensitive information.
 
SSL (or TLS as the latest version of the standard is called) offers a
truckload of advanced features to allow all those encryptions and key
infrastructure mechanisms encrypted HTTP requires.
 
Curl supports encrypted fetches thanks to the freely available OpenSSL
libraries. To get a page from a HTTPS server, simply run curl like:
 
curl https://that.secure.server.com
 
11.1 Certificates
 
In the HTTPS world, you use certificates to validate that you are the one
you claim to be, as an addition to normal passwords. Curl supports
client-side certificates. All certificates are locked with a pass phrase,
which you need to enter before the certificate can be used by curl. The pass
phrase can be specified on the command line or if not, entered interactively
when curl queries for it. Use a certificate with curl on a HTTPS server
like:
 
curl -E mycert.pem https://that.secure.server.com
 
curl also tries to verify that the server is who it claims to be, by
verifying the server's certificate against a locally stored CA cert
bundle. Failing the verification will cause curl to deny the connection. You
must then use -k in case you want to tell curl to ignore that the server
can't be verified.
 
More about server certificate verification and ca cert bundles can be read
in the SSLCERTS document, available online here:
 
http://curl.haxx.se/docs/sslcerts.html
 
12. Custom Request Elements
 
When doing fancy stuff, you may need to add or change elements of a single
curl request.
 
For example, you can change the POST request to a PROPFIND and send the data
as "Content-Type: text/xml" (instead of the default Content-Type) like this:
 
curl -d "" -H "Content-Type: text/xml" -X PROPFIND url.com
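A real PROPFIND request normally carries a Depth header and an XML body as well. The sketch below is hypothetical (the propfind name and the allprop body are illustrative; the Depth header and the DAV: propfind element come from the WebDAV specification). It feeds the body on standard input with --data-binary @-, which sends the bytes unchanged, whereas -d @file strips newlines from the data it reads.

```shell
# Hypothetical helper: WebDAV PROPFIND asking for all properties of URL
# and of its immediate children (Depth: 1).
propfind() {
  curl -s -X PROPFIND \
    -H "Depth: 1" \
    -H "Content-Type: text/xml" \
    --data-binary @- "$1" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<propfind xmlns="DAV:"><allprop/></propfind>
EOF
}
```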
 
You can delete a default header by providing one without content. For
example, you can ruin the request by chopping off the Host: header:
 
curl -H "Host:" [URL]
 
You can add headers the same way. Your server may want a "Destination:"
header, and you can add it:
 
curl -H "Destination: "
 
13. Debug
 
Many times when you run curl on a site, you'll notice that the site doesn't
seem to respond the same way to your curl requests as it does to your
browser's.
 
Then you need to start making your curl requests more similar to your
browser's requests:
 
* Use the --trace-ascii option to store fully detailed logs of the requests
for easier analyzing and better understanding
 
* Make sure you check for and use cookies when needed (both reading with -b
and writing with -c)
 
* Set user-agent to one like a recent popular browser does
 
* Set referer like it is set by the browser
 
* If you use POST, make sure you send all the fields and in the same order as
the browser does it. (See chapter 4.5 above)
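Taken together, the checklist above suggests a small wrapper that always sends a browser-like User-Agent, optionally a Referer, and keeps cookies in one file across calls. A minimal sketch: browser_get, the BROWSER_UA and COOKIE_JAR variables and the default jar name are assumptions, and the default User-Agent string is the Internet Explorer one from section 8. Add --trace-ascii when you need the full log.

```shell
# Hypothetical wrapper: fetch URL the way a browser would, following redirects
# and reading/writing cookies in one jar so state carries over between calls.
# Usage: browser_get URL [REFERER]
browser_get() {
  local url=$1 referer=${2:-} jar=${COOKIE_JAR:-cookies.txt}
  set -- -s -L \
    -A "${BROWSER_UA:-Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)}" \
    -b "$jar" -c "$jar"
  [ -n "$referer" ] && set -- "$@" -e "$referer"
  curl "$@" "$url"
}
```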
 
A very good helper to make sure you do this right, is the LiveHTTPHeader tool
that lets you view all headers you send and receive with Mozilla/Firefox
(even when using HTTPS).
 
A more raw approach is to capture the HTTP traffic on the network with tools
such as ethereal or tcpdump and check what headers were sent and received by
the browser. (HTTPS makes this technique ineffective, since the traffic is
encrypted.)
 
14. References
 
RFC 2616 is a must to read if you want in-depth understanding of the HTTP
protocol.
 
RFC 2396 explains the URL syntax.
 
RFC 2109 defines how cookies are supposed to work.
 
RFC 1867 defines the HTTP post upload format.
 
is the home of the OpenSSL project
 
http://curl.haxx.se is the home of the cURL project
curl, a powerful file download tool (reposted)

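curl is a file transfer tool that works from the command line using URL syntax. This article introduces its basic usage.

It supports many protocols: FTP, FTPS, HTTP, HTTPS, GOPHER, TELNET, DICT, FILE and LDAP. curl also supports HTTPS certificates, the HTTP POST and PUT methods, FTP upload, Kerberos authentication, HTTP upload, proxy servers, cookies, user name/password authentication, resuming interrupted downloads and uploads, HTTP proxy tunneling, and even IPv6, SOCKS5 proxies and uploading files to an FTP server through an HTTP proxy. It is very powerful: it can do everything that Windows tools such as NetAnts and FlashGet can do. Strictly speaking, curl handles both uploads and downloads, so it is a general-purpose transfer tool, but by tradition people call it a download tool.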
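curl is developed by the curl project in Sweden; visit http://curl.haxx.se/ for its source code and documentation. Because curl is so widely used on Linux, IBM includes it on the AIX Linux Toolbox CD, and you can also download it from IBM's website at 1.ibm.com/servers/aix/products/aixos/linux/altlic.html. At the time of writing, the latest version of curl was 7.10.8 and the version on IBM's site was 7.9.3. Installing it on AIX is simple, since the package from IBM's site is in rpm format.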
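At http://curl.haxx.se/docs/ you can download the man page in UNIX format, which explains in detail how to use curl. The usage is curl [options] [URL...], where options are the parameters for the transfer. There are more than 80 of them, and every one of curl's features is controlled through them; see the curl man page for details on each.
The rest of this article uses concrete examples to show how to download with curl.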
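1. Getting a page
Command: curl http://curl.haxx.se
This is the simplest usage. The command fetches the page that http://curl.haxx.se points to; likewise, if the URL points to a file or an image, it is downloaded directly. By default curl does not show the HTTP response headers (not to be confused with the <head> section of an HTML document). To show them along with the document, add -i; to show only the headers, use -I. At any time you can add -v to see how curl works: every command it sends to the server is displayed. To continue an interrupted transfer, you can use -r to request just the byte range you still need.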
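2. Submitting forms
Forms are an important element of web page design; they are normally used to collect information and submit it to a website. There are two ways to submit it: the GET method and the POST method. Let's look at GET first. Suppose a page contains this snippet: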
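 <form method="GET" action="junk.cgi">
 <input type=text name="birthyear">
 <input type=submit name=press value="OK">
 </form>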
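The browser then shows a text box and a button labeled "OK". Pressing the button submits the text box contents to the server with GET. For example, if you type 1905 into the text box and press OK, the browser's address bar will show junk.cgi?birthyear=1905&press=OK appended to the directory of the original page.
curl can handle such pages directly. For example, to fetch the page above, just enter:
curl "[URL]/junk.cgi?birthyear=1905&press=OK"
and that's it.
The second way a form can submit information is the POST method. The difference is that with GET the browser builds a target URL containing the data, while with POST it does not. Here is a page similar to the GET one: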
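 <form method="POST" action="junk.cgi">
 <input type=text name="birthyear">
 <input type=submit name=press value="OK">
 </form>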
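The browser again shows a text box and a button labeled "OK". Pressing the button submits the data to the server with POST. Since the data no longer appears in the URL, a different approach is needed to fetch this page:
curl -d "birthyear=1905&press=OK" [URL]
This command does it.
In late 1995, RFC 1867 defined a new kind of POST for uploading files, mainly local files to a server. The page then looks like this: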
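 <form method="POST" enctype='multipart/form-data' action="upload.cgi">
 <input type=file name=upload>
 <input type=submit name=press value="OK">
 </form>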
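For such pages, curl is used differently:
curl -F upload=@localfilename -F press=OK [URL]
In essence, this command uploads the local file to the server with POST. POST has many other uses that you can explore on your own.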
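3. Using the PUT method
A standard way to upload a file over HTTP is PUT; with curl, use the -T option:
curl -T uploadfile [URL]
4. Authentication
curl can handle pages with all kinds of authentication. For example, to download a page protected by a user name and password (in IE this usually pops up a dialog asking for them):
curl -u name:password [URL]
If your network goes out through an HTTP proxy server that requires a user name and password, enter:
curl -U proxyuser:proxypassword http://curl.haxx.se
Whenever a user name and password are required, you can give only the user name and leave out the password; curl will then prompt for the password interactively.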
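5. Referer
Some resources can only be reached by following a link from another address; the server checks the Referer header, which names the page you came from. curl can download such resources too:
curl -e http://curl.haxx.se daniel.haxx.se
6. Specifying the user agent
Some resources first check which browser the user has and only allow downloading or viewing if it qualifies. curl can then "disguise" itself as any other browser:
curl -A "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" [URL]
This makes curl pretend to be IE 5.01 on Windows 2000. (The server judges the client type from this string alone, so it doesn't matter that you are really on AIX.) With
curl -A "Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)" [URL]
curl becomes Netscape running on Linux on a PIII machine.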
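7. Cookies
Cookies are a common way for servers to remember information about a client. If cookies have been saved to a file, use:
curl -b stored_cookies_in_file [URL]
curl can read the old cookies from one file, send them to the site, and write all the cookies it ends up with (old and newly received) to another file:
curl -b cookies.txt -c newcookies.txt [URL]
8. Encrypted HTTP: HTTPS
For pages served over the encrypted https protocol (curl relies on OpenSSL for this), curl can fetch them directly:
curl https://that.secure.server.com
9. Certificate authentication
If the https address requires certificate authentication and the client certificate is stored locally, use curl like this:
curl -E mycert.pem https://that.secure.server.com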

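Further reading and notes: curl is a vast tool. To use it well, besides studying its options in detail, you need a thorough understanding of the HTTP protocol and URL syntax. Recommended reading:
RFC 2616: the definition of the HTTP protocol.
RFC 2396: the definition of URL syntax.
RFC 2109: how cookies work.
RFC 1867: how files are uploaded with HTTP POST, and the format used.
curl is free software; IBM does not provide technical support for it.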