2009-04-25 07:21:41
Using cURL is a simple and effective way to gather data from another website, run it through a script, parse it, and transform it into something useful for your own website. Whether you are "scraping" data to build a summary of a link, pulling an XML file to parse into a database, or simply want the contents of a file, cURL lets you pull data from an outside source into your page.
Making sure cURL is enabled and set up
First things first, you need to make sure cURL is enabled on your web host. The easiest way to do this is to check the output of phpinfo() on your server. Create a PHP file containing the following code, upload it to your server, and name it whatever you like.
<?php phpinfo(); ?>
After the file is saved on your web server, open it in your browser and look through the output for a section named "curl". It should report "cURL support" as "enabled".
If the output doesn't include this section, or anything similar to it, your hosting service may not support cURL, or the extension may not be enabled. If you are on a hosting service, you can ask your host to enable it for you; if you run your own server, you can modify your php.ini file to enable the extension.
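If you'd rather not hunt through the phpinfo() output, a small script can report the same thing. This is a sketch using PHP's built-in extension_loaded() and curl_version(); the helper name curl_status() is made up for this example. Load it through your web server rather than running it from the command line, because command-line PHP can load a different php.ini than the one your web server uses.

```php
<?php
// Report whether the curl extension is active in the PHP running this script.
// curl_status() is a hypothetical helper name used only for illustration.
function curl_status(): string
{
    if (!extension_loaded('curl')) {
        return 'cURL is NOT enabled';
    }
    // curl_version() returns an associative array; 'version' is the libcurl version
    $info = curl_version();
    return 'cURL is enabled (libcurl ' . $info['version'] . ')';
}

echo curl_status(), PHP_EOL;
```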
You can modify your php.ini file as follows (if you can't find it, look near the top of the phpinfo() output from the script above; the "Loaded Configuration File" row gives the path to the ini file):
; Find this line in your php.ini:
;extension=php_curl.dll
; Remove the semicolon in front so the line looks like this:
extension=php_curl.dll
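If the php.ini path is hard to spot in the phpinfo() output, PHP can report it directly. The sketch below uses the built-in php_ini_loaded_file() and php_ini_scanned_files() functions; the wrapper name loaded_ini_files() is hypothetical. As with phpinfo(), load it through the web server so you see the files the web server's PHP actually uses.

```php
<?php
// Report which .ini files this PHP process loaded.
// loaded_ini_files() is a hypothetical wrapper name used for illustration.
function loaded_ini_files(): array
{
    $main  = php_ini_loaded_file();    // path string, or false if no php.ini was loaded
    $extra = php_ini_scanned_files();  // comma-separated list, or false if no scan directory

    return [
        'main'  => $main === false ? null : $main,
        // Split the list into clean paths; an empty or missing list becomes []
        'extra' => $extra === false
            ? []
            : preg_split('/\s*,\s*/', trim($extra), -1, PREG_SPLIT_NO_EMPTY),
    ];
}

$files = loaded_ini_files();
echo 'Loaded php.ini: ', $files['main'] ?? '(none)', PHP_EOL;
echo 'Additional .ini files: ', $files['extra'] ? implode(', ', $files['extra']) : '(none)', PHP_EOL;
```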
After modifying and saving your php.ini file, you will have to restart your web server.
- If you are running Apache, you should be able to restart it with a simple "apachectl restart" command.
- If you are running an IIS web server, you will have to restart IIS, or just recycle the application pool that is running your PHP. This can be done through IIS Manager (the IIS MMC snap-in).
- If you are running WAMP on your local machine, simply right-click on the WAMP icon in your system tray, find the Apache menu, and click “Restart”.
Then reload the page running phpinfo() to confirm that cURL now shows up. If it still doesn't, you may want to ask your IT department, co-workers, or web hosting provider why cURL won't work on your server.
Assuming everything is running and cURL is enabled, let's continue.
A simple cURL Request
Pulling data in with cURL isn't very hard.
A basic request goes out to the website you specify and sets the variable $data to contain that website's HTML. A var_dump($data) at the end of the script simply prints it back onto your screen so you can see the data you have to work with.
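Here is a minimal sketch of such a request using PHP's standard cURL functions. It is an illustration rather than the original example: the helper name fetch_url() is hypothetical, and the redirect and timeout settings are assumptions added for robustness.

```php
<?php
// A minimal sketch of a basic GET request with PHP's cURL functions.
// fetch_url() is a hypothetical helper; the timeout and redirect settings
// are assumptions, not part of the original example.
/**
 * @return string|false The response body, or false if the request failed.
 */
function fetch_url(string $url)
{
    $ch = curl_init($url);                           // create a handle for this URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow HTTP redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);    // seconds allowed to connect
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);           // seconds allowed for the whole transfer

    // The handle is released when $ch goes out of scope
    // (curl_close() has had no effect since PHP 8.0).
    return curl_exec($ch);
}
```

For example, $data = fetch_url('https://example.com/'); followed by var_dump($data); dumps the page's HTML, or bool(false) if the request failed. The example.com address is only a placeholder.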
Now, what you end up doing with this data is up to you! You could run it through some regular expressions to pull out relevant information, parse it line by line and store certain portions somewhere, or, if you are pulling an XML file, start parsing the XML. Since this article is just about cURL, we won't get into that here.
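As one small illustration of the regular-expression route, and only a sketch rather than a general HTML parser, the hypothetical helper extract_title() below grabs a page's title from the fetched HTML. For arbitrary HTML, PHP's DOMDocument is usually more robust than a regex.

```php
<?php
// Illustrative only: pull the <title> out of fetched HTML with a regex.
// extract_title() is a hypothetical helper name.
function extract_title(string $html): ?string
{
    // ~ delimiters; i = case-insensitive, s = let . match newlines, .*? = non-greedy
    if (preg_match('~<title[^>]*>(.*?)</title>~is', $html, $m)) {
        // Decode entities such as &amp; and strip surrounding whitespace
        return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return null;  // no title found
}
```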
Using a cURL Request Object
A bit more on the advanced side, but if you want to create an object to handle all your requests for you, I’ve pulled one out of my code library that you may find useful.
<?php
class curlHandler {
    public $url = '';
    public $output = '';
    public $curl = null;

    public function __construct($url) {
        // Create the cURL handle and point it at the requested URL
        $this->curl = curl_init();
        $this->url($url);
        // Return the response as a string instead of printing it
        curl_setopt($this->curl, CURLOPT_RETURNTRANSFER, true);
        // Identify as a Firefox browser; some sites block non-browser requests
        curl_setopt($this->curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2');
        // Run the request; curl_exec() returns false on failure
        $this->output = curl_exec($this->curl);
    }

    public function __destruct() {
        // Release the cURL handle when the object is destroyed
        curl_close($this->curl);
    }

    public function url($url) {
        // Store the URL and apply it to the handle
        $this->url = $url;
        curl_setopt($this->curl, CURLOPT_URL, $url);
    }
}

// Init the object and do the request; the handle is closed when the object is destroyed
$curlHandler = new curlHandler(""); // pass the URL you want to fetch
// Display what we've found
var_dump($curlHandler->output);
Gathering data this way is pretty simple when you know what you are passing in. Notice that my class sends a Firefox user-agent string with the cURL request. Why? Some websites try to block cURL and other automated requests (the World of Warcraft Armory, which is what I was scraping, is one of them), so by mimicking a browser we can get past these obstacles.
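To tell whether a site is refusing your requests, it helps to check the HTTP status code as well as the body; a 403 response, for instance, often means the request was rejected. The sketch below follows the same idea as the class above but is not the author's code: the helper name fetch_as_browser() and the keys of the returned array are made up for this example, while curl_getinfo() with CURLINFO_HTTP_CODE and curl_error() are standard PHP cURL functions.

```php
<?php
// A sketch, not the author's class: fetch a URL with a browser-like
// User-Agent and report status and errors instead of a bare false.
// fetch_as_browser() and its result keys are hypothetical names.
function fetch_as_browser(string $url, string $userAgent): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); // mimic a browser
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);    // assumed timeouts, in seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);

    $body = curl_exec($ch);
    return [
        'body'   => $body === false ? null : $body,
        // HTTP status of the last response; 0 if no HTTP response was received
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        // Human-readable cURL error message when the transfer failed
        'error'  => $body === false ? curl_error($ch) : null,
    ];
}
```

For example, fetch_as_browser($url, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2') sends the same Firefox string the class uses, and you can then inspect the 'status' entry before trusting the body.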
What you do with all of this newfound data is up to you. Eventually I will write another post about parsing the data you find, but that is for another day.