PHP’s cURL library makes it simple to fetch web pages from a script. Run the script, parse the pages it retrieves, and you can extract the data you want programmatically. Whether you need to pull part of the data from a link, import an XML file into a database, or just retrieve the contents of a web page, cURL is a powerful tool for the job. This article describes how to use it.
Enable cURL settings
First, check whether your PHP installation has this library enabled. You can find out by calling the phpinfo() function.
<?php
phpinfo();
?>
If the output contains a curl section that reports cURL support as enabled, the library is available.
If you do not see it, you need to configure PHP to enable the library. On Windows this is simple: open your php.ini file, find the line containing php_curl.dll, and remove the semicolon in front of it, as shown below:
; Remove the leading semicolon from the following line
extension=php_curl.dll
On Linux, you need to recompile PHP with cURL support by adding the --with-curl option to the configure command.
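As a quicker alternative to reading the phpinfo() page, a script can check for the extension itself. The following is a minimal sketch; the helper name curlAvailable is made up for this illustration and is not part of PHP or cURL.

```php
<?php
// Hypothetical helper: true when the cURL extension is loaded in this runtime.
function curlAvailable(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curlAvailable()) {
    // curl_version() returns an array; its 'version' key holds the libcurl version.
    echo 'cURL is enabled, libcurl ' . curl_version()['version'] . PHP_EOL;
} else {
    echo 'cURL is not enabled' . PHP_EOL;
}
```

Run it from the command line or through the web server; the two can use different php.ini files, so it is worth checking the one your scripts actually run under.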
A small example
If everything is ready, here is a small routine:
<?php
// Initialize a cURL handle
$curl = curl_init();
// Set the URL to fetch
curl_setopt($curl, CURLOPT_URL, 'http://cocre.com');
// Include the response headers in the output
curl_setopt($curl, CURLOPT_HEADER, 1);
// Return the result as a string instead of printing it to the screen
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Run the request and fetch the page
$data = curl_exec($curl);
// Close the cURL handle
curl_close($curl);
// Display the retrieved data
var_dump($data);
?>
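Because CURLOPT_HEADER is on, the string returned by curl_exec() holds the response headers followed by the body, and curl_exec() returns false when the request fails. The sketch below shows one way to handle both points. The helper names splitResponse and fetchWithHeaders are invented for this example; curl_error() and curl_getinfo() with CURLINFO_HEADER_SIZE are standard cURL functions.

```php
<?php
// Hypothetical helper: split a raw response (fetched with CURLOPT_HEADER on)
// into its header block and body, given the header size in bytes.
function splitResponse(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}

// Hypothetical helper: fetch a URL and return headers and body separately.
function fetchWithHeaders(string $url): array
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_HEADER, 1);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

    $raw = curl_exec($curl);
    if ($raw === false) {
        // curl_error() describes the last error on this handle.
        $message = curl_error($curl);
        curl_close($curl);
        throw new RuntimeException('cURL request failed: ' . $message);
    }

    // CURLINFO_HEADER_SIZE is the total size in bytes of the received headers.
    $headerSize = curl_getinfo($curl, CURLINFO_HEADER_SIZE);
    curl_close($curl);

    return splitResponse($raw, $headerSize);
}
```

A call such as fetchWithHeaders('http://cocre.com') would then return an array with 'headers' and 'body' keys, or throw if the request could not be completed.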
How to POST data
The code above fetches a web page; the code below POSTs data to one. Suppose we have a form-processing URL, http://www.example.com/sendSMS.php, that accepts two form fields: a phone number and the text of a message.
<?php
$phoneNumber = '13912345678';
$message = 'This message was generated by curl and php';
// Build the URL-encoded form body
$curlPost = 'pNUMBER=' . urlencode($phoneNumber) . '&MESSAGE=' . urlencode($message) . '&SUBMIT=Send';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/sendSMS.php');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Use the POST method and attach the form body
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost);
$data = curl_exec($ch);
curl_close($ch);
?>
As the program above shows, CURLOPT_POST switches the request from the HTTP GET method to POST, and CURLOPT_POSTFIELDS sets the POST data.
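The $curlPost string above is assembled by hand with urlencode(). PHP's built-in http_build_query() can produce the same string from an array, which is less error-prone as the number of fields grows. This is a small sketch; the helper name buildPostBody is hypothetical, and the field names are the ones used in the example above.

```php
<?php
// Hypothetical helper: build an application/x-www-form-urlencoded body
// from an associative array of form fields.
function buildPostBody(array $fields): string
{
    // The explicit '&' avoids depending on the arg_separator.output ini setting.
    return http_build_query($fields, '', '&');
}

$curlPost = buildPostBody([
    'pNUMBER' => '13912345678',
    'MESSAGE' => 'This message was generated by curl and php',
    'SUBMIT'  => 'Send',
]);

echo $curlPost . PHP_EOL;
```

The resulting string can be passed to CURLOPT_POSTFIELDS exactly as before. Note that passing an array directly to CURLOPT_POSTFIELDS makes cURL send the data as multipart/form-data, whereas a string is sent as a URL-encoded form body.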
About proxy servers
Here is an example of how to use a proxy server. Note the three proxy options: CURLOPT_HTTPPROXYTUNNEL, CURLOPT_PROXY, and CURLOPT_PROXYUSERPWD. The code is simple enough that it needs little further explanation.
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Tunnel the request through the HTTP proxy
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
// Proxy address and port
curl_setopt($ch, CURLOPT_PROXY, 'fakeproxy.com:1080');
// Proxy credentials
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password');
$data = curl_exec($ch);
curl_close($ch);
?>
About SSL and Cookies
For SSL, that is, the HTTPS protocol, you only need to change http:// to https:// in the CURLOPT_URL value. There is also an option called CURLOPT_SSL_VERIFYHOST, which checks that the host name in the server's certificate matches the site you are connecting to.
For cookies, you need to know the following three options:
CURLOPT_COOKIE, sets the cookies to send in the current session
CURLOPT_COOKIEJAR, the file in which cookies are saved when the session ends
CURLOPT_COOKIEFILE, the file from which cookies are read.
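The two file-based options are commonly pointed at the same file so that cookies received in one request are sent back in the next. The following is a rough sketch of that pattern; the function name fetchWithCookies and the cookie file path are assumptions made for this example.

```php
<?php
// Hypothetical helper: fetch $url, loading cookies from $cookieFile before
// the request and saving them back to the same file afterwards.
function fetchWithCookies(string $url, string $cookieFile)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // Read any previously stored cookies from this file.
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
    // Write cookies to this file when the handle is closed or released.
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);

    $data = curl_exec($ch); // string on success, false on failure
    curl_close($ch);

    return $data;
}

// Example call (path is illustrative):
// $page = fetchWithCookies('http://www.example.com', sys_get_temp_dir() . '/cookies.txt');
```

Calling the function twice with the same cookie file would let the second request send back whatever cookies the first response set.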
HTTP server authentication
Finally, let’s take a look at HTTP server authentication.
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Use HTTP Basic authentication
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
// Credentials in the form username:password
curl_setopt($ch, CURLOPT_USERPWD, '[username]:[password]');
$data = curl_exec($ch);
curl_close($ch);
?>
For more information, please refer to the relevant cURL manual.