Using PHP curl to capture AJAX asynchronous content: approach and code sharing

WBOY
Release: 2016-07-13 10:20:02

In fact, fetching a page whose content is loaded asynchronously via AJAX is not much different from fetching an ordinary page. AJAX simply makes an asynchronous HTTP request. Use a tool such as Firebug to find the back-end service URL being requested and the parameters passed to it, then request that URL yourself with the same parameters.

Using Firebug's Network tools

If you fetch the page itself, the content area contains no data, only a bunch of JS code.

Code

// Temporary file for the session cookie; ./temp must exist and be writable.
$cookie_file = tempnam('./temp', 'cookie');

// Request 1: load the ordinary page so the server issues its cookies.
$ch = curl_init();
$url1 = "http://www.cdut.edu.cn/default.html";
curl_setopt($ch, CURLOPT_URL, $url1);
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip'); // accept and decode gzip responses
// File in which cookies are saved when the handle is closed
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
$content = curl_exec($ch);

curl_close($ch);
// On PHP 8+, curl_close() has no effect; unset the handle so the cookie jar is written.
unset($ch);

// Request 2: POST to the back-end URL found in Firebug, with the captured parameters.
$ch3 = curl_init();
$url3 = "http://www.cdut.edu.cn/xww/dwr/call/plaincall/portalAjax.getNewsXml.dwr";
$curlPost = "callCount=1&page=/xww/type/1000020118.html&httpSessionId=12A9B726E6A2D4D3B09DE7952B2F282C&scriptSessionId=295315B4B4141B09DA888D3A3ADB8FAA658&c0-scriptName=portalAjax&c0-methodName=getNewsXml&c0-id=0&c0-param0=string:10000201&c0-param1=string:1000020118&c0-param2=string:news_&c0-param3=number:5969&c0-param4=number:1&c0-param5=null:null&c0-param6=null:null&batchId=0";
curl_setopt($ch3, CURLOPT_URL, $url3);
curl_setopt($ch3, CURLOPT_POST, 1);
curl_setopt($ch3, CURLOPT_POSTFIELDS, $curlPost);
// Return the response as a string instead of printing it
curl_setopt($ch3, CURLOPT_RETURNTRANSFER, 1);

// File from which the cookies saved by the first request are read
curl_setopt($ch3, CURLOPT_COOKIEFILE, $cookie_file);
$content1 = curl_exec($ch3);
curl_close($ch3);
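
The POST body in the code above is pasted from Firebug as one long string, which makes it awkward to change a single value. As a rough sketch only, the string could be split into an array, adjusted, and rebuilt. The helper name rebuild_post_body is hypothetical, and the guess that c0-param4 holds the page number is an assumption made purely for illustration; the article does not say what each parameter means.

```php
<?php
// Hypothetical helper (not from the original article): rebuild a captured
// POST body with some of its parameters replaced.
function rebuild_post_body(string $captured, array $overrides = []): string
{
    // parse_str() splits "a=1&b=2" into an associative array and URL-decodes it.
    parse_str($captured, $params);
    // For string keys, later values win, so overrides replace captured values.
    $params = array_merge($params, $overrides);
    // http_build_query() re-encodes the array as a URL-encoded form body.
    return http_build_query($params);
}

// Shortened copy of the body captured in the example above.
$captured = "callCount=1&c0-scriptName=portalAjax&c0-methodName=getNewsXml"
          . "&c0-param4=number:1&batchId=0";

// ASSUMPTION: c0-param4 is treated as a page number here, for illustration only.
$body = rebuild_post_body($captured, ['c0-param4' => 'number:2']);
echo $body, "\n";
```

The rebuilt string can be passed to CURLOPT_POSTFIELDS in place of $curlPost. Note that http_build_query() percent-encodes characters such as ":" and "/", which a server that decodes form data should treat the same as the raw string, though that has not been checked against this particular site.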

Crawling a website's content with PHP curl is rejected

Just wrote this. Hope this is useful

PHP curl stops responding after fetching AJAX data for a while

Try forging the request headers: Host, Referer, User-Agent, and so on.
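
One way this header advice might look in code is sketched below. The helper name browser_like_options, the Referer URL (guessed from the page= value in the POST body above), the User-Agent string, and the timeout values are all assumptions for illustration, not settings known to work for this site.

```php
<?php
// Hypothetical helper (not from the original article): curl options that make
// the request resemble one sent by a browser.
function browser_like_options(string $url, string $referer, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body as a string
        CURLOPT_REFERER        => $referer,    // sets the Referer header
        CURLOPT_USERAGENT      => $userAgent,  // sets the User-Agent header
        // curl normally derives Host from the URL; it is set explicitly here
        // only to mirror the suggestion in the answer.
        CURLOPT_HTTPHEADER     => ['Host: ' . parse_url($url, PHP_URL_HOST)],
        // ASSUMPTION: timeouts are an addition, so a stalled request fails
        // instead of hanging indefinitely.
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
    ];
}

$options = browser_like_options(
    'http://www.cdut.edu.cn/xww/dwr/call/plaincall/portalAjax.getNewsXml.dwr',
    'http://www.cdut.edu.cn/xww/type/1000020118.html', // assumed Referer
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36'
);

$ch = curl_init();
curl_setopt_array($ch, $options);
// $response = curl_exec($ch); // uncomment to actually send the request
```

Whether forged headers are enough depends on how the server decides to reject or throttle requests, so this is a starting point to experiment with, not a guaranteed fix.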


http://www.bkjia.com/PHPjc/869450.html

source:php.cn