
PHP grabs [related search terms] from Baidu search results page and stores them

不言
Release: 2023-03-24 22:18:01

This article shows how to use PHP to scrape the [related search terms] block from a Baidu search results page and save the terms to a local file. It may be a useful reference for anyone who needs to do something similar.

1. Search Baidu for the keyword [知了壳公司转让] ("Zhiliaoke company transfer")

Search link for [知了壳公司转让]:
https://www.baidu.com/s?wd=知了壳公司转让
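
Because the keyword contains non-ASCII characters, it has to be percent-encoded before it is sent in the query string (browsers do this automatically). A minimal sketch, assuming the source file is saved as UTF-8; `buildSearchUrl` is a hypothetical helper name, not part of the article's code:

```php
<?php
// Hypothetical helper: build the Baidu search URL for a UTF-8 keyword.
function buildSearchUrl(string $keyword): string
{
    // rawurlencode() percent-encodes every byte outside the unreserved set,
    // so each UTF-8 byte of a Chinese character becomes %XX.
    return 'https://www.baidu.com/s?wd=' . rawurlencode($keyword);
}

echo buildSearchUrl('知了壳公司转让'), PHP_EOL;
// https://www.baidu.com/s?wd=%E7%9F%A5%E4%BA%86%E5%A3%B3%E5%85%AC%E5%8F%B8%E8%BD%AC%E8%AE%A9
```

The encoded value is the same string that appears in the `oq=` parameter of the related-search links in the page source.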

**Partial source code of the search results page**

<p id="rs"><p class="tt">相关搜索</p><table cellpadding="0"><tbody><tr><th>
<a href="/s?wd=%E5%85%AC%E5%8F%B8%E8%BD%AC%E8%AE%A9%E6%B5%81%E7%A8%8B%E7%9F%A5%E4%BA%86%E5%A3%B3&rsp=0&f=1&oq=%E7%9F%A5%E4%BA%86%E5%A3%B3%E5%85%AC%E5%8F%B8%E8%BD%AC%E8%AE%A9&tn=baiduhome_pg&ie=utf-8&rsv_idx=2&rsv_pq=88c7804a0000c417&rsv_t=b5f3JkJIsj6Nkp61K%2BmmVGeev7UP95o1HSJTUoIS2xV4SsmZxvOoVf%2BAZaVoihB%2BT3bg&rqlang=cn&rsv_ers=xn0&rs_src=0&rsv_pq=88c7804a0000c417&rsv_t=b5f3JkJIsj6Nkp61K%2BmmVGeev7UP95o1HSJTUoIS2xV4SsmZxvOoVf%2BAZaVoihB%2BT3bg">公司转让流程知了壳</a></th>
.....
.....
<th><a href="/s?wd=%E7%9F%A5%E4%BA%86%E5%A3%B3%E5%85%AC%E5%8F%B8%E6%B3%A8%E5%86%8C&rsp=8&f=1&oq=%E7%9F%A5%E4%BA%86%E5%A3%B3%E5%85%AC%E5%8F%B8%E8%BD%AC%E8%AE%A9&tn=baiduhome_pg&ie=utf-8&rsv_idx=2&rsv_pq=88c7804a0000c417&rsv_t=b5f3JkJIsj6Nkp61K%2BmmVGeev7UP95o1HSJTUoIS2xV4SsmZxvOoVf%2BAZaVoihB%2BT3bg&rqlang=cn&rsv_ers=xn0&rs_src=0&rsv_pq=88c7804a0000c417&rsv_t=b5f3JkJIsj6Nkp61K%2BmmVGeev7UP95o1HSJTUoIS2xV4SsmZxvOoVf%2BAZaVoihB%2BT3bg">知了壳公司注册</a></th></tr></tbody></table></p>
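
Each related search term is the visible text of an `<a>` element inside a `<th>` cell. As a rough illustration of what the scraper has to pull out, the sketch below runs a regular expression over a shortened sample of this markup. The sample `href` values and the helper name `extractLinkTexts` are made up for the example:

```php
<?php
// Shortened sample of the related-search markup; the real href values are much longer.
$html = '<th><a href="/s?wd=a&rsp=0">公司转让流程知了壳</a></th>'
      . '<th><a href="/s?wd=b&rsp=8">知了壳公司注册</a></th>';

// Hypothetical helper: return the visible text of every <a> element.
function extractLinkTexts(string $html): array
{
    // Group 1 captures everything between <a ...> and </a>;
    // the s flag lets "." span line breaks inside a link.
    preg_match_all('/<a\b[^>]*>(.*?)<\/a>/is', $html, $matches);

    // strip_tags() drops nested markup inside the link text, if there is any.
    return array_map(static function ($text) {
        return trim(strip_tags($text));
    }, $matches[1]);
}

echo implode(',', extractLinkTexts($html)), PHP_EOL;
// 公司转让流程知了壳,知了壳公司注册
```

A regular expression is enough for a fixed, simple structure like this one. An HTML parser would be more robust if the markup varies.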


2. Grab and save locally

Source code

index.php------------

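<?php
// Send the header before any output. UTF-8 matches the ie=utf-8 parameter in Baidu's links
// and the UTF-8 input that the iconv() call further down expects.
header('Content-Type: text/html; charset=utf-8');
?>
<form action="index.php" method="post">
<input name="q" type="text" />
<input type="submit" value="Get Keywords" />
</form>

<?php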
class ComBaike{
    private $o_String=NULL;
    public function __construct(){
        require_once 'cls.StringEx.php';
        $this->o_String=new StringEx();
    }
    public function getItem($word){
        // URL-encode the keyword so non-ASCII characters and spaces survive in the query string
        $url = "http://www.baidu.com/s?wd=".urlencode($word);
        // Build the request headers to imitate a browser request
        $header = array (
            "Host: www.baidu.com",
            "Content-Type: application/x-www-form-urlencoded", // only meaningful for POST requests
            "Connection: keep-alive",
            'Referer: http://www.baidu.com',
            'User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; BIDUBrowser 2.6)'
        );
        $ch = curl_init ();
        curl_setopt ( $ch, CURLOPT_URL, $url );
        curl_setopt ( $ch, CURLOPT_HTTPHEADER, $header );
        curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, 1 );
        $content = curl_exec ( $ch );
        if ($content === false) {
            // Report the cURL error and stop; there is nothing to parse
            echo "error:" . curl_error ( $ch );
            curl_close ( $ch );
            return '';
        }
        curl_close ( $ch );
        // To inspect the raw response: echo $content;
        $this->o_String->string=$content;
        // The whole related-search container (kept for reference; not used below)
        $s_begin='<p id="rs">';
        $s_end='</p>';
        $summary=$this->o_String->getPart($s_begin,$s_end);
        // The table cells that hold the links; the markers must match the raw HTML exactly
        $s_begin='<p class="tt">相关搜索</p><table cellpadding="0"><tr><th>';
        $s_end='</th></tr></table></p>';
        $content=$this->o_String->getPart($s_begin,$s_end);
        return $content;
    }
    public function __destruct(){
        unset($this->o_String);    
    }
}

if(!empty($_POST['q'])){

    $com = new ComBaike();
    $q = $_POST['q'];
    $str = $com->getItem($q); // fetch the related-search block
    $pat = '/<a(.*?)href="(.*?)"(.*?)>(.*?)<\/a>/i';
    preg_match_all($pat, $str, $m);
    // $m[4] holds the link texts; inspect with print_r($m[4]);
    $con = implode(",", $m[4]);
    // Create the output folder, one per day
    $dates = date("Ymd");
    $path="./Search/".$dates."/";
    if(!is_dir($path)){
        mkdir($path,0777,true);
    }
    // Create the file; the name is converted from UTF-8 to GBK
    // (e.g. for Windows systems with a Chinese locale)
    $file = fopen($path.iconv("UTF-8","GBK",$q).".txt",'w');
    if($file && fwrite($file,$con)){
        echo $con;
        echo '<script>alert("success")</script>';
    }else{
        echo '<script>alert("error")</script>';
    }
    if($file){
        fclose($file);
    }

}

?>
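
index.php uses the submitted keyword directly as a file name, so input containing `/`, `\` or `..` could direct the write outside `./Search/`. One possible hardening is sketched below, assuming that replacing unsafe characters with `_` is acceptable. `safeFileName` and `saveKeywords` are hypothetical helpers, the GBK conversion of the original is left out, and the demo writes to the system temp directory instead of `./Search/`:

```php
<?php
// Hypothetical helper: turn a keyword into a file name without path separators.
function safeFileName(string $keyword): string
{
    // Replace path separators, control characters, and characters Windows forbids.
    $name = preg_replace('/[\/\\\\:*?"<>|\x00-\x1F]/', '_', $keyword);
    // Strip leading/trailing dots and spaces so "." or ".." cannot be used as a name.
    $name = trim($name, '. ');
    return $name === '' ? 'keyword' : $name;
}

// Hypothetical helper: write the terms to <baseDir>/<Ymd>/<keyword>.txt and return the path.
function saveKeywords(string $baseDir, string $keyword, array $terms): string
{
    $dir = rtrim($baseDir, '/') . '/' . date('Ymd');
    if (!is_dir($dir) && !mkdir($dir, 0777, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory: $dir");
    }
    $file = $dir . '/' . safeFileName($keyword) . '.txt';
    if (file_put_contents($file, implode(',', $terms)) === false) {
        throw new RuntimeException("Cannot write file: $file");
    }
    return $file;
}

$file = saveKeywords(sys_get_temp_dir() . '/Search', 'a/b', ['x', 'y']);
echo basename($file), ' => ', file_get_contents($file), PHP_EOL;
// a_b.txt => x,y
```

Throwing an exception on failure lets the caller decide how to report the error, instead of mixing `alert()` calls into the storage logic.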

cls.StringEx.php-------------

<?php
// This file only defines a class, so it sends no header of its own.
class StringEx{
    public $string='';
    public function __construct($string=''){
        $this->string=$string;
    }
    public function pregGetPart($s_begin,$s_end){
        // Escape both markers, including the "/" delimiter, so they match literally
        $s_begin=preg_quote($s_begin,'/');
        $s_end=preg_quote($s_end,'/');
        $pattern='/'.$s_begin.'(.*?)'.$s_end.'/';
        $result=preg_match($pattern,$this->string,$a_match);
        if(!$result){
            // 0 (no match) or false (error); getPart() then falls back to strstrGetPart()
            return $result;
        }else{
            return isset($a_match[1])?$a_match[1]:'';
        }
    }
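    public function strstrGetPart($s_begin,$s_end){
        // Keep everything from the begin marker onward, then cut at the first end marker
        $string=strstr($this->string,$s_begin);
        if($string===false){
            return '';
        }
        $string=strstr($string,$s_end,true);
        if($string===false){
            return '';
        }
        $string=str_replace($s_begin,'',$string);
        $string=str_replace($s_end,'',$string);
        return $string;
    }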
    public function getPart($s_begin,$s_end){
        $result=$this->pregGetPart($s_begin,$s_end);
        if(!$result){
            $result=$this->strstrGetPart($s_begin,$s_end);
        }
        return $result;
    }
}
?>
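
The core idea of `StringEx` is to return the text between a begin marker and an end marker. The same idea can be written as one standalone function based on `strpos()`. The sketch below is an alternative illustration, not the article's implementation. `textBetween` is a hypothetical name and the sample HTML is made up:

```php
<?php
// Hypothetical helper: return the text between two literal markers, or null if either is missing.
function textBetween(string $haystack, string $begin, string $end): ?string
{
    $start = strpos($haystack, $begin);
    if ($start === false) {
        return null; // begin marker not found
    }
    $start += strlen($begin);

    // Search for the end marker only after the begin marker.
    $stop = strpos($haystack, $end, $start);
    if ($stop === false) {
        return null; // end marker not found after the begin marker
    }
    return substr($haystack, $start, $stop - $start);
}

$html = '<table><tr><th><a href="/s?wd=a">A</a></th></tr></table>';
echo textBetween($html, '<tr><th>', '</th></tr>'), PHP_EOL;
// <a href="/s?wd=a">A</a>
```

Returning `null` for a missing marker lets the caller tell "markers not found" apart from "markers found with nothing between them", which an empty string alone cannot express.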

source:php.cn