PHP Simple HTML DOM + regular expressions: code for collecting articles

WBOY
Release: 2016-07-29 08:41:32

The code is as follows:


<?php
// Include the PHP Simple HTML DOM library file
include_once('./simplehtmldom/simple_html_dom.php');
// Fetch the HTML of a page with cURL
function getwebcontent($url) {
    $ch = curl_init();
    $timeout = 10; // connection timeout in seconds
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $contents = curl_exec($ch);
    curl_close($ch);
    // curl_exec() returns false on failure; return an empty string in that case
    return $contents === false ? '' : trim($contents);
}
// Fetch the listing page that holds the titles and URLs
$string = getwebcontent('http://www.babytree.com/learn/zhunbeihuaiyun/jijibeiyun/2');
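// Regular expression: match each <li> entry to get the title and address.
// The pattern assumes entries look like <li><a href="/learn/article/ID">TITLE</a>;
// (.*?) is non-greedy so a match does not run past the first closing quote or </a>.
$article = array('title' => array(), 'link' => array(), 'content' => array());
preg_match_all('#<li><a href="/learn/article/(.*?)">(.*?)</a>#', $string, $out, PREG_SET_ORDER);
foreach ($out as $key => $value) {
    $article['title'][] = $out[$key][2];
    $article['link'][] = "http://www.babytree.com/learn/article/" . $out[$key][1];
}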
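// Get the article content for each URL
foreach ($article['link'] as $key => $value) {
    $html = file_get_html($value);
    $div = $html ? $html->find('div[id=pagenum_0]') : array();
    // Store an empty string on failure so titles and contents stay index-aligned
    $article['content'][] = isset($div[0]) ? $div[0]->innertext : '';
}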
// Transcode the titles. You would not need this step in real use, because everything should be UTF-8 from the start.
// Here, though, the files really could not be saved without transcoding.
foreach ($article['title'] as $key => $value) {
    $article['title'][$key] = iconv('utf-8', 'gbk', $value); // transcode UTF-8 to GBK
}
// Save each article to a text file named after its title
$num = count($article['title']);
for ($i = 0; $i < $num; $i++) {
    file_put_contents("{$article['title'][$i]}.txt", $article['content'][$i]);
}
/* I meant to post this before 12 o'clock, but when I looked down it was already 3:30. Let's count it as posted yesterday.
Originally, regular expressions were the best and fastest way to obtain the article content.
Regular expressions are powerful, but they are really hard to write! So I did some research and found that
many people online also use PHP Simple HTML DOM. It is somewhat slower, but the results are still good.
From including the library file to writing the txt files, the run takes about 7 to 8 seconds. There is room for further optimization, especially around a regular expression for extracting the article content, which is too painful to write.
Feel free to look into it further. */
    ?>
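
The listing-page match can be tried on its own, without any network access. The following is a minimal sketch, not the author's code: the helper name `extractArticles` and the sample markup are assumptions made up for illustration, and the real page structure may differ.

```php
<?php
// Hypothetical helper: extract article links and titles from listing HTML.
// Assumes each entry looks like <li><a href="/learn/article/ID">TITLE</a>.
function extractArticles(string $html): array
{
    $articles = array('title' => array(), 'link' => array());
    // '#' as the delimiter avoids escaping the slashes in the path.
    // (.*?) is non-greedy, so each group stops at the first closing quote or </a>.
    preg_match_all('#<li><a href="/learn/article/(.*?)">(.*?)</a>#', $html, $out, PREG_SET_ORDER);
    foreach ($out as $match) {
        $articles['title'][] = $match[2];
        $articles['link'][]  = 'http://www.babytree.com/learn/article/' . $match[1];
    }
    return $articles;
}

// Made-up sample markup, not copied from the real site.
$sample = '<ul><li><a href="/learn/article/101">First title</a></li>'
        . '<li><a href="/learn/article/102">Second title</a></li></ul>';

$result = extractArticles($sample);
print_r($result);
```

With a greedy `(.*)` the first group could swallow everything up to the last `">` on the same line, which is why the sketch uses `(.*?)`.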

The above shows code for collecting articles with PHP Simple HTML DOM and regular expressions. I hope it is helpful to readers interested in PHP tutorials.
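
If the speed of Simple HTML DOM is a concern and a regular expression for the article body is too hard to write, PHP's bundled DOM extension is a third option. The sketch below is a swapped-in technique, not what the script above does: the helper name `innerHtmlById` and the sample page are hypothetical, and it only assumes that the article body sits in a `div` with id `pagenum_0`, as in the original code.

```php
<?php
// Hypothetical helper: return the inner HTML of the <div> with the given id,
// or null when no such element exists. Uses PHP's bundled DOM extension.
// The id is assumed not to contain double quotes.
function innerHtmlById(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration hints the encoding so UTF-8 text is not mangled.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Concatenate the serialized children to get the element's inner HTML.
    $inner = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

// Made-up sample page for illustration only.
$page = '<html><body><div id="pagenum_0"><p>Hello</p></div></body></html>';
$content = innerHtmlById($page, 'pagenum_0');
$missing = innerHtmlById($page, 'no_such_id');
```

Whether this is actually faster than Simple HTML DOM for a given page would need to be measured; the sketch only shows that the same `div` can be selected without a hand-written regular expression.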

Source: php.cn