Can PHP be used as a crawler?
phpspider: an excellent spider (crawler) framework for PHP development
## To write a PHP web crawler, you need the following skills:
- PHP, since the crawler itself is written in PHP.
- XPath, which is needed to extract data from web pages.
- CSS selectors, which can be used as an alternative to XPath.
- Regular expressions, which come in handy in many cases.
- Chrome's developer tools, which are invaluable because many AJAX requests have to be analyzed with them.

Note: this framework can only be run from the command line. Command line, command line, command line: important things must be said three times ^_^
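As a rough illustration of the XPath and regular-expression skills listed above, here is a minimal sketch that uses PHP's built-in DOM extension rather than phpspider itself. The HTML snippet, the variable names, and the numeric ID `12345` in the sample URL are all made up for the example.

```php
<?php
// Hypothetical HTML standing in for a downloaded page.
$html = <<<HTML
<html><body>
  <div class="headline"><h1>Sample title</h1></div>
  <div class="overview"><p>First paragraph.</p><p>Second paragraph.</p></div>
</body></html>
HTML;

// Collect libxml warnings internally instead of printing them,
// since real-world HTML is rarely perfectly well-formed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// XPath: the <h1> directly inside the <div> whose class is "headline".
$titleNodes = $xpath->query("//div[@class='headline']/h1");
$title = $titleNodes->length > 0 ? trim($titleNodes->item(0)->textContent) : null;

// XPath: every <p> directly inside the <div> whose class is "overview".
$paragraphs = array();
foreach ($xpath->query("//div[@class='overview']/p") as $p) {
    $paragraphs[] = trim($p->textContent);
}

// Regular expression: capture the trailing numeric ID from a sample URL.
$url = 'http://war.163.com/photoview/4T8E0001/12345'; // hypothetical ID
$id = preg_match('~/photoview/[A-Z0-9]+/(\d+)~', $url, $m) === 1 ? $m[1] : null;
```

XPath is usually the better fit when the data sits in a predictable place in the markup, while a regular expression is convenient for pulling a value out of a plain string such as a URL.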
The demo in this article crawls a military-themed website, war.163.com:
```php
<?php
require_once __DIR__ . '/../autoloader.php';

use phpspider\core\phpspider;

/* Do NOT delete this comment */
/* 不要删除这段注释 */

$configs = array(
    'name'     => '军事', // A name for your crawler ("军事" means "military")
    'log_show' => false,  // Whether to display the log
    'tasknum'  => 1,      // Number of processes to crawl with
    // Database configuration
    'db_config' => array(
        'host' => '127.0.0.1',
        'port' => 3306,
        'user' => 'root',
        'pass' => 'root',
        'name' => 'collection',
    ),
    // Export target; the table must already exist (database "collection", table "test")
    'export' => array(
        'type'  => 'db',
        'table' => 'test',
    ),
    // Domains to crawl
    'domains' => array(
        'war.163.com'
    ),
    // Entry URLs where crawling starts
    'scan_urls' => array(
        'http://war.163.com'
    ),
    // List-page patterns: the listing (paginated) pages you want to crawl
    'list_url_regexes' => array(
        "http://war.163.com"
    ),
    // Content-page patterns: the article pages
    // \d+ matches one or more digits, i.e. the part of the URL that varies
    'content_url_regexes' => array(
        "http://war.163.com/photoview/4T8E0001/\d+",
    ),
    // Number of retries after a failed request
    'max_try' => 5,
    // Extraction rules
    'fields' => array(
        array(
            'name'     => "title",                       // Database column name
            'selector' => "//div[@class='headline']/h1", // The h1 inside the div with class "headline"
            'required' => true,                          // If empty, the whole record is discarded
        ),
        array(
            'name'     => "content",
            'selector' => "//div[@class='overview']/p",
            'required' => true,
        ),
        array(
            'name'     => "img",
            'selector' => "//img[@class='firstPreload']",
            'required' => true,
        ),
    ),
);

$spider = new phpspider($configs);
$spider->start();
```
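To get a feel for what the `content_url_regexes` entry selects, the following sketch checks two URLs against a pattern of the same shape. It assumes each entry is treated as a PCRE pattern; the helper `matchesAny` and both sample URLs are hypothetical and are not part of phpspider, whose internal matching may differ.

```php
<?php
// Hypothetical helper: returns true if $url matches any of the given patterns.
function matchesAny(string $url, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        // '#' is used as the delimiter so the slashes in URLs need no escaping.
        if (preg_match('#' . $pattern . '#i', $url) === 1) {
            return true;
        }
    }
    return false;
}

// Same shape as the content-page pattern in the demo, with the dots escaped.
$contentUrlRegexes = array('http://war\.163\.com/photoview/4T8E0001/\d+');

// Hypothetical URLs: one that looks like a content page, one that does not.
$isContent = matchesAny('http://war.163.com/photoview/4T8E0001/2301234.html', $contentUrlRegexes);
$isOther   = matchesAny('http://war.163.com/special/index.html', $contentUrlRegexes);
```

The dots are escaped in this sketch so that they match only a literal `.`; an unescaped dot matches any character, which still accepts the intended URLs but is looser than necessary.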