phpspider: an excellent crawler framework for PHP development

## To write a PHP web crawler, you need the following skills:
- PHP itself, since the crawler is written in PHP.
- XPath, which is needed to extract data from web pages. CSS selectors can be used as well.
- Regular expressions, which are needed in many cases.
- Chrome's developer tools, an indispensable aid: many AJAX requests have to be analyzed with them.

Note: This framework can only be run from the command line. Command line, command line, command line: important things must be said three times ^_^
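To make the XPath requirement concrete, here is a minimal sketch that uses PHP's built-in DOM extension rather than phpspider itself. The HTML fragment and the helper name `extract_text` are made up for illustration; a real crawler would apply the same kind of expression to a downloaded page.

```php
<?php
// Hypothetical HTML fragment, standing in for a downloaded page.
$html = <<<HTML
<html><body>
  <div class="headline"><h1>Sample title</h1></div>
  <div class="overview"><p>First paragraph.</p><p>Second paragraph.</p></div>
</body></html>
HTML;

/**
 * Return the trimmed text of every node matching an XPath expression.
 * (Illustrative helper; not part of phpspider.)
 */
function extract_text(string $html, string $xpath): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them;
    // real-world HTML is rarely well-formed.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $results = [];
    foreach ((new DOMXPath($doc))->query($xpath) as $node) {
        $results[] = trim($node->textContent);
    }
    return $results;
}

print_r(extract_text($html, "//div[@class='headline']/h1"));
print_r(extract_text($html, "//div[@class='overview']/p"));
```

The first call selects the `h1` inside the `div` whose class is `headline`; the second returns one entry per `p` inside the `overview` block.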
The demo in this article crawls a military news website (war.163.com):
<?php
require_once __DIR__ . '/../autoloader.php';
use phpspider\core\phpspider;
/* Do NOT delete this comment */
/* 不要删除这段注释 */
$configs = array(
    'name' => '军事', // A name for your crawler ('军事' means 'military')
    'log_show' => false, // Whether to display the log
    'tasknum' => 1, // Number of crawler processes to start
    // Database configuration
    'db_config' => array(
        'host' => '127.0.0.1',
        'port' => 3306,
        'user' => 'root',
        'pass' => 'root',
        'name' => 'collection',
    ),
    // Export target; the table must already exist (database `collection`, table `test`)
    'export' => array(
        'type' => 'db',
        'table' => 'test',
    ),
    // Domains to crawl
    'domains' => array(
        'war.163.com'
    ),
    // Starting point of the crawl
    'scan_urls' => array(
        'http://war.163.com'
    ),
    // List page pattern: the list pages you want to crawl, i.e. the paginated index
    'list_url_regexes' => array(
        'http://war.163.com'
    ),
    // Content page pattern: the article detail pages
    // \d+ stands for the variable part of the URL (one or more digits)
    'content_url_regexes' => array(
        'http://war.163.com/photoview/4T8E0001/\d+',
    ),
    // Number of retries after a failed request
    'max_try' => 5,
    // Extraction rules
    'fields' => array(
        array(
            'name' => "title", // Database column name
            'selector' => "//div[@class='headline']/h1", // Rule: the h1 tag inside the element with class "headline"
            'required' => true, // If empty, the whole record is discarded
        ),
        array(
            'name' => "content",
            'selector' => "//div[@class='overview']/p",
            'required' => true,
        ),
        array(
            'name' => "img",
            'selector' => "//img[@class='firstPreload']",
            'required' => true,
        ),
    ),
);
$spider = new phpspider($configs);
$spider->start();
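
The `list_url_regexes` and `content_url_regexes` entries in the configuration are regular expressions that discovered URLs are checked against. The sketch below shows how such a pattern behaves with plain `preg_match`. It is not phpspider's internal code: the `#` delimiter, the `matches_any` helper, and the sample URLs are assumptions made for illustration.

```php
<?php
// Pattern copied from the configuration above.
$content_url_regexes = array(
    'http://war.163.com/photoview/4T8E0001/\d+',
);

/**
 * Report whether a URL matches at least one pattern.
 * (Hypothetical helper; '#' is used as the delimiter so that the
 * slashes in the URL need no escaping.)
 */
function matches_any(array $regexes, string $url): bool
{
    foreach ($regexes as $regex) {
        if (preg_match('#' . $regex . '#i', $url) === 1) {
            return true;
        }
    }
    return false;
}

// Sample URLs, invented for this example.
var_dump(matches_any($content_url_regexes, 'http://war.163.com/photoview/4T8E0001/12345.html'));
var_dump(matches_any($content_url_regexes, 'http://war.163.com/photoview/4T8E0001/'));
```

The first URL matches because digits follow the fixed prefix; the second does not, since `\d+` requires at least one digit. Note that the unescaped dots in the pattern match any character, so writing them as `\.` would make the pattern stricter.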
