How to install php crawler framework

爱喝马黛茶的安东尼
Release: 2023-02-25 16:14:02


When it comes to writing a crawler, most people think of Python first, but PHP can be used to write crawlers too. PHP has always been simple and easy to use: in my own test, I wrote a simple crawler in 10 minutes with the PHPspider framework.

1. PHP environment installation

Like Python, PHP needs a runtime environment. You can install PHP from the official website, or use an integrated environment such as XAMPP or PHPstudy. An integrated environment is recommended because it bundles MySQL, so you don't need to install the database separately.
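Before going further, it can help to confirm which PHP version you are running and whether a few extensions are enabled. The script below is a minimal sketch: the helper name check_environment is hypothetical, and the extension list (curl for HTTP requests, mbstring for text encoding, mysqli for the database export) is my assumption about what phpspider relies on, so treat phpspider's own documentation as the authority.

```php
<?php
// Minimal environment check. The helper name and the extension list are
// assumptions for illustration, not phpspider's official requirements.
function check_environment(array $extensions): array
{
    $report = [];
    foreach ($extensions as $ext) {
        // extension_loaded() reports whether the extension is enabled in this PHP build
        $report[$ext] = extension_loaded($ext);
    }
    return $report;
}

echo 'PHP version: ' . PHP_VERSION . PHP_EOL;
foreach (check_environment(['curl', 'mbstring', 'mysqli']) as $ext => $loaded) {
    echo str_pad($ext, 10) . ($loaded ? 'enabled' : 'MISSING') . PHP_EOL;
}
```

Save it as check.php and run php check.php. In XAMPP or PHPstudy, a missing extension is usually enabled by uncommenting its extension= line in php.ini.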

2. Composer installation

Composer is PHP's dependency manager, similar to pip in Python.

The Chinese official website is https://www.phpcomposer.com/ where you can download and install it. Then press Win+R, type cmd to open a command prompt, and run the composer command. If output like the picture below appears, the installation succeeded.

[Screenshot: output of the composer command]

3. PHPspider installation

Create a folder anywhere. For example, to scrape data from Jianshu, create a jianshu folder on the D drive, switch to it in cmd (for example with cd /d D:\jianshu), and run:

composer require owner888/phpspider

Output like the following means the installation succeeded.

[Screenshot: output of composer require owner888/phpspider]
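To double-check the installation from PHP itself, you can test that Composer's autoloader exists and that it can find phpspider's main class. This is a sketch: phpspider_installed is a hypothetical helper, and the class name phpspider\core\phpspider is assumed from the project's demo code.

```php
<?php
// Hypothetical check: does this project folder contain a working phpspider install?
function phpspider_installed(string $projectDir): bool
{
    // `composer require` generates vendor/autoload.php inside the project folder
    $autoload = $projectDir . DIRECTORY_SEPARATOR . 'vendor' . DIRECTORY_SEPARATOR . 'autoload.php';
    if (!is_file($autoload)) {
        return false;
    }
    require_once $autoload;
    // class_exists() triggers the autoloader; the class name is taken from phpspider's demo
    return class_exists('phpspider\core\phpspider');
}

// Place this file in the project folder (e.g. D:\jianshu) and run it from there
echo phpspider_installed(__DIR__) ? "phpspider is installed\n" : "phpspider not found\n";
```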


4. Start writing the first crawler

Now open the jianshu folder: Composer has added some files to it (composer.json, composer.lock and the vendor directory). You don't need to touch them; just create a PHP file (spider.php in this tutorial) and start coding.

[Screenshot: contents of the jianshu folder]

The development documentation is here: https://doc.phpspider.org/demo-start.html

I'll skip the basics and go straight to the code, since this is a 10-minute quick tutorial.

Fields are extracted with XPath selectors.
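Each selector in the fields config below is an XPath expression evaluated against the downloaded page. phpspider does this internally; to try a selector outside the crawler, PHP's built-in DOM extension can evaluate the same expression. In this sketch, select_texts is a hypothetical helper and the HTML is a made-up stand-in for an article page.

```php
<?php
// Hypothetical helper: evaluate an XPath selector against an HTML string and
// return the trimmed text of every matching node, using PHP's DOM extension.
function select_texts(string $html, string $xpath): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ((new DOMXPath($doc))->query($xpath) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Made-up sample page, loosely shaped like an article page
$html = '<html><body>'
    . '<h1 class="title">Hello phpspider</h1>'
    . '<h1 class="subtitle">Not selected</h1>'
    . '<div class="show-content-free"><p>Article body</p></div>'
    . '</body></html>';

echo implode(', ', select_texts($html, "//h1[@class='title']")), PHP_EOL;             // Hello phpspider
echo implode(', ', select_texts($html, "//div[@class='show-content-free']")), PHP_EOL; // Article body
```

Note that @class='title' compares the entire attribute value, so an element with class="title large" would not match; XPath's contains(@class, 'title') is the usual looser alternative.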

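<?php
// Composer's autoloader; __DIR__ keeps the path valid even when the script is started from another directory
require __DIR__ . '/vendor/autoload.php';
use phpspider\core\phpspider;

$configs = array(
    'name' => '简书', // Jianshu
    'log_show' => false,
    'tasknum' => 1,
    // Database configuration
    'db_config' => array(
        'host' => '127.0.0.1',
        'port' => 3306,
        'user' => 'root',
        'pass' => '',
        'name' => 'demo',
    ),
    'export' => array(
        'type' => 'db',
        'table' => 'jianshu', // If no rows appear in the table, check that the table structure and column names match the fields
    ),
    // Domains the crawler is allowed to visit
    'domains' => array(
        'jianshu.com',
        'www.jianshu.com',
    ),
    // Starting point of the crawl
    'scan_urls' => array(
        'https://www.jianshu.com/c/V2CqjW?utm_medium=index-collections&utm_source=desktop',
    ),
    // URL pattern for list pages
    'list_url_regexes' => array(
        "https://www.jianshu.com/c/\w+",
    ),
    // URL pattern for content pages; \w+ stands for the variable part of the URL
    // (Jianshu IDs mix letters and digits, so \d+ would not match them)
    'content_url_regexes' => array(
        "https://www.jianshu.com/p/\w+",
    ),
    'max_try' => 5,
    'fields' => array(
        array(
            'name' => 'title',
            'selector' => "//h1[@class='title']",
            'required' => true,
        ),
        array(
            'name' => 'content',
            'selector' => "//div[@class='show-content-free']",
            'required' => true,
        ),
    ),
);

$spider = new phpspider($configs);
$spider->start();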
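The list_url_regexes and content_url_regexes entries are regular expressions that discovered URLs are checked against. Jianshu's collection and article IDs contain letters as well as digits (the scan URL above uses V2CqjW), so a \d+ pattern would reject them. A quick way to test a pattern before starting a crawl is plain preg_match. In this sketch matches_any is a hypothetical helper, and the #...#i delimiters are an assumption that may differ from how phpspider wraps patterns internally.

```php
<?php
// Hypothetical helper: true if $url matches at least one of the patterns.
// Each pattern is wrapped in '#...#i' delimiters here; phpspider's own wrapping may differ.
function matches_any(string $url, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        if (preg_match('#' . $pattern . '#i', $url) === 1) {
            return true;
        }
    }
    return false;
}

$scan = 'https://www.jianshu.com/c/V2CqjW?utm_medium=index-collections&utm_source=desktop';
var_dump(matches_any($scan, ["https://www.jianshu.com/c/\d+"])); // bool(false): V2CqjW is not all digits
var_dump(matches_any($scan, ["https://www.jianshu.com/c/\w+"])); // bool(true)
```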

A quick explanation of the selector syntax:

//h1[@class='title']

Selects all h1 nodes whose class attribute equals title.

//div[@class='show-content-free']

Selects all div nodes whose class attribute equals show-content-free.

After finishing the code, remember to create the database and table for the content you want to capture; the table's column names must match the field names in the config.
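Since the export expects a column for each field name (as the comment in the config notes), one way to keep the table aligned with the config is to generate the CREATE TABLE statement from the fields array. This is a sketch: create_table_sql is a hypothetical helper, and the extra id column, the TEXT column type and the utf8mb4 charset are my own choices, not something phpspider requires.

```php
<?php
// Hypothetical helper: build a MySQL CREATE TABLE statement whose column names
// come straight from the crawler's 'fields' config, so the two cannot drift apart.
function create_table_sql(string $table, array $fields): string
{
    $columns = ['`id` INT UNSIGNED NOT NULL AUTO_INCREMENT'];
    foreach ($fields as $field) {
        // TEXT holds up to 64 KB; long articles may need MEDIUMTEXT instead
        $columns[] = sprintf('`%s` TEXT', $field['name']);
    }
    $columns[] = 'PRIMARY KEY (`id`)';
    return sprintf(
        "CREATE TABLE IF NOT EXISTS `%s` (\n  %s\n) DEFAULT CHARSET=utf8mb4;",
        $table,
        implode(",\n  ", $columns)
    );
}

// Same field names as in the crawler config
$fields = [
    ['name' => 'title', 'selector' => "//h1[@class='title']", 'required' => true],
    ['name' => 'content', 'selector' => "//div[@class='show-content-free']", 'required' => true],
];
echo create_table_sql('jianshu', $fields), PHP_EOL;

// To execute it against the demo database from db_config (needs pdo_mysql):
// $pdo = new PDO('mysql:host=127.0.0.1;port=3306;dbname=demo;charset=utf8mb4', 'root', '');
// $pdo->exec(create_table_sql('jianshu', $fields));
```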

[Screenshot: structure of the jianshu table]

Then, in cmd, run:

php -f d:\jianshu\spider.php

The output looks like this:

[Screenshots: the crawler running in cmd]

Now open the table and take a look: has everything been captured?

[Screenshot: captured rows in the jianshu table]
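If you prefer to check from code rather than a database GUI, a few lines of PDO can list what was captured. This is a sketch: preview_titles is a hypothetical helper, the connection values in the usage comment are the ones from db_config above, and the pdo_mysql extension is assumed to be enabled.

```php
<?php
// Hypothetical helper: return up to $limit captured titles from the export table.
// The table name is interpolated directly, so only pass trusted, fixed names.
function preview_titles(PDO $pdo, string $table, int $limit = 5): array
{
    $stmt = $pdo->prepare("SELECT title FROM `$table` LIMIT :n");
    $stmt->bindValue(':n', $limit, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// Usage against the demo database from db_config:
// $pdo = new PDO('mysql:host=127.0.0.1;port=3306;dbname=demo;charset=utf8mb4', 'root', '');
// $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// print_r(preview_titles($pdo, 'jianshu'));
```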

Source: php.cn