What information do Python crawlers generally crawl?


藏色散人

Released: 2019-07-04 09:20:44



When crawlers come up, most programmers instinctively think of Python. Why is that? I think there are two reasons:

1. The Python ecosystem is extremely rich: third-party libraries such as Requests, Beautiful Soup, Scrapy, and PySpider are very powerful.

2. Python's syntax is simple and easy to use, so you can write a crawler in minutes. (Some people complain that Python is slow, but a crawler's bottleneck generally has little to do with the language.)

A crawler is a program whose purpose is to capture information resources on the World Wide Web. For example, the search engines you use every day, such as Google, rely on crawlers that regularly fetch pages to produce their search results.

If you search for crawler-related terms, then apart from wiki-style introductions, nearly every result involves Python. People have long said that Python and crawlers go hand in hand, and the search results suggest they were right.

The targets of crawlers are also very diverse: text, pictures, videos, and any other structured or unstructured data can be crawled. As crawlers have developed, several types have emerged:

● General web crawler: expands its crawl from a set of seed URLs to the entire Web; this is what search engines do

● Vertical web crawler: crawls content in a specific field, for example a crawler that only collects novel tables of contents and chapters

● Incremental web crawler: keeps already-crawled pages up to date by fetching only new or changed content

● Deep web crawler: crawls pages that can only be reached after a user submits keywords, for example through a search form
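
As a rough illustration of the first type, expanding from seed URLs can be sketched as a breadth-first traversal. This is only a minimal sketch, not how any real search engine works. The names `crawl_from_seeds`, `get_links`, and `max_pages` are hypothetical, and `get_links` stands in for whatever function downloads a page and returns the URLs it links to. Here it is replaced by a toy in-memory link graph so the example runs without network access.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl_from_seeds(
    seeds: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
) -> List[str]:
    """Visit pages breadth-first, starting from the seed URLs.

    get_links is a hypothetical callback: given a URL, it returns the
    URLs found on that page. max_pages caps the crawl so it terminates.
    """
    queue = deque(dict.fromkeys(seeds))  # frontier, with duplicate seeds dropped
    seen = set(queue)                    # every URL that has ever been queued
    visited = []                         # URLs in the order they were visited

    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:         # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited


# Toy "web": each key is a page, each value is the list of pages it links to.
toy_web = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
print(crawl_from_seeds(["a"], lambda url: toy_web.get(url, [])))
# prints ['a', 'b', 'c', 'd']
```

A vertical crawler could reuse the same loop with a `get_links` that only returns URLs belonging to the topic of interest.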

I don't want to dwell on these general concepts. Instead, let's take fetching web content as an example and talk about web crawlers starting from the crawler technology itself. The steps are as follows:

1. Simulate a request for the web resource

2. Extract the target elements from the HTML

3. Persist the data
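
The three steps above can be sketched with the Python standard library alone. This is a minimal sketch under several assumptions, not a definitive implementation. The function and class names (`fetch_html`, `LinkTextParser`, `extract_links`, `save_links`), the User-Agent string, the choice of `<a>` elements as the "target elements", and the SQLite table layout are all made up for illustration. In practice, the third-party libraries mentioned earlier, such as Requests and Beautiful Soup, are the more common choice for the first two steps.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser
from typing import List, Tuple


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Step 1: request the page, sending a browser-like User-Agent header."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0 (compatible; example-crawler)"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkTextParser(HTMLParser):
    """Step 2: collect an (href, text) pair for every <a href=...> element."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str) -> List[Tuple[str, str]]:
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    return parser.links


def save_links(conn: sqlite3.Connection, links: List[Tuple[str, str]]) -> None:
    """Step 3: persist the extracted pairs in a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS links (href TEXT, text TEXT)")
    conn.executemany("INSERT INTO links (href, text) VALUES (?, ?)", links)
    conn.commit()
```

To run all three steps against a real page, the calls can be chained, for example `save_links(sqlite3.connect("links.db"), extract_links(fetch_html(url)))`, where `url` is whatever page you want to crawl and `links.db` is an arbitrary file name.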
