What information do python crawlers generally crawl?

Original article by 藏色散人, 2019-07-04 09:20:44

When crawlers come up, most programmers instinctively think of Python. Why is that? I think there are two reasons:

1. The Python ecosystem is extremely rich: third-party libraries such as Requests, Beautiful Soup, Scrapy and PySpider are genuinely powerful.

2. Python syntax is simple and easy to use, so you can write a working crawler in minutes. (Some people complain that Python is slow, but a crawler's bottleneck is usually network I/O, not the language.)
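To illustrate the second point, here is a minimal sketch of why raw language speed rarely matters: a crawler spends most of its time waiting for responses, and those waits can overlap. It uses only the standard library. `fake_fetch` and `fetch_all` are hypothetical names, and `fake_fetch` simulates network latency with `time.sleep` instead of making real requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url: str, latency: float = 0.2) -> str:
    """Hypothetical stand-in for an HTTP request: almost all of its time is waiting."""
    time.sleep(latency)  # the thread idles here, as it would while awaiting a response
    return f"<html>{url}</html>"


def fetch_all(urls: list[str], workers: int = 8) -> list[str]:
    """Fetch URLs on a thread pool; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_fetch, urls))


urls = [f"https://example.com/page/{i}" for i in range(5)]
start = time.perf_counter()
pages = fetch_all(urls)
elapsed = time.perf_counter() - start
# The five 0.2 s waits overlap, so this takes roughly 0.2 s instead of about 1 s.
print(f"{len(pages)} pages in {elapsed:.2f} s")
```

Because the threads spend their time blocked on (simulated) I/O rather than computing, adding concurrency helps far more than a faster language would.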

A crawler is a program whose purpose is to collect information resources from the World Wide Web. For example, the results of the search engines you use every day, such as Google, all rely on crawlers that fetch pages on a regular schedule.

Looking at search results for crawlers, apart from wiki-style introductions, every crawler-related result mentions Python. People have long said that Python is the language for crawlers, and that reputation seems well deserved.

Crawlers can also target a very wide range of content: text, images, video, and any other structured or unstructured data can be crawled. As crawlers have evolved, several types have emerged:

● General web crawler: starts from a set of seed URLs and expands its scope to the entire Web; this is what search engines do

● Vertical web crawler: crawls topics in a specific field, for example a crawler dedicated to novel directories and chapters

● Incremental web crawler: keeps previously crawled pages up to date by updating them as they change

● Deep web crawler: crawls pages that can only be reached after a user submits keywords
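The seed-URL expansion performed by general and vertical crawlers is essentially a breadth-first traversal of the link graph. The sketch below is an illustration under stated assumptions, not how any particular search engine works: `crawl` and its `get_links` parameter are hypothetical names, and `get_links` is injected so the traversal can be shown without network access (in practice it would fetch and parse each page). Passing `allowed_host` keeps the crawl on one site, a crude stand-in for the topic focus of a vertical crawler.

```python
from collections import deque
from typing import Callable, Iterable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(
    seeds: Iterable[str],
    get_links: Callable[[str], list[str]],
    max_pages: int = 100,
    allowed_host: Optional[str] = None,
) -> list[str]:
    """Breadth-first crawl from seed URLs; returns URLs in visit order.

    get_links(url) returns the raw href values found on that page.
    allowed_host=None behaves like a general crawler; a host name
    restricts the crawl to that site.
    """
    frontier = deque(dict.fromkeys(seeds))  # dedupe seeds, keep their order
    seen = set(frontier)
    visited: list[str] = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for href in get_links(url):
            # Resolve relative links and drop #fragments so duplicates collapse.
            link, _ = urldefrag(urljoin(url, href))
            parts = urlparse(link)
            if parts.scheme not in ("http", "https"):
                continue  # skip mailto:, javascript:, etc.
            if allowed_host and parts.hostname != allowed_host:
                continue
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited
```

For example, with `get_links` backed by a dictionary of pages, `crawl(["https://a.example/"], get_links, allowed_host="a.example")` visits only pages on `a.example`, each at most once. The `seen` set is what stops the crawler from looping forever on pages that link to each other.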

Rather than dwell on these general concepts, let's take fetching web content as an example and look at web crawlers from the perspective of the technique itself. The steps are as follows:

1. Simulate a request for the web resource

2. Extract the target elements from the HTML

3. Persist the data
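The three steps can be sketched with the Python standard library alone (`urllib.request`, `html.parser`, `json`), assuming the "target elements" are the page's links; for real projects you would more likely reach for Requests, Beautiful Soup or Scrapy. `fetch`, `LinkExtractor`, `extract_links` and `persist` are hypothetical names, and the User-Agent string is only an example.

```python
import json
import urllib.request
from html.parser import HTMLParser
from typing import Optional


def fetch(url: str, timeout: float = 10.0) -> str:
    """Step 1: simulate a browser request and return the decoded HTML."""
    # Example User-Agent; some sites reject urllib's default one.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0 (example-crawler)"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Use the charset declared in Content-Type, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class LinkExtractor(HTMLParser):
    """Step 2: collect the text and href of every <a href="..."> element."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[dict[str, str]] = []
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None if the <a> has no href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)  # text inside the current link, incl. nested tags

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append({"text": "".join(self._text).strip(), "href": self._href})
            self._href = None


def extract_links(html: str) -> list[dict[str, str]]:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def persist(rows: list[dict[str, str]], path: str) -> None:
    """Step 3: write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

A full run is then `persist(extract_links(fetch("https://example.com/")), "links.jsonl")`. JSON Lines is chosen here only because it is simple to write and to read back line by line; a database such as SQLite would fit the same slot.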
