
Is it difficult to learn Python web crawling?

silencement (original), 2019-06-12 15:18:56

Simply put, the Internet is a large network of sites and network devices. We access a site through a browser, and the site returns HTML, JS, and CSS code to the browser. The browser parses and renders this code, and a rich, colorful web page appears before our eyes.

What is a crawler?

If we compare the Internet to a large spider web, the data is stored at the nodes of the web, and a crawler is a small spider that moves along the web to catch its prey (the data). A crawler is a program that sends requests to a website, obtains resources, and then analyzes them and extracts useful data. From a technical perspective, a program simulates the way a browser requests a site, downloads the HTML code, JSON data, or binary data (pictures, videos) that the site returns, extracts the data you need, and stores it for later use.

Basic process of a crawler

How users obtain network data:

Method 1: The browser submits a request -> downloads the web page code -> parses it into a page

Method 2: A program simulates the browser and sends a request (gets the web page code) -> extracts useful data -> saves it in a database or file

A crawler does method 2.

Initiate a request

Use an HTTP library to initiate a request to the target site, that is, send a Request.

A Request contains: request headers, request body, etc.

Limitation of a request module such as requests: it only downloads what the server returns. It cannot execute JS or apply CSS the way a browser does.
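As a rough illustration of what a Request carries, the sketch below builds one with the standard library's urllib instead of a third-party HTTP library, so it runs without installing anything. The function name build_request, the User-Agent string, and the example.com URL are assumptions made up for this example. Nothing is sent over the network here.

```python
import urllib.parse
import urllib.request


def build_request(url, user_agent="example-crawler/0.1", form_data=None):
    """Build (but do not send) an HTTP request with custom headers.

    The header values are illustrative placeholders, not required ones.
    """
    headers = {
        "User-Agent": user_agent,
        "Accept": "text/html,application/json",
    }
    # Supplying a body (as bytes) makes urllib treat the request as a POST;
    # without a body it is a GET.
    body = None
    if form_data:
        body = urllib.parse.urlencode(form_data).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers)


if __name__ == "__main__":
    req = build_request("https://example.com/search", form_data={"q": "python"})
    print(req.get_method(), req.full_url)
    print(req.header_items())
```

Setting a browser-like User-Agent header is one common way a program "simulates the browser". Whether a given site needs it depends on the site.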

Get response content

If the server responds normally, you get a Response.

A Response can contain: HTML, JSON, pictures, videos, etc.
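A minimal sketch of this step, again using the standard library's urllib as a stand-in for whichever HTTP library you prefer. The names fetch and kind_of are hypothetical helpers, and the three-way split into "json", "text", and "binary" is a simplification chosen for this example.

```python
import urllib.request


def fetch(url, timeout=10):
    """Send the request and return (content_type, body_bytes)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # The Content-Type header tells us what kind of data came back.
        content_type = response.headers.get_content_type()
        body = response.read()
    return content_type, body


def kind_of(content_type):
    """Roughly classify a response so the right parser can be chosen."""
    if content_type == "application/json":
        return "json"
    if content_type.startswith("text/"):
        return "text"
    # Pictures, videos, and other media are treated as raw bytes.
    return "binary"


if __name__ == "__main__":
    # A data: URL keeps the demo offline; a real crawler would pass an http(s) URL.
    content_type, body = fetch("data:text/html,hello")
    print(content_type, kind_of(content_type), body)
```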

Parsing content

Parsing HTML data: regular expressions (the re module), or third-party parsing libraries such as BeautifulSoup, pyquery, etc.

Parsing JSON data: the json module

Parsing binary data: write it to a file in wb mode
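The three cases above can be sketched as follows. To keep the example free of third-party installs, it uses the standard library's html.parser where the text mentions BeautifulSoup or pyquery; that swap is an assumption of this sketch. All function and class names here are made up for illustration.

```python
import json
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_title(html):
    """HTML via a regular expression: fine for simple, well-known pages."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


def extract_links(html):
    """HTML via a real parser: more robust than regular expressions."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def parse_json(text):
    """JSON text becomes ordinary Python dicts and lists."""
    return json.loads(text)


def save_binary(data, path):
    """Binary data (pictures, videos) is written unchanged in wb mode."""
    with open(path, "wb") as f:
        f.write(data)


if __name__ == "__main__":
    page = "<html><head><title>Demo</title></head><body><a href='/a'>A</a></body></html>"
    print(extract_title(page), extract_links(page))
    print(parse_json('{"name": "spider"}'))
```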

Save data

Database (MySQL, MongoDB, Redis)

File
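Both storage options can be sketched with the standard library alone. Note the substitution: the example uses sqlite3 as a stand-in for a database server such as MySQL, because it needs no setup. The pages table, its url and title columns, and the one-JSON-object-per-line file layout are assumptions made for this illustration.

```python
import json
import sqlite3


def save_to_database(conn, rows):
    """Store crawled rows in a table; re-crawled URLs overwrite old rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO pages (url, title) VALUES (?, ?)",
        [(row["url"], row["title"]) for row in rows],
    )
    conn.commit()


def save_to_file(path, rows):
    """Store crawled rows in a text file, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    rows = [{"url": "https://example.com/", "title": "Example"}]
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    save_to_database(conn, rows)
    print(conn.execute("SELECT url, title FROM pages").fetchall())
    conn.close()
```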
