A web crawler is a web robot used to automatically browse the World Wide Web.
Introduction to web crawlers
Web crawlers, also known as web spiders or web robots, are programs or scripts that automatically crawl information from the World Wide Web according to certain rules. Other, less commonly used names include ants, automatic indexers, emulators, or worms.
Characteristics of web crawlers
A web crawler is a program that automatically extracts web pages. It downloads pages from the World Wide Web for search engines and is an important component of them. A traditional crawler starts from the URLs of one or more initial (seed) pages and obtains the URLs found on those pages. As it crawls, it continuously extracts new URLs from the current page and places them into a queue, until certain stopping conditions of the system are met.
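The queue-based loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a production crawler; the seed URL and the page-limit stop condition are assumptions added for the example.

```python
# Minimal sketch of the crawl loop: take a URL from the queue, download it,
# extract new URLs, enqueue them, and stop when a page limit is reached.
from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen
import re

def crawl(seed_url, max_pages=10):
    queue = deque([seed_url])   # URLs waiting to be crawled
    visited = set()             # URLs already downloaded

    while queue and len(visited) < max_pages:   # stopping condition (assumed)
        url = queue.popleft()
        if url in visited:
            continue
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", errors="ignore")
        except OSError:
            continue
        visited.add(url)
        # Extract new URLs from the current page and put them into the queue.
        for link in re.findall(r'href="(.*?)"', html):
            queue.append(urljoin(url, link))
    return visited

# Example (hypothetical seed): crawl("https://example.com", max_pages=5)
```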
Types of web crawlers
1. General web crawlers
General web crawlers, also called full-network crawlers, expand their crawl from a set of seed URLs to the entire Web, mainly collecting data for portal-site search engines and large Web service providers. This type of crawler covers a huge range and number of pages, so it demands high crawling speed and large storage space, while placing relatively low requirements on the order in which pages are crawled. Because there are so many pages to refresh, such crawlers usually work in parallel, yet it still takes a long time to refresh all pages once.
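One common way to achieve the parallel downloading mentioned above is a pool of worker threads that fetch pages concurrently. The sketch below is illustrative only; the seed URLs and worker count are assumptions, not values from the article.

```python
# Hedged sketch of parallel fetching with a thread pool.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch(url):
    try:
        return url, urlopen(url, timeout=5).read()
    except OSError:
        return url, None

seed_urls = ["https://example.com", "https://example.org"]  # hypothetical seeds

with ThreadPoolExecutor(max_workers=8) as pool:             # worker count assumed
    for url, body in pool.map(fetch, seed_urls):
        status = "ok" if body is not None else "failed"
        print(url, status, len(body or b""), "bytes")
```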
2. Focused web crawlers
Focused web crawlers, also known as topic web crawlers, selectively crawl only those pages related to predefined topics. Compared with general web crawlers, focused crawlers only need to crawl topic-relevant pages, which greatly saves hardware and network resources; because the number of saved pages is small, they can also be updated quickly, and they can meet the needs of specific groups of users for information in specific fields.
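A simple way to make a crawl "focused" is to only keep and expand pages judged relevant to the topic. The keyword check below is a hedged stand-in; real focused crawlers often use classifiers or link-context analysis, and the keywords and threshold here are invented for illustration.

```python
# Illustrative relevance filter for a focused crawler.
TOPIC_KEYWORDS = {"crawler", "spider", "scraping"}   # hypothetical topic terms

def is_relevant(page_text, threshold=2):
    """Return True if the page mentions enough topic keywords."""
    words = page_text.lower().split()
    hits = sum(words.count(kw) for kw in TOPIC_KEYWORDS)
    return hits >= threshold

# Inside the crawl loop, a focused crawler would only enqueue links from
# pages for which is_relevant(html_text) returns True.
```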
Applications of web crawlers
1. Statistical data
Crawlers are a main tool for enriching data during a cold start. When a new business launches, it has very little data of its own at first. At this point, data can be crawled from other platforms to fill in our own business data.
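As a hedged illustration of this kind of data filling, the sketch below fetches a page and stores a few extracted fields locally. The source URL, the extraction pattern, and the output file are hypothetical placeholders, not part of the article.

```python
# Illustrative only: fetch a page and save extracted titles to a CSV file
# that could seed a new business's dataset.
import csv
import re
from urllib.request import urlopen

SOURCE_URL = "https://example.com/listings"      # hypothetical source platform

html = urlopen(SOURCE_URL, timeout=5).read().decode("utf-8", errors="ignore")
titles = re.findall(r"<h2>(.*?)</h2>", html)     # naive field extraction

with open("seed_data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["title"])
    for title in titles:
        writer.writerow([title])

print(f"Saved {len(titles)} records to seed_data.csv")
```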
2. Ticket-grabbing crawlers
Every Spring Festival or holiday, many people have used ticket-grabbing software just to get a plane or train ticket, and this kind of travel software relies on web crawler technology to grab tickets. A ticket-grabbing crawler constantly crawls transportation ticketing websites; as soon as tickets become available, it snaps them up and puts them on its own website for sale.
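At its core this is just a polling crawler: repeatedly fetch a ticketing page and react as soon as availability appears. In the sketch below, the endpoint, the availability check, and the polling interval are all hypothetical placeholders.

```python
# Hedged sketch of a polling loop for ticket availability.
import time
from urllib.request import urlopen

TICKET_PAGE = "https://tickets.example.com/route?date=2024-02-10"  # hypothetical

def tickets_available(html):
    # Placeholder check; a real tool would parse the page's actual structure.
    return "sold out" not in html.lower()

while True:
    try:
        html = urlopen(TICKET_PAGE, timeout=5).read().decode("utf-8", errors="ignore")
        if tickets_available(html):
            print("Tickets may be available - trigger the booking step")
            break
    except OSError:
        pass
    time.sleep(30)   # poll every 30 seconds (arbitrary interval)
```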