Web crawlers, also known as web spiders or web robots (and, in the FOAF community, "web page chasers"), are programs that automatically fetch information from the World Wide Web according to certain rules or scripts. Other, less common names include ant, automatic indexer, emulator, and worm.
Most crawlers follow the process of "send a request, get the page, parse the page, extract and store the content". In other words, a crawler simulates what you do when you use a browser to look up information on a web page.
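The four steps above can be sketched in a few lines of Python. This is a minimal, hypothetical example: the page content is hard-coded so it runs offline, where a real crawler would first fetch the page (for example with `urllib.request.urlopen`); the parse and extract steps use the standard-library `html.parser`.

```python
# Sketch of the "request -> page -> parse -> extract" loop.
# SAMPLE_PAGE stands in for the "get the page" step so the example
# runs offline; a real crawler would download it from a URL.
from html.parser import HTMLParser

SAMPLE_PAGE = """
<html><head><title>Example</title></head>
<body>
  <a href="/page1">First link</a>
  <a href="/page2">Second link</a>
</body></html>
"""

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag: the 'extract' step."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkExtractor()
parser.feed(SAMPLE_PAGE)   # the 'parse' step
print(parser.links)        # the extracted links, ready to store or follow
```

A real crawler would then queue each extracted link and repeat the loop on it, which is how it "crawls" from page to page.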
To put it simply, a crawler is a probing machine. Its basic operation is to imitate human behaviour: it visits websites, clicks buttons, checks data, and remembers the information it sees, like a bug crawling tirelessly around a building.
You can simply imagine that every crawler is a "clone" of you, just as Sun Wukong plucked out a handful of hairs and blew them into a troop of monkey copies.
The Baidu search engine we use every day relies on exactly this crawler technology: every day it sends countless crawlers out to websites, grabs their information, then puts on light makeup and queues the results up, waiting for you to retrieve them.