The order of search engine retrieval: 1. Crawl web pages from the Internet; 2. Establish an index database; 3. Search and sort in the index database; 4. Process and sort the search results.
Search engine retrieval sequence:
A search engine is a system that uses computer programs, following certain strategies, to collect information from the Internet, organize and process it, and provide retrieval services to users. A search engine does not search the live Internet; it searches a pre-organized index database of web pages. In the strict sense, a search engine is a full-text search engine: a system that collects tens of millions to billions of web pages and indexes every word (that is, every keyword) in them to build an index database. Today's search engines commonly use hyperlink analysis. In addition to analyzing the content of the indexed page itself, they analyze and index the URL, the anchor text, and even the surrounding text of every link that points to the page. As a result, even if a term such as "information retrieval" does not appear in web page A, a user searching for "information retrieval" can still find page A if some web page B links to it with the anchor text "information retrieval". Moreover, the more pages that point to page A with "information retrieval" links, the more relevant page A is considered and the higher it ranks when users search for "information retrieval".
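The anchor-text effect described above can be illustrated with a minimal Python sketch. The link data, the page names, and the helper functions `build_anchor_index` and `search_by_anchor` are all hypothetical, and counting inbound anchors is only a toy stand-in for the far more elaborate link analysis real engines perform.

```python
from collections import defaultdict

# Hypothetical link data: (source page, target page, anchor text).
links = [
    ("B", "A", "information retrieval"),
    ("C", "A", "information retrieval"),
    ("C", "D", "information retrieval"),
]

def build_anchor_index(links):
    """Map each anchor phrase to a count of inbound links per target page."""
    index = defaultdict(lambda: defaultdict(int))
    for _source, target, anchor in links:
        index[anchor.lower()][target] += 1
    return index

def search_by_anchor(index, query):
    """Return target pages for a phrase, most inbound anchors first."""
    counts = index.get(query.lower(), {})
    return sorted(counts, key=lambda page: (-counts[page], page))

anchor_index = build_anchor_index(links)
print(search_by_anchor(anchor_index, "Information Retrieval"))  # ['A', 'D']
```

Page A is returned first even though its own text is never inspected, because two links point to it with the queried phrase, while page D has only one.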
The working principle of a search engine can be divided into four steps: crawl web pages from the Internet, build an index database, search and sort in the index database, and process and sort the search results.
(1) Crawl web pages from the Internet: A spider program automatically accesses the Internet and collects web pages. Starting from any web page, it follows all the URLs on that page to other pages, repeats the process, and brings back every page it has crawled.
(2) Establish an index database: The analysis and indexing program analyzes the collected web pages and extracts relevant information (including the page's URL, encoding type, the keywords contained in the page content and their positions, generation time, size, and link relationships with other web pages). It then performs a large number of complex calculations based on a relevance algorithm to obtain the relevance (or importance) of each web page for each keyword in the page content and hyperlinks, and uses this information to build the web page index database.
(3) Search and sort in the index database: When the user enters a keyword, the search program finds all web pages that match the keyword in the web page index database. Because the relevance of each page for the keyword has already been calculated, the pages only need to be sorted by the existing relevance values. The higher the relevance, the higher the ranking.
(4) Process and sort the search results: The index database records the relevance information of every matching web page for the keyword. This information is combined with each web page's level to form a relevance value, and the results are sorted by it: the higher the relevance, the higher the ranking. Finally, the page generation system organizes the link addresses and page content summaries of the search results and returns them to the user.
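The four steps above can be sketched end to end in a few lines of Python. This is a toy illustration under stated assumptions, not how any real engine is implemented: the `WEB` dictionary stands in for the Internet, the URLs are made up, and plain term frequency replaces the complex relevance calculations the article mentions.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.example": ("search engines crawl pages", ["b.example", "c.example"]),
    "b.example": ("an index maps words to pages", ["a.example"]),
    "c.example": ("search search results are ranked", []),
}

def crawl(seed):
    """Step 1: follow links breadth-first from a seed URL, collecting pages."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, out_links = WEB[url]
        pages[url] = text
        for link in out_links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Step 2: inverted index, word -> {url: relevance}, computed ahead of time."""
    index = defaultdict(dict)
    for url, text in pages.items():
        words = text.lower().split()
        for word in set(words):
            # Toy relevance (an assumption): term frequency within the page.
            index[word][url] = words.count(word) / len(words)
    return index

def search(index, keyword):
    """Steps 3 and 4: look up matches, then sort by the stored relevance."""
    matches = index.get(keyword.lower(), {})
    return sorted(matches.items(), key=lambda item: (-item[1], item[0]))

pages = crawl("a.example")
index = build_index(pages)
results = search(index, "search")
print([url for url, _score in results])  # ['c.example', 'a.example']
```

Because relevance is stored at indexing time, answering a query reduces to a dictionary lookup followed by a sort, which mirrors the point made in step (3).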