More and more content is being published on the Internet, which by some estimates now holds trillions of distinct web pages. How can anyone find the information they need in such a mass of content? Internet search engines were invented to solve this problem. When a user enters keywords into a search engine such as Baidu, Google or Bing, the engine finds links to web pages containing those keywords and presents them in a certain order. So how do search engines actually search the Internet for us?
Generally speaking, a search engine's work divides into three parts. The first is called information capture, or crawling. Search engines use programs called "web crawlers" to fetch web pages and collect the links on them. Because of how the Internet is structured, most web pages can be reached through links from other pages, so in theory a crawler that starts from a limited number of pages can reach the vast majority of pages. Think of the Internet as a huge spider web: each intersection is a web page, and each strand of silk between intersections is a link. A crawler can start at one intersection and travel along the silk to any other intersection connected to it.
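The spider-web picture can be sketched as a breadth-first walk over a link graph. This is only an illustration, not how any particular search engine works: the LINKS table and the crawl function are hypothetical names, and the "pages" live in memory rather than being fetched over the network.

```python
from collections import deque

# Hypothetical link graph: each key is a page, each value lists the pages it links to.
LINKS = {
    "home": ["news", "about"],
    "news": ["article", "home"],
    "about": [],
    "article": ["news"],
    "orphan": ["home"],  # nothing links here, so a crawl from "home" never finds it
}

def crawl(start, links):
    """Breadth-first traversal from `start`; returns pages in discovery order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:  # skip pages already discovered
                seen.add(target)
                queue.append(target)
    return order

print(crawl("home", LINKS))  # ['home', 'news', 'about', 'article']
```

The "orphan" page shows why the text says "the vast majority" rather than "all": a page that no other page links to cannot be reached by following links alone.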
Once pages have been found, the search engine begins its second part: indexing. In short, the engine extracts keywords from each page and stores the page's information, sometimes even the full page content, in its own database according to certain rules. The purpose is to make the information quick to find later. If the engine simply stored pages with no organization, it would have to traverse everything it had saved on every search, which would defeat the purpose of a search engine.
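One common way to organize such a database is an inverted index, which maps each keyword to the pages that contain it. The sketch below assumes keywords have already been extracted; PAGES and build_index are hypothetical names, and a real engine would store far more per page.

```python
from collections import defaultdict

# Hypothetical page store: page id -> keywords already extracted from that page.
PAGES = {
    "page1": ["search", "engine", "crawler"],
    "page2": ["search", "index"],
    "page3": ["cooking", "noodles"],
}

def build_index(pages):
    """Map each keyword to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, keywords in pages.items():
        for keyword in keywords:
            index[keyword].add(page_id)
    return index

index = build_index(PAGES)
print(sorted(index["search"]))  # ['page1', 'page2']
```

Looking up a keyword is now a single dictionary access instead of a scan over every stored page, which is the speed-up the paragraph above describes.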
For example, if a search engine indexes a page introducing the cartoon Journey to the West, terms such as "Monkey King", "Journey to the West", "Tang Monk" and "Wu Chengen" will generally become part of that page's index. It is worth mentioning that Chinese text needs an extra step. English is written in words separated by spaces, whereas Chinese is written in characters with no visible separation between words, so a Chinese page generally goes through word segmentation before keywords are extracted.
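To give a feel for word segmentation, here is a minimal sketch of forward maximum matching, a simple dictionary-based method. It is an assumption chosen for illustration, not necessarily what any real search engine uses. VOCAB and segment are hypothetical names, and the sample sentence (roughly "Sun Wukong protects Tang Monk") was made up for this example.

```python
# Hypothetical toy dictionary; real segmenters use far larger vocabularies.
VOCAB = {"孙悟空", "唐僧", "保护", "西游记"}
MAX_LEN = max(len(word) for word in VOCAB)

def segment(text, vocab=VOCAB, max_len=MAX_LEN):
    """Greedy forward maximum matching: at each position take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary word is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words

print(segment("孙悟空保护唐僧"))  # ['孙悟空', '保护', '唐僧']
```

Greedy matching can pick the wrong split when a sentence is ambiguous, so treat this only as a picture of the problem rather than a complete solution.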
With the first two parts complete, the search engine can carry out its third part: providing search services to users. It takes the user's query, retrieves the matching results, and shows them to the user. For example, when we search for "Monkey King", the engine can use the "Monkey King" index entry to return a link to the cartoon Journey to the West page, because that page's features were stored in the database when the index was built. The returned results will also include other matching pages, such as the page for the comic book Journey to the West and the page for the book Journey to the West.
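The lookup step can be sketched as follows. The INDEX contents, the page ids and the search function are hypothetical, and the ranking rule (pages matching more query terms come first) is an assumption for illustration; real engines order results using many more signals.

```python
# Hypothetical prebuilt index: keyword -> ids of pages containing it.
INDEX = {
    "monkey": {"cartoon", "comic", "novel", "zoo"},
    "king": {"cartoon", "comic", "novel", "chess"},
    "west": {"cartoon", "novel"},
}

def search(query, index):
    """Return page ids ranked by how many query terms each page matches."""
    scores = {}
    for term in query.lower().split():
        for page_id in index.get(term, ()):
            scores[page_id] = scores.get(page_id, 0) + 1
    # Higher score first; ties are broken alphabetically so the order is stable.
    return sorted(scores, key=lambda page_id: (-scores[page_id], page_id))

print(search("Monkey King", INDEX))
# ['cartoon', 'comic', 'novel', 'chess', 'zoo']
```

The cartoon, comic and novel pages match both query terms and so appear ahead of pages that match only one, which mirrors how a single query can return several related pages.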