Search engine indexing and word segmentation technology worth understanding for SEO practitioners

For a search engine, matching against 50,000 or 5,000 records is very easy.


Each person searches once per day (that is, one word each, assuming no word is repeated).


Anyone who has done web development knows that the database search we usually use takes the words the user enters and compares them against the content of one or more fields in the database. Put simply, a search engine works like this:

Remember how, back at school, the teacher would often say "please turn to page such-and-such and look at paragraph such-and-such"? Those happy campus days come vividly back to mind ~_~. The teacher's instruction to turn to a page and find a paragraph is an index lookup: the page number and the paragraph number are the two indexes. With them you can reach a specific location in a short time, even if the book has more than 1,000 pages.
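The page-and-paragraph analogy can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular search engine is implemented: the toy `book`, and the helper names `build_index` and `full_scan`, are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical toy "book": page number -> list of paragraphs on that page.
book = {
    1: ["search engines crawl pages", "pages are stored as code"],
    2: ["an index maps words to locations", "search uses the index"],
}


def build_index(pages):
    """Map each word to the sorted (page, paragraph) positions where it occurs."""
    index = defaultdict(set)
    for page_no, paragraphs in pages.items():
        for para_no, text in enumerate(paragraphs, start=1):
            for word in text.split():
                index[word].add((page_no, para_no))
    return {word: sorted(locations) for word, locations in index.items()}


def full_scan(pages, word):
    """Traditional approach: read every paragraph and compare it with the word."""
    return sorted(
        (page_no, para_no)
        for page_no, paragraphs in pages.items()
        for para_no, text in enumerate(paragraphs, start=1)
        if word in text.split()
    )


index = build_index(book)       # built once, ahead of any query
print(index["index"])           # [(2, 1), (2, 2)]
print(full_scan(book, "index")) # same answer, but reads the whole book
```

Both calls return the same positions. The difference is that the index is built once in advance, so each later query is a single dictionary lookup, while the full scan rereads every paragraph for every query.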


To a search engine, even the most gorgeous website is just a pile of code, such as the following code.

The search engine would then have to compare 2 billion search words against 5 billion pages every day.
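To get a feel for that workload, here is a rough back-of-the-envelope calculation in Python. The figures (2 billion users, one search each per day, 5 billion pages) are the ones used in this article; the cost model, one comparison per page per query, is a simplifying assumption and not a measurement of any real system.

```python
# Figures from the article; the "one comparison per page per query"
# cost model is an assumption made only for this estimate.
USERS = 2_000_000_000
SEARCHES_PER_USER_PER_DAY = 1
PAGES = 5_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = USERS * SEARCHES_PER_USER_PER_DAY

# A naive full scan checks every page for every query.
naive_comparisons_per_day = queries_per_day * PAGES
naive_comparisons_per_second = naive_comparisons_per_day // SECONDS_PER_DAY

print(f"{naive_comparisons_per_day:.0e}")  # 1e+19
print(f"{naive_comparisons_per_second:,} comparisons per second")
```

Under these assumptions a full scan needs on the order of 10^19 page comparisons per day, which is more than 10^14 per second.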

Now that we understand the importance of the index database, let us analyze the form the index takes:

Chinese: 5 billion ÷ 100,000 = 50,000

To explain what an index database is and what role it plays in a search engine, here is a vivid example for reference:

Assume there are 2 billion Internet users worldwide, and that all the websites in the world together hold 5 billion pages.

The elements of a search engine's own index database are words, a great many of them. There are about 120,000 Chinese characters, and the words composed of them number nearly 100,000. As for English, its 26 letters make up about 1,000,000 words. Taking these words as the elements of the index database, let us analyze the data again:

The user enters a word, the search engine finds the matching content in its database, and the results are displayed to the user in ranked order. The search engine tirelessly repeats these operations every day. Everything seems perfectly normal, so let us use some numbers to analyze the problem:

English: 5 billion ÷ 1,000,000 = 5,000
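The Chinese and English divisions can be checked with a short Python sketch. The page count and vocabulary sizes are the article's own figures; the helper name `pages_per_word` is hypothetical, and the calculation assumes, as a deliberate simplification, that pages are spread evenly across the index words.

```python
# Figures from the article: 5 billion pages, about 100,000 Chinese words
# and about 1,000,000 English words used as index entries.
PAGES = 5_000_000_000
VOCABULARY = {"Chinese": 100_000, "English": 1_000_000}


def pages_per_word(pages, vocabulary_size):
    """Average number of records under one index word, assuming an even spread."""
    return pages // vocabulary_size


for language, size in VOCABULARY.items():
    print(language, pages_per_word(PAGES, size))
# Chinese 50000
# English 5000
```

In other words, once a query has been routed to its index word, the engine only needs to look through tens of thousands of records or fewer, not all 5 billion pages.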

…. This sounds daunting, doesn't it? The volume of data is enormous, yet a normal search takes the search engine less than one second. Clearly, the traditional full-text search method described above is not realistic for this process. Look at the picture below, and pay attention to the "index database query".