I believe everyone talks about indexes when optimizing a database, and I am no exception. Most of us can answer a few questions about the data structures behind an index and say a few words about page caching. But once an Alibaba P9 interviewer asked me: can you describe, at the computer level, how index data gets loaded? (He just wanted me to talk about IO.)
I died on the spot. Computer networking and operating system fundamentals really were my blind spot, but I have made up for it since. So, without further ado, let's start with how a computer loads data and look at indexes from another angle.
MySQL's index is essentially a data structure
Let us first understand the data loading of the computer.
Let's talk about disk IO first. Reading data from disk relies on mechanical movement, and each read takes three steps: seek, rotate to the data, and copy to memory.
Seek time is the time the disk arm needs to move to the specified track, usually less than 5ms.
Rotational latency is the time spent waiting for the data to rotate under the head once the arm is on the right track. On average it is half a revolution; for a 7200 rpm disk that is 60000/7200/2 = 4.17ms.
The copy to memory is very fast and negligible compared with the previous two steps, so one IO takes about 9ms on average. That sounds fast, but if each of a million rows in the database costs one IO, going through them takes about 9000 seconds, which is obviously a disaster.
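The arithmetic above can be checked with a small sketch. It assumes the figures quoted in the text (a 5ms seek and a 7200 rpm disk) and one random IO per row; the function names are just for illustration.

```python
# Rough cost model for one random read on a spinning disk, using the figures from the text.
SEEK_MS = 5.0   # upper bound on seek time quoted above
RPM = 7200      # rotational speed of the example disk


def rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


def random_io_ms(seek_ms: float, rpm: int) -> float:
    """Seek plus rotational latency; the copy to memory is ignored as negligible."""
    return seek_ms + rotational_latency_ms(rpm)


latency = rotational_latency_ms(RPM)
per_io = random_io_ms(SEEK_MS, RPM)
rows = 1_000_000                      # assumption: one random IO per row
total_seconds = rows * per_io / 1000

print(f"rotational latency: {latency:.2f} ms")   # 4.17 ms
print(f"one random IO: {per_io:.2f} ms")         # 9.17 ms
print(f"{rows} random IOs: {total_seconds:.0f} s")
```

Rounding one IO down to 9ms gives the 9000 seconds mentioned above; without rounding the total is slightly higher.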
Because disk IO is so expensive, the operating system optimizes it with read-ahead. When an IO is performed, it reads not only the data at the requested disk address but also the adjacent data into the memory buffer, because when a program accesses the data at one address, the adjacent data is likely to be accessed soon afterwards.
The data read by one IO is called a page. The page size depends on the operating system and is usually 4K or 8K, which means that reading any data within a single page actually costs only one IO.
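To make the "one page, one IO" idea concrete, here is a minimal sketch that counts how many pages a read touches. It assumes a 4K page and that each page costs exactly one IO; `pages_read` is a made-up helper, not an operating system API.

```python
PAGE_SIZE = 4096  # assumed 4K page; the real size depends on the operating system


def pages_read(offset: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages (one IO each) needed to read `length` bytes starting at `offset`."""
    if length <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1


print(pages_read(100, 200))        # 1: both ends fall inside page 0
print(pages_read(4000, 200))       # 2: the read crosses from page 0 into page 1
print(pages_read(0, 16 * 1024))    # 4: 16K spread over four 4K pages
```

Data that sits close together lands in the same page, so it comes back in the same IO.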
(This suddenly reminds me of a question I was asked just after graduation: on a 64-bit operating system, how many bytes does the int type in Java occupy? What is its maximum value? Why?)
So if we want to optimize database queries, we must reduce disk IO as much as possible. That is why indexes exist.
MySQL's official definition of an index is: an index is a data structure that helps MySQL obtain data efficiently.
MySQL's commonly used indexes fall physically into two categories: BTree indexes and hash indexes.
This time we mainly talk about the BTree index.
A BTree is also called a multiway balanced search tree. The characteristics of an m-way BTree are as follows:
This is a structure diagram of a 3-way BTree (just an example; real trees have many more branches). Each square is called a disk block, or simply a block, and it is what the operating system reads into memory in one IO. One block corresponds to four sectors. Purple represents the keys in a disk block, yellow represents the data, and blue represents the pointers p, which point to the location of the next disk block.
Let's simulate the process of finding the data with key 29:
1. Read the root, disk block 1, by following the root node pointer. [1st disk IO]
2. Disk block 1 stores the keys 17 and 35 and three pointers. Since 17<29<35, we take pointer p2.
3. Following p2, we locate and read disk block 3. [2nd disk IO]
4. Disk block 3 stores the keys 26 and 30 and three pointers. Since 26<29<30, we take pointer p2.
5. Following p2, we locate and read disk block 8. [3rd disk IO]
6. Disk block 8 stores the keys 28 and 29. We find 29 and get the data stored with it.
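The walkthrough above can be sketched in a few lines of Python. Only blocks 1, 3 and 8 take their keys from the steps; every other block, the block numbering of the siblings, and the `row-<key>` data values are made up so that the tree is complete. Each dictionary lookup stands in for one disk IO.

```python
from bisect import bisect_left

# block id -> (sorted keys, child block ids). Leaves have no children.
# Blocks 1, 3 and 8 follow the walkthrough; the rest are hypothetical filler.
BLOCKS = {
    1: ([17, 35], [2, 3, 4]),
    2: ([8, 12], [5, 6, 10]),
    3: ([26, 30], [7, 8, 9]),
    4: ([65, 87], [11, 12, 13]),
    5: ([3, 5], []),
    6: ([9, 10], []),
    10: ([13, 15], []),
    7: ([19, 22], []),
    8: ([28, 29], []),
    9: ([31, 33], []),
    11: ([36, 60], []),
    12: ([75, 79], []),
    13: ([90, 99], []),
}


def search(key, root_id=1):
    """Return (data, blocks_read); data is None when the key is absent."""
    reads = []
    block_id = root_id
    while block_id is not None:
        keys, children = BLOCKS[block_id]   # simulated disk IO
        reads.append(block_id)
        i = bisect_left(keys, key)          # position of the first key >= search key
        if i < len(keys) and keys[i] == key:
            return f"row-{key}", reads      # in a BTree the data sits next to the key
        # Otherwise follow the pointer between the neighbouring keys, if any.
        block_id = children[i] if children else None
    return None, reads


data, reads = search(29)
print(data, reads)   # row-29 [1, 3, 8]
```

Three blocks are read, which matches the three disk IOs counted in the steps.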
As you can see, the BTree index makes every disk IO count: each block loaded into memory narrows the search, which improves query efficiency.
But is there anything that can be optimized?
We can see from the figure that each node contains not only the keys but also the data values. The storage space of each page is limited, so if the data is large, the number of keys that fit in each node (that is, one page) will be very small. When the amount of stored data is large, this also makes the B-Tree deeper, which increases the number of disk IOs per query and hurts query efficiency.
B+Tree is an optimization based on B-Tree that makes it better suited to implementing index structures on external storage. In a B+Tree, all data records are stored in leaf nodes on the same level, in key order, and non-leaf nodes store only key information. This greatly increases the number of keys each node can hold and reduces the height of the B+Tree.
B+Tree has several differences compared to B-Tree:
Non-leaf nodes store only key information, and data records are stored in leaf nodes. This fixes the B-Tree problem described in the previous section: since the non-leaf nodes of a B+Tree hold only keys, its height can be kept very low.
The specific numbers are as follows:
The page size in the InnoDB storage engine is 16KB. The primary key of a typical table is INT (4 bytes) or BIGINT (8 bytes), and a pointer is generally 4 or 8 bytes, so one page (one node of the B+Tree) stores roughly 16KB/(8B+8B)=1K keys (since this is only an estimate, we take K to be 10^3 for convenience).
That is to say, a B+Tree index with a depth of 3 can maintain 10^3 * 10^3 * 10^3 = 1 billion records. (This estimate is rough: it treats every level as holding only keys and pointers and does not count the leaf nodes, which hold the row data. If the leaf level is counted, the depth is actually 4.)
So we need only three IO operations to find the data we want among 1 billion records. I don't know how many times better that is than the 9000 seconds for a million rows at the beginning.
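Here is the same estimate as a small sketch. The 8-byte key and 8-byte pointer come from the text; the 1KB row size used for the leaf level is purely my assumption for illustration, and real InnoDB pages also spend some space on headers that this ignores.

```python
PAGE_SIZE = 16 * 1024   # InnoDB page size from the text, in bytes
KEY_BYTES = 8           # BIGINT primary key
POINTER_BYTES = 8       # pointer size used in the estimate above


def fanout(page_size=PAGE_SIZE, key_bytes=KEY_BYTES, pointer_bytes=POINTER_BYTES):
    """Approximate number of (key, pointer) pairs that fit in one page."""
    return page_size // (key_bytes + pointer_bytes)


def max_entries(levels, fan=None):
    """Entries reachable when every level holds only keys and pointers."""
    fan = fanout() if fan is None else fan
    return fan ** levels


def clustered_rows(levels, row_bytes):
    """Rows reachable when the last level is a leaf level holding whole rows."""
    rows_per_leaf = PAGE_SIZE // row_bytes
    return fanout() ** (levels - 1) * rows_per_leaf


print(fanout())                  # 1024
print(max_entries(3, fan=1000))  # 1000000000, the "1 billion" in the text
print(max_entries(3))            # 1073741824 with the unrounded fan-out
print(clustered_rows(3, 1024))   # 16777216 under the assumed 1KB rows
```

The last line shows why counting the leaf level matters: once the leaves hold full rows, three levels cover far fewer records, so reaching a billion rows takes one more level.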
A B+Tree also usually has two head pointers: one points to the root node and the other to the leaf node with the smallest key, and all the leaf nodes (that is, the data nodes) are chained together. So on a B+Tree we can perform range searches and paging searches on the primary key by walking the leaf chain, as well as random searches starting from the root node.
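A minimal sketch of the leaf chain, assuming a singly linked list of leaves and a scan that starts from the smallest-key head pointer. The `Leaf` class and helper names are made up for illustration; a real B+Tree would normally descend from the root to the first matching leaf instead of starting at the head.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None   # pointer to the next leaf in key order


def build_leaves(sorted_keys: List[int], per_leaf: int) -> Leaf:
    """Pack sorted keys into leaves and chain them; returns the smallest-key leaf."""
    leaves = [Leaf(sorted_keys[i:i + per_leaf])
              for i in range(0, len(sorted_keys), per_leaf)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def range_scan(head: Leaf, low: int, high: int) -> List[int]:
    """Collect keys in [low, high] by walking the leaf chain, never revisiting the root."""
    result = []
    leaf = head
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result       # keys are sorted, so nothing further can match
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result


head = build_leaves(list(range(1, 21)), per_leaf=4)
print(range_scan(head, 6, 13))   # [6, 7, 8, 9, 10, 11, 12, 13]
```

Because the leaves are already in key order, a range or paging query only moves sideways along the chain once it has found its starting point.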
The B+Tree index in the database can be divided into the clustered index and the auxiliary index (secondary index).
The B+Tree example diagram above is implemented in the database as a clustered index: the leaf nodes of the clustered index's B+Tree store the row records of the entire table. The difference between the auxiliary index and the clustered index is that the leaf nodes of the auxiliary index do not contain the full row record. Instead they store the clustered index key of the corresponding row, that is, the primary key.
When querying data through the auxiliary index, the InnoDB storage engine traverses the auxiliary index to find the primary key, and then uses the primary key to find the complete row record in the clustered index.
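The two-step lookup can be sketched with plain dictionaries standing in for the two B+Trees. The table, its columns, and the assumption that names are unique are all hypothetical; this only shows the order of the lookups, not how InnoDB stores anything.

```python
# Clustered index: primary key -> full row (what the leaf nodes hold).
clustered_index = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
    3: {"id": 3, "name": "carol", "age": 41},
}

# Auxiliary index on `name`: indexed value -> primary key only, no row data.
name_index = {row["name"]: pk for pk, row in clustered_index.items()}


def find_by_name(name):
    """Look up the primary key in the auxiliary index, then the row in the clustered index."""
    pk = name_index.get(name)      # first lookup: auxiliary index
    if pk is None:
        return None
    return clustered_index[pk]     # second lookup: clustered index


print(find_by_name("bob"))   # {'id': 2, 'name': 'bob', 'age': 25}
```

Each of the two lookups is its own tree traversal in the real thing, so a query through the auxiliary index costs more IO than a query by primary key.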
However, although indexes can speed up queries and improve MySQL's processing performance, excessive use of indexes will also cause the following disadvantages:
Note: Indexes can speed up queries in some cases, but in some cases, they will reduce efficiency.
Index is only one factor to improve efficiency, so the following principles should be followed when establishing an index:
Now everyone knows why an index can be so fast. It really comes down to one sentence: the index structure minimizes the number of IOs the database has to perform. After all, a single IO really does take too long.
As far as interviews are concerned, we can pick up a lot of knowledge quite easily. But if the goal is to really learn, you will find that many things only reveal their secrets once we dig into the basics of computers. Many people ask me how I remember so many things. Honestly, learning itself is a rather helpless business. Since we have to learn anyway, why not learn hard, and learn to enjoy it? I have been studying the basics recently too, and I will start posting about computer fundamentals and networking later.