The previous article, "Analysis of mysql execution process", mainly covered how SQL statements are executed at the server layer.
This article looks at how specific statements are executed at the engine layer. Since CRUD operations all involve indexes, we start by understanding the index.
Index
An index exists to make data queries more efficient, much like the table of contents of a book.
Data structure
Common index data structures include hash tables, ordered arrays, and search trees.
A hash table stores data as key-value pairs: given the key to look up, it returns the corresponding value. The idea is simple: keep the values in an array, use a hash function to convert each key into a position, and store the value at that position.
Inevitably, different keys will sometimes be converted to the same position by the hash function. One way to handle such collisions is to hang a linked list off that position.
Because the keys are not stored in order, a hash table is only suitable for equality queries; a range query would have to scan everything.
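The hash table described above can be sketched in a few lines of Python. This is a minimal illustration, not how any storage engine implements it; the class name `ChainedHashIndex`, the bucket count, and the sample keys are all assumptions made for the example.

```python
class ChainedHashIndex:
    """Toy hash index: an array of buckets, each bucket a list acting as the chain."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _position(self, key):
        # The hash function converts the key into a position in the array.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._position(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # colliding keys are appended to the chain

    def get(self, key):
        # Equality query: jump to one bucket, then walk its (short) chain.
        for existing_key, value in self.buckets[self._position(key)]:
            if existing_key == key:
                return value
        return None

    def range_query(self, low, high):
        # Keys are not stored in order, so a range query must scan every bucket.
        return sorted(
            (k, v) for bucket in self.buckets for k, v in bucket if low <= k <= high
        )


# Hypothetical sample data: these three integer keys all land in the same bucket.
index = ChainedHashIndex(num_buckets=4)
for key, name in [(1001, "User1"), (1005, "User2"), (1009, "User3")]:
    index.put(key, name)
```

Here `get` touches a single bucket, while `range_query` has to visit all of them, which is why this structure suits equality lookups only.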
Ordered arrays perform very well for both equality queries and range queries.
Judged on query efficiency alone, an ordered array is an excellent choice. Updates are the problem: inserting a record in the middle means moving every record after it, which is too expensive.
An ordered-array index is therefore only suitable for static storage engines, where the data is not modified after it is written.
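The trade-off of the ordered array can be illustrated with Python's standard `bisect` module. This is only a sketch: the function names and the sample IDs are assumptions chosen for the example.

```python
import bisect

ids = [100, 200, 300, 500, 600]  # hypothetical IDs, kept in sorted order


def find_equal(sorted_ids, target):
    """Equality query by binary search; returns the position or -1."""
    i = bisect.bisect_left(sorted_ids, target)
    return i if i < len(sorted_ids) and sorted_ids[i] == target else -1


def find_range(sorted_ids, low, high):
    """Range query: binary-search both ends, then take the contiguous slice."""
    start = bisect.bisect_left(sorted_ids, low)
    end = bisect.bisect_right(sorted_ids, high)
    return sorted_ids[start:end]


def insert_counting_moves(sorted_ids, value):
    """Insert a value and report how many later records had to shift."""
    pos = bisect.bisect_left(sorted_ids, value)
    moved = len(sorted_ids) - pos
    sorted_ids.insert(pos, value)
    return moved
```

Both lookups need only a logarithmic number of comparisons, but `insert_counting_moves(list(ids), 150)` shifts four of the five existing records, which is the update cost described above.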
A binary search tree has the following property: the left child of each node is smaller than its parent, and the parent is smaller than its right child.
To keep query complexity at O(log(N)), the tree must be kept balanced, and maintaining that balance makes updates O(log(N)) as well.
Among trees, the binary tree has the highest search efficiency, yet most database storage does not use it. The reason is that an index lives not only in memory but also on disk.
For a query to read the disk as few times as possible, it must access as few data blocks as possible. So instead of a binary tree, we should use an "N-ary" tree, where N depends on the size of the data block.
Because of its read and write performance and its fit with disk access patterns, the N-ary tree is widely used in database engines.
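A rough calculation shows why the fan-out matters. The sketch below simplifies a tree to "each level multiplies the reachable records by N" and treats each level as one data block access; the fan-out of 1200 and the row count are hypothetical values picked for illustration.

```python
def levels_needed(num_rows, fanout):
    """Smallest number of levels such that fanout ** levels >= num_rows."""
    levels = 1
    capacity = fanout
    while capacity < num_rows:
        capacity *= fanout
        levels += 1
    return levels


rows = 1_000_000  # hypothetical table size
binary_levels = levels_needed(rows, 2)     # binary tree
wide_levels = levels_needed(rows, 1200)    # assumed N-ary tree with N = 1200
```

Under these assumptions, a lookup walks about 20 levels in the binary tree but only 2 in the wide tree, so far fewer data blocks have to be read.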
InnoDB’s index model
In InnoDB, tables are stored as indexes ordered by primary key; tables stored this way are called index-organized tables. InnoDB uses the B+ tree index model, so the data is stored in B+ trees.
Each index corresponds to one B+ tree in InnoDB.
Depending on what the leaf nodes contain, indexes are divided into primary key indexes and non-primary key indexes.
The leaf nodes of the primary key index store the entire row of data. In InnoDB, the primary key index is also called a clustered index
The content of the leaf nodes of the non-primary key index is the value of the primary key. In InnoDB, non-primary key indexes are also called secondary indexes
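A query on a non-primary key index first searches that index for the primary key value and then searches the primary key index for the row, a process known as going back to the table. Because this scans one more index tree, applications should use primary key queries where they can.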
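The two kinds of index can be modelled with plain Python containers to make the extra lookup visible. This is a simplified sketch, not InnoDB's actual structures: a dict stands in for the clustered index, a sorted list for the secondary index, and the columns `k` and `name` are hypothetical.

```python
import bisect

# Clustered (primary key) index: primary key -> the entire row.
clustered = {
    100: {"id": 100, "k": 1, "name": "a"},
    200: {"id": 200, "k": 2, "name": "b"},
    300: {"id": 300, "k": 3, "name": "c"},
}

# Secondary index on k: sorted (k, primary key) pairs; the leaf holds only the key.
secondary_k = sorted((row["k"], pk) for pk, row in clustered.items())


def find_by_id(pk):
    """One tree search: the clustered index already holds the full row."""
    return clustered.get(pk), 1


def find_by_k(k):
    """Two tree searches: the secondary index first, then back to the table."""
    i = bisect.bisect_left(secondary_k, (k,))
    if i == len(secondary_k) or secondary_k[i][0] != k:
        return None, 1  # not found in the secondary index, no second search
    pk = secondary_k[i][1]
    return clustered[pk], 2
```

Both functions return the row together with the number of index searches performed, so `find_by_id(200)` reports one search and `find_by_k(2)` reports two for the same row.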
Index maintenance
To keep the index ordered, a B+ tree has to do some maintenance whenever a new value is inserted.
If the new ID is smaller than existing values and belongs in the middle of a page, things are more troublesome: the records after it have to be logically shifted to make room.
Worse, if that data page is already full, the B+ tree algorithm requires allocating a new data page and moving part of the data over. This is called a page split, and performance naturally suffers.
Besides performance, page splits also hurt page utilization: data that used to fit on one page is now spread over two, so space utilization drops by about 50%.
Where there are splits there are also merges. When two adjacent pages have low utilization because of deletions, they are merged; merging can be seen as the reverse of splitting.
With an auto-increment primary key, inserts always arrive in increasing order. Each new record is an append, so no other records are moved and no leaf node is split.
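The difference between appending and inserting into the middle can be seen in a toy simulation of leaf pages. This is a deliberately simplified model, not InnoDB's algorithm: pages are Python lists, `PAGE_CAPACITY` is a hypothetical tiny page size, and a full page is split in half.

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical number of records that fit on one page


def insert(pages, key):
    """Insert key into a list of sorted pages; return True if a page split occurred."""
    if not pages:
        pages.append([key])
        return False
    # Pick the last page whose first key is <= key (or the first page).
    firsts = [page[0] for page in pages]
    idx = max(bisect.bisect_right(firsts, key) - 1, 0)
    page = pages[idx]
    if len(page) < PAGE_CAPACITY:
        bisect.insort(page, key)
        return False
    if idx == len(pages) - 1 and key > page[-1]:
        # Append at the very end: start a new page, no existing record moves.
        pages.append([key])
        return False
    # The page is full and the key belongs inside it: split the page in two.
    bisect.insort(page, key)
    mid = len(page) // 2
    pages[idx:idx + 1] = [page[:mid], page[mid:]]
    return True


def count_splits(keys):
    """Insert all keys in order and return (number of splits, resulting pages)."""
    pages = []
    splits = sum(insert(pages, key) for key in keys)
    return splits, pages


ordered_splits, ordered_pages = count_splits(range(1, 13))
middle_splits, middle_pages = count_splits([10, 20, 30, 40, 50, 60, 70, 80, 25])
```

In this model the twelve increasing keys fill three pages completely without a single split, while inserting 25 into the full first page splits it and leaves two half-used pages behind.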
When a field with business meaning is used as the primary key, ordered insertion is usually hard to guarantee, so writes are relatively expensive.
Storage matters too: the leaf nodes of every secondary index store the primary key, so the shorter the primary key, the smaller those leaf nodes and the less space the secondary indexes occupy.
So, in terms of both performance and storage space, an auto-increment primary key is often the more reasonable choice.
Are there scenarios where a business field works well as the primary key? There are. For example, when the requirements are as follows:
1. There is only one index;
2. The index must be a unique index.
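This is a typical KV scenario. With no other indexes, the size of secondary index leaf nodes is not a concern, so the business field can serve directly as the primary key.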
Covering index
Suppose a statement such as select ID from t is resolved through the index on k. It only needs the value of ID, and that value is already in the k index tree, so the result can be returned directly without going back to the table. In other words, index k has "covered" the needs of this query, and we call it a covering index.
Because a covering index reduces the number of tree searches and significantly improves query performance, using one is a common optimization technique.
Index pushdown
When the leftmost prefix principle is satisfied, the leftmost prefix can be used to locate records in the index. You may then ask: what happens to the parts of the condition that do not match the leftmost prefix?
Index condition pushdown, introduced in MySQL 5.6, evaluates conditions on columns contained in the index during the index traversal itself, filtering out records that do not qualify and reducing the number of lookups back to the table.
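The effect of pushdown can be illustrated by counting table lookups with and without it. The sketch below is a simplified model rather than MySQL's implementation: the table, its columns, the joint index on (name, age), and the sample rows are all hypothetical.

```python
# Hypothetical table rows keyed by primary key.
rows = {
    1: {"id": 1, "name": "Zhang Liu", "age": 30},
    2: {"id": 2, "name": "Zhang San", "age": 10},
    3: {"id": 3, "name": "Zhang Si", "age": 10},
    4: {"id": 4, "name": "Zhang Wu", "age": 20},
    5: {"id": 5, "name": "Li Si", "age": 10},
}

# Assumed joint index on (name, age): sorted entries that also carry the primary key.
index_name_age = sorted((r["name"], r["age"], pk) for pk, r in rows.items())


def query(prefix, age, pushdown):
    """Find rows whose name starts with prefix and whose age matches.

    Returns (matching rows, number of lookups back to the table).
    """
    result, table_lookups = [], 0
    for name, indexed_age, pk in index_name_age:
        if not name.startswith(prefix):
            continue  # a real engine would seek to the prefix range instead
        if pushdown and indexed_age != age:
            continue  # filtered inside the index: no table lookup needed
        table_lookups += 1
        row = rows[pk]  # go back to the table for the full row
        if row["age"] == age:
            result.append(row)
    return result, table_lookups


without_icp = query("Zhang", 10, pushdown=False)
with_icp = query("Zhang", 10, pushdown=True)
```

Both calls return the same two rows, but the version without pushdown goes back to the table four times and the version with pushdown only twice.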
Leftmost prefix principle
An index can speed up retrieval not only when the query matches its full definition but whenever the query matches a leftmost prefix. That prefix can be the leftmost N fields of a joint index or the leftmost M characters of a string index.
How, then, should the fields be ordered when creating a joint index?
The criterion here is how reusable the index is. Because leftmost prefixes are supported, once a joint index on (a, b) exists there is generally no need for a separate index on a. The first principle is therefore: if adjusting the order lets you maintain one fewer index, that order is usually the one to prefer.
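Why a joint index on (a, b) also serves queries on a alone follows from its sort order, which a sorted list of tuples can imitate. This is only an illustrative sketch; the index contents and function names are assumptions.

```python
import bisect

# Hypothetical joint index (a, b): entries are sorted by a first, then by b.
index_ab = sorted([(1, "x"), (1, "z"), (2, "y"), (3, "w"), (3, "x")])


def find_by_a(a):
    """Uses the leftmost prefix: all entries with the given a are contiguous."""
    start = bisect.bisect_left(index_ab, (a,))
    end = start
    while end < len(index_ab) and index_ab[end][0] == a:
        end += 1
    return index_ab[start:end]


def find_by_b(b):
    """b alone is not a leftmost prefix: matches are scattered, so scan everything."""
    return [entry for entry in index_ab if entry[1] == b]
```

`find_by_a` can binary-search to the first match and stop at the first mismatch, whereas `find_by_b` has no ordering to exploit. That is the reason a separate index on a is usually redundant, while a query on b alone would still need its own index.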
Prefix index
Following the leftmost prefix principle, you can index just the leading portion of a string. By default, if the statement that creates the index does not specify a prefix length, the index contains the entire string.
The cost of a prefix index is that it may increase the number of rows scanned: different values can share the same prefix in the index, so each candidate needs a further comparison against the full value.
With a well-chosen length, a prefix index saves space without adding much extra query cost.
To choose the length, count how many distinct values the column has under each candidate prefix length, and pick a length that keeps most of them distinct; this keeps the number of extra scans low.
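The counting approach can be tried out on a small sample. The sketch below is illustrative only: the email addresses are made up, and the acceptable loss of 5% of distinct values (`max_loss`) is an assumed threshold, not a rule from this article.

```python
# Made-up sample values for illustration.
emails = [
    "zhangsan@example.com",
    "zhangsi@example.com",
    "zhangwu@example.com",
    "lisi@example.com",
    "liwu@example.com",
    "wangliu@example.com",
]


def distinct_prefixes(values, length):
    """Number of different values that remain when only the prefix is kept."""
    return len({value[:length] for value in values})


def shortest_prefix(values, max_loss=0.05):
    """Smallest prefix length that keeps at least (1 - max_loss) of the distinct values."""
    total = len(set(values))
    longest = max(len(value) for value in values)
    for length in range(1, longest + 1):
        if distinct_prefixes(values, length) >= total * (1 - max_loss):
            return length
    return longest
```

On this sample a prefix of 4 characters keeps only 4 distinct values out of 6, a prefix of 6 keeps 5, and `shortest_prefix(emails)` settles on 7, the first length at which every value is distinguishable.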
The impact of prefix index on covering index
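With a prefix index, queries can no longer benefit from the covering index optimization: the index holds only part of the value, so the engine must go back to the table to check the full value. This is another factor to weigh when deciding whether to use a prefix index.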
Reverse order storage and hash storage
For fields like email, a prefix index may work well. But what if the prefix does not distinguish values well enough?
The first approach is reverse-order storage: when storing an ID number, store it reversed, so that the more distinctive trailing digits come first.
The second approach is a hash field: add an integer column to the table that holds a checksum of the ID number, and create an index on that column.
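Both approaches can be sketched in Python. This is an illustration under assumptions: the ID numbers are made-up sample values, CRC32 (via the standard `zlib` module) is used only as one possible checksum, and the dict stands in for an index on the hash column.

```python
import zlib

# Made-up ID numbers that share a long common prefix.
id_numbers = ["110101199001011234", "110101199001015678", "110101199002029012"]


def reversed_key(id_number):
    """Reverse-order storage: the distinctive trailing digits move to the front."""
    return id_number[::-1]


def hash_key(id_number):
    """Hash field: a 32-bit checksum (CRC32 is an assumed choice for this sketch)."""
    return zlib.crc32(id_number.encode("utf-8"))


# Stand-in for an index on the hash column: checksum -> values with that checksum.
hash_index = {}
for value in id_numbers:
    hash_index.setdefault(hash_key(value), []).append(value)


def lookup_by_hash(index, id_number):
    # Different values can share a checksum, so the full value is compared as well.
    return [v for v in index.get(hash_key(id_number), []) if v == id_number]
```

A 6-character prefix cannot tell these sample values apart, but the same prefix length on the reversed strings can. Neither approach keeps the original ordering, so both serve equality lookups rather than range queries.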