This article brings you knowledge about the bottom layer and optimization of indexes inmysql. Let’s sort out the knowledge points about indexes in mysql. I hope it will be helpful to everyone.
Recently, I have read about indexing knowledge on many websites. There are various opinions, but they are not comprehensive. The concept is very vague. The following are the Mysql index knowledge points compiled by the editor.
The index is used to quickly find rows with a specific value in a column. Without using an index, MySQL must start from The first record starts reading the entire table until the relevant rows are found. The larger the table, the more time it takes to query the data. If the column queried in the table has an index, MySQL can quickly reach a location to search for the data. Files instead of having to look at all the data will save a lot of time.
1.A hash table is a structure that stores data using key-value. We only need to enter the key to be found, that is, key, to find its corresponding Value is Value. The idea of hashing is very simple. Put the value in the array, use a hash function to convert the key to a certain position, and then put the value at this position in the array.
Inevitably, multiple key values will have the same value after being converted by the hash function. One way to handle this situation is to pull out a linked list.
2.Speaking of bTree, we have to mention binary trees. Binary trees are divided into many types, such as:binary search tree, balanced binary tree, etc. Of course, there is also the key point红黑tree.
1)The characteristics of a binary search tree are:The values of all nodes in the left subtree of the parent node are less than the value of the parent node. The values of all nodes in the right subtree are greater than the value of the parent node.The following uses a picture as an example to illustrate a binary search tree.
There is a requirement to find Zhang San. If we do not use a binary search tree, we need to search 7 times. Using a binary search tree, we only need to search 4 times to find the value we want.
According to the above, using a binary search tree can indeed reduce the number of queries, but have you ever thought about what if the data in the database is sequentially increasing data such as 1, 2, 3, 4, 5, 6, and 7? If you continue to use the binary search tree, it will become a linked list. So if we want to find 7, we need to search 7 times, and scan the table 7 times. This is no different from not creating an index, which is also one of the disadvantages. The following figure is an example.
2)Balanced Binary Tree: Also known as AVL tree, the absolute value of the height difference between its left and right subtrees does not exceed 1, and The left and right subtrees are both balanced binary trees, and the AVL tree is the earliest self-balancing binary search tree invented. In an AVL tree, the maximum height difference between two subtrees of any node can only be 1, so it is also called a height-balanced tree. Querying, adding and deleting are O(log n) in average and worst case. Additions and deletions may require one or more tree rotations to rebalance the tree.
Our purpose of introducing binary trees is to improve the efficiency of binary tree search, thereby reducing the average search length of the tree. To this end, we must adjust the structure of the tree when inserting a node into each binary tree so that the binary tree search can maintain balance. , thereby potentially reducing the height of the tree and reducing the average tree search length.
The characteristics of a balanced binary tree are as follows:
1. Its left subtree and right subtree are both AVL trees
2. The height difference between the left subtree and the right subtree cannot exceed 1
Example:
3)Red-black tree: It can be understood that the red-black tree is a tree that is above the balanced binary tree. The red-black tree The tree does not pursue "complete balance", it only seeks to partially achieve balance requirements, reducing the requirements for rotation, thereby improving performance. Furthermore, due to its design, any imbalance can be resolved within three revolutions. In red-black trees, its algorithm time complexity is the same as AVL, and the statistical performance will force AVL trees to be higher. Therefore, compared with the balanced binary tree, the red-black tree is not a balanced binary tree in the strict sense. The insertion and deletion efficiency of the red-black tree is higher, and the query efficiency is relatively lower than the balanced binary tree. However, the difference in query efficiency between the two is In comparison, it is basically negligible. The characteristics of red-black trees are as follows:
1.Nodes are red or black.
2.The root node is black.
3.Each red node’s two child nodes are black. (Children of red nodes must be black nodes)
4.All paths from any node to each of its leaves contain the same number of black nodes.
Therefore, the red-black tree is a black-balanced tree, and the height difference between the left subtree and the right subtree will not exceed 2. The parent node and child node of a red node can only be a black node.
Example picture:
4)BTree (B tree): Of course, the above mentioned red-black tree, the performance is very high. Taking the above figure as an example, the maximum height of the tree is 4, with a total of 9 pieces of data. However, for the Mysql database, there are millions of pieces of data, tens of millions of pieces of data, and the height of the tree is immeasurable, for example, hundreds Ten thousand pieces of data require 30-50 disk IO times to query the data, or even more times, which obviously cannot meet the efficient query efficiency of Mysql index. So if we control the height of the tree, this will greatly reduce the number of disk IO requests. If the height is controlled at 4, then only 4 disk IOs are needed to query the data.
But how to control the height of the tree? A red-black tree only stores one element per node. What if each node stores multiple elements? This can solve the height problem. There must be students who have questions. Put all If all the elements are placed on a node, the height value will be 1. Isn’t it faster? It is definitely wrong to think this way.Mysql has a size limit every time it deals with disk IO. Mysql limits the size of each node to 16K.Students who want to check their Mysql limit node size can execute the following sql.
show global status like 'Innodb_page_size'
The following figure is used as an example to illustrate the characteristics of BTree
BTree:
1. All index elements are not repeated
2. Node The data index increases from left to right
3. Leaf nodes have the same depth, and the pointers of leaf nodes are empty
4. Both leaf nodes and non-leaf nodes store indexes and data
5)B tree: As mentioned above, BTree controls the height of the tree, which can meet Mysql's needs for indexing. However, in the end, the Mysq index implementation is not BTree but B Tree, Mysql has made a little modification to the B-tree and obtained the B-tree. It can also be understood that the B-tree is an upgraded version of the B-tree.
Let’s take the picture as an example:
As you can see from this picture, our non-leaf nodes only store indexes and do not store data, and pointers are used between leaf nodes. connected. Both the leaf nodes and non-leaf nodes of the B-tree store indexes and data, and the pointers of the leaf nodes are empty. The B-tree places the data on the leaf nodes, so that the non-leaf nodes can store more indexes, each time More indexes can also be obtained from disk IO.
B tree features are as follows:
1. Non-leaf nodes do not store data, only indexes (redundant) and lower-level pointers, and more indexes can be placed
2 .Leaf nodes contain all index fields and data
3.Leaf nodes are connected with double pointers to improve the performance of interval access
B-tree drawn on Baidu and many blogs It's wrong. You must avoid the pit.
If you are interested in seeing Mysql’s official explanation of B-trees, you can check it out.
Link: Mysql official website.
1. Classification according to the storage association of the index: Divided into two major categories
1.) Clustered index (clustered index): The leaf nodes contain complete data records and do not need to be returned to the table.
2.) Non-clustered index: Need to return the tableand perform a secondary tree search, which affects performance.
1.1)Everyone knows that MySQL has two commonly used storage engines: MyISAM and InnoDB, but have you actually understood the underlying data storage structures of the two storage engines?
Let’s take the picture as an example to explain:
The test.myisam table is the MyISAM storage engine, and the actor table is the InnoDB storage engine. You can see that the MyISAM storage engine has three files, namely frm and MYD. , MYI, it is easy to understand the abbreviation of frm-frame. It stores the structure of the table. MYD-MYData stores the data. MYI-MYIndex stores the index. The index and data are stored separately. InnoDB only has frm and IBD. , where frm also has the same structure of the stored table, and the IBD file stores indexes and data, which is different from InnoDB and MyISAM.
The following figure is used as an example to illustrate that the MyISAM storage engine primary key index requires a table return operation (non-clustered index)
where 15 stores the primary key index, and 0x07 stores the location of 15 The disk file address pointer of the row record. For example, if we want to find the data of 15, we should first find the pointer corresponding to 15 through the primary key index tree, and then find this pointer and then go to the MyD file to find the specific data. Two steps are required. This process is called a table return operation.
2.1)The following figure is used as an example to illustrate that the InnoDB storage engine primary key index does not require table return operations. (Clustered Index)
The InnoDB storage engine sub-node first stores the index in the row 15, and the column below 15 stores all the other fields in the row where the index is located. If we want The data of 15 can be found directly without having to go through a second tree search.
2.Classified by function: Mainly divided into five categories
2.1Primary key index: InnoDB primary key index does not require table return operations
2.2Ordinary index (secondary index): InnoDB ordinary index requires table return operation. For secondary index, it will be a joint index with the primary key by default.
2.3Unique index
2.4Full-text index
2.5Union index: need to satisfy the leftmost prefix principle
3.It was mentioned in 2.2 that ordinary indexes require table return operations. Is there any ordinary index that does not require table return operations? The answer is yes. In a certain query, the index has already covered our query needs. , we call it a covering index. At this time, there is no need to return to the table.
Since covering indexes can reduce the number of tree searches and significantly improve query performance, using covering indexes is a common performanceoptimizationmeans.
For example: The following is the initialization statement of this table.
mysql> create table T ( ID int primary key, k int NOT NULL DEFAULT 0, s varchar(16) NOT NULL DEFAULT '', index k(k)) engine=InnoDB; insert into T values(100,1, 'aa'),(200,2,'bb'), (300,3,'cc'),(500,5,'ee'),(600,6,'ff'),(700,7,'gg');
In the above table T, if I execute select * from T where k between 3 and 5, how many tree search operations need to be performed, and how many rows will be scanned?
Now, let’s take a look at the execution flow of this SQL query statement. Look at the picture below.
1.)Find the record k=3 in the k index tree and obtain ID = 300;
2.)Then go to ID The index tree finds the R3 corresponding to ID=300;
3.)Take the next value k=5 in the k index tree and obtain ID=500;
4.)Go back to the ID index tree and find the R4 corresponding to ID=500;
5.)Get the next value k=6 from the k index tree. If the condition is not met, the loop ends.
In this process,the process of returning to the primary key index tree search is called table return. It can be seen that this query process reads three records of the k index tree (steps 1, 3 and 5) and returns the table twice (steps 2 and 4).
In this example, since the data required for the query results is only on the primary key index, it has to be returned to the table.
If the executed statement is select ID from T where k between 3 and 5,At this time, you only need to check the value of ID, and the value of ID is already in the k index tree, so it can be provided directly Query results do not need to be returned to table. That is to say, in this query, index k has "covered" our query requirements, which is called a covering index.
In InnoDB, tables are stored in index form according to primary key order. Tables stored in this way are called index-organized tables. And because we mentioned earlier, InnoDB uses the B-tree index model, so the data is stored in the B-tree. Each index corresponds to a B-tree in InnoDB.
1.)When only one column in the combined index contains a null value, the index becomes invalid
2.)The index becomes invalid after calculation on the column. All indexes are invalid
3.)Using functions on query conditions will cause index failure
4.)Using != or in where clauses operator, causing index failure
5.)Avoid using or, causing index failure
6.)Using fuzzy query will also cause index failure, you can use like ' a%' instead of like '%a%'
7.)Try to use covering indexes and reduce select * statements
8.)Satisfy the leftmost prefix rule, Starting from the leftmost column and not skipping the columns in the index
9.)If the string is not added with single quotes, the index will be invalid
Create a new employee table with 5 attributes, as follows.
create table employees( id int primary key auto_increment comment '主键自增', name varchar(30) not null default '' comment'名字', age int not null default 1 comment '年龄', id_card varchar(40) not null default '' comment '身份证号', position varchar(40) not null default '' comment '位置' ); -- 创建联合索引 create index name_index on employees (name,age,position); -- 插入一条数据 insert into employees(name,age,id_card,position) values('张三',15, '201124199011035321','北京');
-- 下面以10条sql测试,注意建立的联合索引顺序是 name,age,position 1.explain select * from employees where age=15 and position='北京' and name='张三'; 2.explain select * from employees where name='张三' and age=15 and position='北京'; 3.explain select * from employees where age=15 and name='张三'; 4.explain select * from employees where position='北京' and name='张三'; 5.explain select * from employees where position='北京' and age=15; 6.explain select * from employees where position='北京' and age>15 and name='张三'; 7.explain select * from employees where position='北京'; 8.explain select * from employees where age=15; 9.explain select * from employees where name='张三'; 10.explain select * from employees where name != '张三';
以上10条sql有哪些是索引失效,有哪些是索引没有失效的呢? 相信同学们已经有了答案,但是答案对不对呢,下面我们一起分析下。 首先说第1条,查询条件把3个索引全部用上了,但是索引的顺序有变化,由name,age,position变成 了age,position,name,想到这里肯定有很多同学给出的答案就是索引失效,但是事实证明这个结果 是错的,索引生效,肯定有很多同学疑惑,为什么呢,这条sql不满足最左原则法则呀,这就要涉及到sql 的执行流程了,这里博主简单说下,sql执行有1个优化器的过程,优化器的作用之一就是索引的选择优化, 所以优化器帮我们把索引的顺序变成正确的了,所以索引生效。 下面是第1条按照索引顺序sql和第2条没有按照索引顺序sql的执行结果。 执行结果入下图:可以发现全部生效。
The value of the first sql type is ref, the byte is 288, and the ref has 3 const, all of which are valid.
The value of the second sql type is ref, the byte is 288, and the ref has 3 const, all of which are valid.
想学习sql的执行流程的可以看博主的另一篇关于sql执行流程的文章哦。 有的同学有疑问了,那最左原则没有用了吗? 答案:有用的。
现在我们说下第3、4、5条sql 第3条: explain select * from employees where age=15 and name='张三'; sql在执行的时候,优化器替我们把索引的顺序优化了,由 age -> name 变成 name -> age,这时 索引是生效的。 第4条: explain select * from employees where position='北京' and name='张三'; 索引顺序优化为name - > position,但是这时索引只有name索引生效,position没有生效,因为我 们建立的索引顺序是 name -> age - > position,你会发现跳过了age,索引本质也是一棵树,少 了一个节点,下面的索引当然不会生效了,这就没有满足最左原则法则。 第5条: explain select * from employees where position='北京' and age=15; 这就和第4条sql一样的道理了,第一个索引都不见了,后面的不可能生效。 执行结果如下:
It can be found that the value of the third sql type is ref, the byte is 126 and ref has 2 const, all of which are effective.
The fourth sql only has 122 bytes and the ref has only 1 const, so only the name index takes effect.
The value of the 5th sql type is all, the bytes and ref are empty, and all are invalid.
下面说第6条sql,剩下的sql都是和之前的sql一样的道理。 explain select * from employees where position='北京' and age>15 and name='张三'; 这条sql我们会发现,把索引字段全部使用了并且当作条件查询,不一样的是age是范围查找,优化器替我 们把索引顺序优化成 name -> age - > position ,按照我们索引优化第2条:在列上做计算索引失效,范围之后的索引全部失效,想必答案同学们都知道了。 执行结果如下:
Article 6 SQL only has 126 bytes and the value of type is range. For range search, only name and age indexes take effect.
Recommended learning:mysql video tutorial
ID | name |
---|---|
张五 | |
张六 | |
张七 | |
张二 | |
张一 | |
张四 | |
张三 |
The above is the detailed content of Let's talk about the bottom layer and optimization of Mysql index. For more information, please follow other related articles on the PHP Chinese website!