Home > Article > Database > Cry..I thought I knew MySQL indexes well

Cry..I thought I knew MySQL indexes well

coldplay.xixiforward: 2020-11-04 17:24:181883browse

mysql video tutorial column introduces the real index.

Related free learning recommendations: mysql video tutorial

1. What is an index?

In a relational database, an index is a separate, physical storage structure that sorts the values of one or more columns in a database table. It is a collection of the values of one or several columns in a table. Collection and corresponding list of logical pointers to the data pages in the table that physically identify these values. The index is equivalent to the table of contents of a book. You can quickly find the required content based on the page numbers in the table of contents.

When there are a large number of records in the table, if you want to query the table, the first way to search for information is to search the whole table, which is to take out all the records one by one, compare them one by one with the query conditions, and then return Records that meet the conditions will consume a lot of database system time and cause a lot of disk I/O operations; the second is to create an index in the table, then find the index value that meets the query conditions in the index, and finally save it in the index The ROWID (equivalent to the page number) in the table can quickly find the corresponding record in the table.

The index data structure used by the InnoDB storage engine after MySQL5.5 mainly uses: B Tree; this article will take you to talk about the past and present life of B Tree;

**Mark**

：

B Tree can be used for , >=, BETWEEN, IN , and LIKE usage indexes that do not start with a wildcard. (After MySQL 5.5)

These facts may subvert some of your perceptions, such as in other articles or books you have read. All of the above are "range queries" and are not indexed!

That’s right, before 5.5, the optimizer would not choose to search through the index. The optimizer believed that the rows retrieved in this way were more than those of the full table scan, because it would have to go back to the table to check again, which may involve The number of I/O rows is larger and is abandoned by the optimizer.

After optimization of the algorithm (B Tree), it supports scanning of some range types (taking advantage of the orderliness of the B Tree data structure). This approach also violates the leftmost prefix principle, resulting in the condition after the range query being unable to use the joint index, which we will explain in detail later.

2. Advantages and Disadvantages of Index

1. Advantages

The index greatly reduces the amount of data that the server needs to scan
The index can help The server avoids sorting and temporary tables
Indexes can turn random I/O into sequential I/O

2. Disadvantages

Although indexes are greatly improved It improves the query speed, but it also reduces the speed of updating the table, such as INSERT, UPDATE and DELETE on the table. Because when updating the table, MySQL not only needs to save the data, but also save the index file.
Creating index files will occupy disk space. Generally speaking, this problem is not serious, but if you create multiple combination indexes on a large table and insert a large amount of data, the index file size will expand rapidly.
If a data column contains a lot of repeated content, indexing it will not have much practical effect.
For very small tables, a simple full table scan is more efficient in most cases;

Therefore, only the most frequently queried and most frequently sorted data columns should be indexed. (The total number of indexes in the same data table in MySQL is limited to 16)

　One of the meanings of the existence of a database is to solve data storage and fast search. So where does the data in the database exist? That's right, it's a disk. What are the advantages of a disk? Cheap! What about the disadvantages? Slower than memory access.

So do you know the data structures mainly used by MySQL indexes?

B tree! you blurt out.

What kind of data structure is the B-tree? Why did MySQL choose B-tree for index?

In fact, the final choice of B-tree went through a long process of evolution:

#Binary sorting tree → Binary balanced tree → B-Tree（B tree） → B Tree（B tree）

A friend asked me “What is the difference between B-tree and B-tree?” ? To generalize here, MySQL data structures only have B-Tree (B tree) and B Tree (B tree). Most of them are just different pronunciations. "B-Tree" is generally called B-tree. You can also call it B-tree~ ~

Also, the red-black tree mentioned by a friend is a storage structure in a programming language, not MySQL; for example, Java's HashMap uses a linked list plus a red-black tree.

Okay, today I will take you through the process of evolving into a B-tree.

3. The past and present life of B Tree index

1. Binary sorting tree

Before understanding the B-tree, let’s briefly talk about the binary sorting tree. For a node, its left child The value of the child node of the tree must be less than itself, and the value of the child node of its right subtree must be greater than itself. If all nodes meet this condition, then it is a binary sorted tree. (Here you can string together the knowledge points of binary search)

The picture above is a binary sorting tree. You can try to use its characteristics to experience the process of finding 9:

9 is smaller than 10, go to its left subtree (node 3) to find
9 is larger than 3, go to the right subtree of node 3 (node 4) to find
9 is larger than 4, go to the right subtree of node 4 (node 9) to find
Node 9 is equal to 9, the search is successful

Compared 4 times in total, then you Have you ever thought about how to optimize the above structure?

2. AVL tree (self-balancing binary search tree)

The above picture is an AVL tree. The number and value of nodes are equal to the binary sorting tree. Exactly the same

Let’s look at the process of finding 9:

9 is larger than 4, go to its right subtree to find
9 is smaller than 10, go Its left subtree searches for
nodes 9 and 9, which are equal, and the search is successful

. It has been compared a total of 3 times, and the same amount of data is once less than the binary sorting tree. Why? ? Because the height of the AVL tree is smaller than that of the binary sorting tree, the higher the height, the more the number of comparisons. Don’t underestimate this optimization. If it is 2 million pieces of data, the number of comparisons will be significantly different.

You can imagine a balanced binary tree with 1 million nodes and a tree height of 20. A query may need to access 20 data blocks. In the era of mechanical hard disks, it took about 10 ms of seek time to randomly read a data block from the disk. In other words, for a table with 1 million rows, if a binary tree is used to store it, it may take 20 10 ms to access a single row. This query is really slow!

3. B-tree (Balanced Tree) Multi-way balanced search tree Multi-fork

B-tree is a multi-way self-balancing search tree. It is similar to an ordinary binary tree, but the B tree allows Each node has more child nodes. The schematic diagram of the B-tree is as follows:

Characteristics of the B-tree:

All key values are distributed throughout the tree
Any keyword appears and only appears in one node
The search may end at a non-leaf node
A search is performed within the full set of keywords, and the performance is close to the binary search algorithm

In order to improve efficiency, the number of disk I/Os should be reduced as much as possible. In the actual process, the disk is not read strictly on demand every time, but is read ahead every time.

After the disk reads the required data, it will read some more data into the memory in order. The theoretical basis for this is the locality principle noted in computer science:

Since disk sequential reading is very efficient (no addressing time required, only little rotation time), for programs with locality, read-ahead can improve I/O efficiency. The length of read-ahead Generally, it is an integral multiple of page.
MySQL (using the InnoDB engine by default) manages records in pages, and the default size of each page is 16K (can be modified).

B-Tree uses the computer disk read-ahead mechanism:

Every time a new node is created, a page of space is applied for, so each time a node is searched, only one I/ O;Because in actual applications, the node depth will be very small, so the search efficiency is very high.

So how is the final version of the B-tree made?

4. B Tree (B tree is a variant of B tree and also a multi-path search tree)

You can also see from the picture , The difference between B-tree and B-tree is:

All keywords are stored in leaf nodes, non-leaf nodes do not store real data, so leaf nodes can be quickly located .
Adds a chain pointer to all leaf nodes, which means that all values are stored in order, and the distance from each leaf page to the root is the same, which is very suitable for searching range data .

** Therefore, B Tree can use indexes for , >=, BETWEEN, IN, and LIKE that does not start with a wildcard character. **

Advantages of B tree:

The number of comparisons is balanced, reducing the number of I/Os, improving the search speed, and making the search more stable.

The disk read and write cost of B tree is lower
The query efficiency of B tree is more stable

What you need to know is that every time you create a table, the system will Automatically create an ID-based clustered index (the above-mentioned B-tree) for you to store all data; every time you add an index, the database will create an additional index for you (the above-mentioned B-tree), and the number of fields selected by the index is each The number of node storage data indexes. Note that this index does not store all data.

4. Why does MySQL choose B-tree instead of B-tree for index?

B tree is more suitable for external storage (usually disk storage). Since internal nodes (non-leaf nodes) do not store data, one node can store more internal nodes, and each node can be indexed The range is larger and more precise. In other words, the amount of information in a single disk I/O using B-tree is larger than that of B-tree, and the I/O efficiency is higher.
Mysql is a relational database, and an index column is often accessed according to the interval. Link pointers are established in order between the leaf nodes of the B-tree, which enhances interval access, so the B-tree has Interval range queries are friendly. The key and data of each node of the B-tree are together, so interval search cannot be performed.

5. Programmers, index knowledge points you should know

1. Table return query

For example, if you create name, age index name_age_index, query the data

select * from table where name ='陈哈哈' and age = 26;
1复制代码

is used because there are only name and age in the additional index. Therefore, after hitting the index, the database must go back to the clustered index to find other data. This is table return. This is also the one you memorize: use less The reason for select *.

2. Index coverage

It will be better understood when combined with the return table. For example, in the above name_age_index index, there is a query

select name， age from table where name ='陈哈哈' and age = 26;
1复制代码

At this time, the selected field name, age is in the index name_age_index can be obtained, so there is no need to return to the table, index coverage is satisfied, and the data in the index is returned directly, which is highly efficient. It is the preferred optimization method for DBA students when optimizing.

3. The leftmost prefix principle

The node storage index order of the B tree is stored from left to right. When matching, it is natural to satisfy the left-to-right matching; usually when we create When jointly indexing, that is, indexing multiple fields. I believe students who have established indexes will find that both Oracle and MySQL will let us choose the order of the index. For example, we want to index the three fields a, b, and c. To create a joint index, we can choose the priority we want, a, b, c, or b, a, c or c, a, b, etc. in order. Why does the database let us choose the order of fields? Aren’t they all joint indexes of three fields? This leads to the principle of the leftmost prefix of database indexes.

In our development, we often encounter the problem that a joint index is built for this field, but the index is not used when SQL queries this field. For example, the index abc_index: (a, b, c) is a joint index of the three fields a, b, c. When the following sql is executed, the index abc_index cannot be hit;

select * from table where c = '1';

select * from table where b ='1' and c ='2';
123复制代码

The following three situations will cause the index to be lost. :

select * from table where a = '1';

select * from table where a = '1' and b = '2';

select * from table where a = '1' and b = '2'  and c='3';
12345复制代码

Do you have any clues from the above two examples?

Yes, the index abc_index: (a,b,c) will only be used in three types of queries: (a), (a,b), and (a,b,c). In fact, there is a bit of ambiguity here. In fact, (a,c) will also be used, but only the a field index will be used, and the c field will not be used.

In addition, there is a special case. In the following type, only a and b will be indexed, and c will not be indexed.

select * from table where a = '1' and b > '2'  and c='3';
1复制代码

For a sql statement of the above type, after a and b are indexed, c is already out of order, so c cannot be indexed, and the optimizer will think that it is not as good as the full index. Table scan c field comes quickly.

**Leftmost prefix: As the name suggests, it means leftmost priority. In the above example, we created a_b_c multi-column index, which is equivalent to creating (a) single column index and (a, b) combined index. And (a,b,c) combination index. **

Therefore, when creating a multi-column index, the most frequently used column in the where clause should be placed on the far left according to business needs.

4. Index pushdown optimization

Or index name_age_index, the following sql

select * from table where name like '陈%' and age > 26；
1复制代码

This statement has two execution possibilities:

Hit The name_age_index joint index queries all data that satisfies the name starting with "陈", and then returns to the table to query all satisfying rows.
Hit the name_age_index joint index, query all data whose name starts with "Chen", and then filter out the index with age>20, and then return to the table to query the entire row of data.

Obviously, the second method returns fewer rows to the table and the number of I/Os will also be reduced. This is index pushdown. So not all likes will not hit the index.

6. Precautions when using indexes

1. The index will not include columns with null values

As long as the columns contain null values, they will not be included. In the index, as long as one column in the composite index contains a null value, then this column is invalid for this composite index. Therefore, we recommend not to let the default value of a field be null when designing the database.

2、使用短索引

对串列进行索引，如果可能应该指定一个前缀长度。例如，如果有一个char(255)的列，如果在前10个或20个字符内，多数值是惟一的，那么就不要对整个列进行索引。短索引不仅可以提高查询速度而且可以节省磁盘空间和I/O操作。

3、索引列排序

查询只使用一个索引，因此如果where子句中已经使用了索引的话，那么order by中的列是不会使用索引的。因此数据库默认排序可以符合要求的情况下不要使用排序操作；尽量不要包含多个列的排序，如果需要最好给这些列创建复合索引。

4、like语句操作

一般情况下不推荐使用like操作，如果非使用不可，如何使用也是一个问题。like “%陈%” 不会使用索引而like “陈%”可以使用索引。

5、不要在列上进行运算

这将导致索引失效而进行全表扫描，例如

SELECT * FROM table_name WHERE YEAR(column_name)

6、不使用not in和操作

这不属于支持的范围查询条件，不会使用索引。

我的体会

曾经，我一度以为我很懂MySQL。

刚入职那年，我还是个孩子，记得第一个需求是做个统计接口，查询近两小时每隔5分钟为一时间段的网站访问量，JSONArray中一共返回24个值，当时菜啊，写了个接口循环二十四遍，发送24条SQL去查（捂脸），由于那个接口，被技术经理嘲讽~~表示他写的SQL比我吃的米都多。虽然我们山东人基本不吃米饭，但我还是羞愧不已。。
然后经理通过调用一个dateTime函数分组查询处理一下，就ok了，效率是我的几十倍吧。从那时起，我就定下目标，深入MySQL学习，万一日后有机会嘲讽回去？

筒子们，MySQL路漫漫，其修远兮。永远不要眼高手低，一起加油，希望本文能对你有所帮助。

The above is the detailed content of Cry..I thought I knew MySQL indexes well. For more information, please follow other related articles on the PHP Chinese website!

Java sql mysql NULL select char 循环指针数据结构接口 delete 算法 oracle 数据库 dba

Statement：

This article is reproduced at:juejin.im. If there is any infringement, please contact admin@php.cn delete

Previous article：How to assign null to mysql dateNext article：How to assign null to mysql date

See more

Cry..I thought I knew MySQL indexes well

mysql video tutorial column introduces the real index.

1. What is an index?

2. Advantages and Disadvantages of Index

1. Advantages

2. Disadvantages

3. The past and present life of B Tree index

1. Binary sorting tree

2. AVL tree (self-balancing binary search tree)

3. B-tree (Balanced Tree) Multi-way balanced search tree Multi-fork

4. B Tree (B tree is a variant of B tree and also a multi-path search tree)

4. Why does MySQL choose B-tree instead of B-tree for index?

5. Programmers, index knowledge points you should know

1. Table return query

2. Index coverage

3. The leftmost prefix principle

4. Index pushdown optimization

6. Precautions when using indexes

1. The index will not include columns with null values

2、使用短索引

3、索引列排序

4、like语句操作

5、不要在列上进行运算

6、不使用not in和操作

我的体会

Related articles