Home >Database >Mysql Tutorial >How to solve MySQL index failure
When indexing a SQL statement, you will encounter an index failure, which has a crucial impact on the feasibility and performance efficiency of the statement. This article analyzes itWhy the index fails
, what situations will lead to index failure
and the optimization solution
when the index fails, focusing on the leftmost prefix matching principle
, MySQL logical architecture and optimizer
, Index failure scenarios and why they fail
.
I have previously written an article about the characteristics and optimization issues of adding indexes to MySQL. The following will introduce the relevant content of index failure.
First introduce a principle that will be used in subsequent index failure reasons: The leftmost prefix matching principle
.
The underlying principle of the leftmost prefix: When MySQL builds a joint index, it will comply with the leftmost prefix matching principle, that is, leftmost priority. When retrieving data, matching starts from the leftmost of the joint index.
What is the leftmost prefix matching principle? In order to understand the leftmost matching principle of the joint index, let's first understand the underlying principle of the index: the bottom layer of the index is a B-tree, then the bottom layer of the joint index is also a B-tree, but in the B-tree nodes of the joint index Key values are stored. The database needs to be built relying on the leftmost field in the joint index, because the B-tree can only determine the index relationship based on one value.
Example: Create a joint index of (a, b), then its index tree will look like the figure below. The values of
a are in order, and the order of appearance is 1, 1, 2, 2, 3, 3. The values of b are unordered, and the numbers that appear are 1, 2, 1, 4, 1, 2. When the values of a are equal, we can observe that the values of b are arranged in a certain order, but it should be noted that this order is relative. This is because MySQL's rule for creating a joint index is to first sort the leftmost field of the joint index, based on the sorting of the first field, and then sort the second field. Therefore, there is no way to use the index for query conditions such as b=2.
Since the whole process is based on explain result analysis, then let’s learn about the type field and key_lef field in explain.
1.type: connection type.
system: The table has only one row of records (equal to the system table). This is a special case of the const type. It does not usually appear and can be ignored.
const: Indicates that it is found through the index once. Const is used to compare the primary key or unique index. Because you only need to match one row of data, it's very fast. Put the primary key in the WHERE condition and MySQL will convert the query into a const query.
eq_ref: Unique index scan, for each index key, only one record in the table matches it. Commonly seen in primary key or unique index scans. Note: ALL full table scan of the table with the fewest records, such as t1 table ref
: non-unique index scan, returns all rows matching a single value. Essentially, it is an index access that returns all rows that match a single value. However, it may find multiple matching rows, so it should be a mixture of search and scan.
range
: Retrieve only rows in a given range, using an index to select rows. The key column shows which index is used. Generally, queries such as bettween, , in, etc. appear in the where statement. This range scan on index columns is better than a full index scan. It only needs to start at a certain point and end at another point, without scanning the entire index.
index
: Full Index Scan, the difference between index and ALL is that the index type only traverses the index tree. This is usually ALL blocks, as index files are usually smaller than data files. (Although Index and ALL both read the entire table, index is read from the index, while ALL is read from the hard disk)
ALL: Full Table Scan, traverse the entire table to Find the matching row
2.key_len: Displays the length of the index that MySQL actually decided to use. If the index is NULL, the length is NULL. If not NULL, the length of the index used. So this field can be used to infer which index is used.
Calculation rules:
1. Fixed-length field, int occupies 4 bytes, date occupies 3 bytes, char(n ) takes up n characters.
2. The variable-length field varchar(n) occupies n characters and two bytes.
3. Different character sets, the number of bytes occupied by a character is different. In Latin1 encoding, one character occupies one byte, in gdk encoding, one character occupies two bytes, and in UTF-8 encoding, one character occupies three bytes.
(Since my database uses the Latin1 encoding format, in subsequent calculations, one character is counted as one byte)
4. For all index fields, if set to NULL, 1 byte is required.
After understanding the leftmost prefix matching principle, let’s take a look at the index failure scenario and analyze why it fails.
MySQL logical architecture
:
mysql architecture can be divided into There are roughly 4 layers, namely:
1.Client: Various languages provide methods to connect to the mysql database, such as jdbc, php , go, etc., you can choose the corresponding method or framework to connect to mysql according to the selected back-end development language
2.server layer: including connectors, query cache, Analyzers, optimizers, executors, etc. cover most of the core service functions of MySQL, as well as all built-in functions (such as date, family, mathematical and encryption functions, etc.). All cross-storage engine functions are implemented in this layer. Such as stored procedures, triggers, views, etc.
3.Storage engine layer: Responsible for the storage and retrieval of data, it is the component that actually deals with the underlying physical files. The essence of data is stored on the disk. The data is stored in an organized manner through a specific storage engine and extracted according to business needs. The architectural model of the storage engine is plug-in and supports multiple storage engines such as Innodb, MyIASM, and Memory. The most commonly used storage engine now is Innodb, which has become the default storage engine since mysql5.5.5.
4.Physical file layer: Stores the real table data, logs, etc. of the database. Physical files include: redolog, undolog, binlog, errorlog, querylog, slowlog, data, index, etc.
Introduction to important components of the server layer:
1. Connector
The connector is responsible for coming from Client connection, obtaining user permissions, maintaining and managing connections.
After a user successfully establishes a connection, even if you use the administrator account to modify the user's permissions, it will not affect the permissions of the existing connection. After the modification is completed, only new connections will use the new permission settings.
2. Query cache
After mysql gets a query request, it will first go to the query cache to check whether this statement has been executed before. Previously run statements and their output may be stored directly in memory, cached as key-value pairs. The key is the query statement, and the value is the query result. When the keyword of the SQL query can be directly matched in the query cache, the query result (value) will be returned directly to the client.
In fact, it is recommended not to use query cache in most cases. Why? Because query caching often does more harm than good. As long as a table update operation is involved, all query caches related to the table can easily become invalid and cleared. Therefore, it is very likely that after painstakingly storing the results, they will be cleared by a new update operation before they can be used. For databases with many update operations, the hit rate of the query cache will be very low. Unless the business needs a static table, it will only be updated once a long time. For example, if it is a system configuration table, then the query of this table is suitable for using query cache.
3. Analyzer
Lexical analysis (identify keywords, operations, table names, column names)
Syntactic analysis (determine whether it conforms to grammar)
4. Optimizer
The optimizer decides which index to use when there are multiple indexes in the table; or when there are multiple table associations (joins) in one statement time, determine the connection order of each table. After the optimizer phase is completed, the execution plan of this statement is determined, and then enters the executor phase.
5. When the executor
starts execution, it must first determine whether the user has permission to execute queries on this table T. If not, a no permission error will be returned. If the query cache is hit, permission verification will be done when the query cache returns the results. The query also calls precheck to verify permissions before the optimizer. If you have permission, open the table and continue execution. When a table is opened, the executor will call the interface provided by the engine based on the table's engine definition. In some scenarios, the executor is called once and multiple rows are scanned inside the engine, so the number of rows scanned by the engine and rows_examined are not exactly the same
.
MySQL Optimizer
:
The MySQL optimizer uses cost-based optimization (Cost-based Optimization), takes SQL statements as input, and uses the built-in cost model and The data dictionary information and the statistical information of the storage engine determine which steps are used to implement the query statement, that is, the query plan.
From a high level, the MySQL server is divided into two components: the server layer and the storage engine layer. Among them, the optimizer works at the server layer, located above the storage engine API.
The working process of the optimizer can be divided into four stages semantically:
1.Logical transformation, including negation elimination, equal value transfer and constant transfer, constant expression evaluation, conversion of outer joins to inner joins, subquery conversion, view merging, etc.;
2 .Optimization preparation, such as index ref and range access method analysis, query condition fan-out value (fan out, number of records after filtering) analysis, constant table detection;
3.Based on cost Optimization, including the selection of access methods and connection sequences;
4.Execution plan improvements, such as table condition push-down, access method adjustment, sort avoidance, and index condition push-down.
1.Like indexes starting with the wildcard character % fail.
The above introduces the underlying principle of leftmost prefix matching. We know that the commonly used index data structure is a B-tree, and the index is ordered. If the type of the index key is Int type
, the index is sorted in the following order:
The data is only stored in Leaf nodes
, and are ordered
arranged.
If the type of the index keyword is String type
, the sort order is as follows:
As you can see , the index sorting order is based on the first letter
of the comparison string.
When we perform fuzzy query, if we put % in front, the leftmost n letters will be fuzzy and uncertain. We cannot accurately locate a certain index based on the order of the index. We can only perform full query. Table scan to find data that meets the conditions. (The underlying principle of the leftmost prefix)
The same is true when using joint index
. If the rules of index ordering are violated, the index will also be invalid. Full table
Scan.
Example: There is a combined index in the table example: (A, B, C)
SELECT * FROM example WHERE A=1 and B =1 and C=1; You can
use the index ;
SELECT A FROM example WHERE C =1 and B=1 ORDER BY A; You can
index (covering index is used)
SELECT * FROM example WHERE C =1 and B=1 ORDER BY A; Cannot
Go through the index
Covering index:The index contains all the data that meets the query needs, which is called covering index (Covering Index) )
There are two waysOptimization
:
One isuse covering index
, the second isput % behind
.
2.The field type is a string, and the where is not enclosed in quotation marks.
The fields in the table are of string type and are ordinary indexes of the B-tree. If the query condition passes a number, it will not be indexed.
Example: There is a field in the table example whose pid is a varchar type.
//此时执行语句type为ALL全表查询 explain SELECT * FROM example WHERE pid = 1
//此时执行语句type为ref索引查询 explain SELECT * FROM example WHERE pid = '1'
Why is the first statement not indexed without adding single quotes? This is because when single quotes are not added, the comparison is between strings and numbers. Their types do not match. MySQL will do implicit
type conversion and convert them to floating point numbers before comparison.
3.As long as there are non-index columns before and after OR, the index will fail.
The query condition contains or, which may cause index failure.
Example: There are fields in the table example where pid is of type int and score is of type int.
//此时执行语句type为ref索引查询 explain SELECT * FROM example WHERE pid = 1
//把or条件加没有索引的score,并不会走索引,为ALL全表查询 explain SELECT * FROM example WHERE pid = 1 OR score = 10
For the case where OR is followed by a score without an index, it is assumed that it uses the p_id index, but when it comes to the score query conditions, it still has to scan the entire table, which requires a three-step process. : Full table scan index scan merge.
Mysql has an optimizer. In terms of efficiency and cost, it is reasonable that the index may fail when encountering OR conditions.
Note
: If the columns of the OR condition are indexed, the index may be lost.
4.Joint index (combined index), the condition column when querying is not the first column in the joint index, and the index becomes invalid.
In the joint index, when the query conditions meet the leftmost matching principle, the index will take effect normally.
When we create a joint index, such as (k1,k2,k3), it is equivalent to creating three indexes (k1), (k1,k2) and (k1,k2,k3). This is the leftmost matching principle.
Example: There is a joint index idx_pid_score, with pid first and score second.
//此时执行语句type为ref索引查询,idx_pid_score索引 explain SELECT * FROM example WHERE pid = 1 OR score = 10
//此时执行语句type为ref索引查询,idx_pid_score索引 explain SELECT * FROM example WHERE pid = 1
//此时执行语句type为ALL全表查询 explain SELECT * FROM example WHERE score = 10
The joint index does not satisfy the leftmost principle, and the index will generally fail, but this is also related to the Mysql optimizer.
5.Calculation, function, type conversion (automatic or manual) causes index failure. When using (!= or , not in) on the index field, it may cause index failure.
Birthtime is indexed, but because it uses mysql’s built-in function Date_ADD(), there is no index.
Example: In the table example, the idx_birth_time index is a birthtime field of datetime type
//此时执行语句type为ALL全表查询 explain SELECT * FROM example WHERE Date_ADD(birthtime,INTERVAL 1 DAY) = 6
There are also operations on the index column (such as, -, *, /), and the index becomes invalid.
Example: There is an int type score field index idx_score
//此时执行语句type为ALL全表查询 explain SELECT * FROM example WHERE score-1=5
还有不等于(!= 或者)导致索引失效。
例子:在表example中有int类型的score字段索引idx_score
//此时执行语句type为ALL全表查询 explain SELECT * FROM example WHERE score != 2
//此时执行语句type为ALL全表查询 explain SELECT * FROM example WHERE score <> 3
虽然score 加了索引,但是使用了!= 或者 ,not in这些时,索引如同虚设。
6. is null可以使用索引,is not null无法使用索引。
例子:在表example中有varchar类型的name字段索引idx_name,varchar类型的card字段索引idx_card。
//此时执行语句type为range索引查询 explain SELECT * FROM example WHERE name is not null
//此时执行语句type为ALL全表查询 explain SELECT * FROM example WHERE name is not null OR card is not null
7.左连接查询或者右连接查询查询关联的字段编码格式不一样。
两张表相同字段外连接查询时字段编码格式不同则会不走索引查询。
例子:在表example中有varchar类型的name字段编码是utf8mb4,索引为idx_name
在表example_two中有varchar类型的name字段编码为utf8,索引为idx_name。
//此时执行语句example表会走type为index类型索引,example_two则为ALL全表搜索不走索引 explain SELECT e.name,et.name FROM example e LEFT JOIN example_two et on e.name = et.name
当把两表的字段类型改为一致时:
//此时执行语句example表会走type为index类型索引,example_two会走type为ref类型索引 explain SELECT e.name,et.name FROM example e LEFT JOIN example_two et on e.name = et.name
所以字段类型也会导致索引失效
8.mysql估计使用全表扫描要比使用索引快,则不使用索引。
当表的索引被查询,会使用最好的索引,除非优化器使用全表扫描更有效。优化器优化成全表扫描取决与使用最好索引查出来的数据是否超过表的30%的数据。建议
:不要给’性别’等增加索引。如果某个数据列里包含了均是"0/1"或“Y/N”等值,即包含着许多重复的值,就算为它建立了索引,索引效果不会太好,还可能导致全表扫描。
Mysql出于效率与成本考虑,估算全表扫描与使用索引,哪个执行快,这跟它的优化器有关。
The above is the detailed content of How to solve MySQL index failure. For more information, please follow other related articles on the PHP Chinese website!