Detailed introduction to MySQL query optimization-Mysql Tutorial-php.cn

1. Introduction

The most important thing about a good web application is excellent access performance. The database MySQL is an integral part of the web application and an important part that determines its performance. Therefore, it is crucial to improve the performance of MySQL.

The improvement of MySQL performance can be divided into three parts, including hardware, network, and software. Among them, hardware and network depend on the company's financial resources and require a lot of money, so I won't go into them here. The software is subdivided into many types. Here we achieve performance improvement through MySQL query optimization.

Recently I read some books about query optimization, and also read some articles written by seniors online.

The following is some summary of query optimization that I compiled and learned from:

2. Interception of SQL statements

1. Comprehensive query log

2. Slow query Log

3. Binary log

4. Process list

SHOW FULL PROCESSLIST;

. .

3. Basic analysis commands for query optimization

1. EXPLAIN {PARTITIONS|EXTENDED}

2. SHOW CREATE TABLE tab;

3. SHOW INDEXS FROM tab;

　4.SHOW TABLE STATUS LIKE 'tab';

　5.SHOW [GLOBAL|SESSION] STATUS LIKE '';

　6.SHOW VARIABLES

　. . . .

PS: I personally feel that all of them are nutritionally devoid of any nutrients. Here’s the real stuff.

4. Several directions for query optimization

1. Try to avoid full-text scanning, add indexes to corresponding fields, and use indexes to query

2. Delete unused or duplicate indexes

　3. Query rewriting, equivalent conversion (predicate, subquery, join query)

　4. Delete content and repeat unnecessary statements, and streamline statements

　5. Integrate repeatedly executed statements

　6. Cache query results

5. Index optimization

　5.1. Index advantages:

　1. Maintain data integrity

　　2. Improve data query performance

　　3. Improve table connection operations (jion)

　　4. Sort query results. If there is no index, the internal file sorting algorithm will be used for sorting, which is slower. 5. Simplify aggregated data operations. 5.2. Disadvantages of indexing. 1. The index needs to occupy a certain amount of space. Storage space

　　2. Data insertion, update, and deletion will be affected by the index, and performance will be reduced. Because the data changes, the index also needs to be updated

　3. Multiple indexes, if the optimizer takes time, the best choice

　5.3. Index selection

　1. When the amount of data is large Use

　 2. When the data is highly repetitive,

will not be used 3. If the query retrieves more than 20% of the data, full-text scanning will be used without indexing

5.4. Detailed study of index

Data query:

InnoDB and MyISAM in MySQL are B-Tree type indexes

B-Tree includes: PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT

　B-Tree type index is not supported (that is, when the field uses the following symbols, the index will not be used):

　　>, <, >=, <=, BETWEEN, !=, < >,like '%**'

　　【Here I will introduce the covering index first】

　　　　　　　　　　　　　　　　　　　　　　　　　　　 Around I will introduce it in a way that I understand. Covering indexes do not really exist like primary key indexes and unique indexes. It is just a definition of certain specific scenarios for index application [another understanding: the queried column is an index column, so the column is covered by the index]. It can break through traditional limitations, use the above operators, and still use indexes for queries.

Because the queried column is an index column, there is no need to read the row, only the column field data needs to be read. [For example, if you are reading a book and need to find a certain content, and that content happens to appear in the table of contents, you don’t need to turn page by page, just locate the page in the table of contents and search]

　 How to activate What about covering indexes? What is a specific scenario?

The index field just appears in the select.

Compound indexes may also have other special scenarios. For example, for a three-column composite index, you only need to have the leftmost column of the composite index appear once in select, where, group by, and order by to activate the use of the covering index.

View:

Extra in EXPLAIN displays Using index, indicating that this statement uses a covering index.

Conclusion:

It is not recommended to use select*from when querying. You should write the fields you need and add corresponding indexes to improve querying. performance.

Actual test results for the above operators: 1. In the form of select*from, where is the primary key and can be used to kill [except like] (use the primary key for query); index cannot be used at all Can.

2. Test in the form of select field a from tab where field a "above operator", the result can still be queried using the index. [Using covering index]

Other index optimization methods:

1. Use index keywords as connection conditions

2. Use compound indexes

3. Index merging or and, will involve The fields to be merged into a composite index

4. Add index to the fields involved in where, and group by

6. Subquery optimization

In from, it is a non-correlated subquery. Subqueries can be pulled up to the parent layer. In multi-table join queries, consider the join cost before selecting.

The query optimizer generally uses nested execution for subqueries, that is, executing the subquery once for each row in the parent query, so that the subquery will be executed many times. This execution method is very inefficient.

Advantages of converting subqueries into join queries:

1. The subquery does not need to be executed many times

2. The optimizer can choose different methods and connection sequences based on the information

　3. The connection conditions and filtering conditions of the subquery become the filtering conditions of the parent query to improve efficiency.

Optimization:

Subquery merging. If there are multiple subqueries, try to merge them as much as possible.

Subquery expansion, that is, pull-up becomes a multi-table query (equivalent changes are guaranteed at all times)

Note:

Subquery expansion can only expand simple queries. If the subquery If the query contains aggregate functions, GROUP BY, and DISTINCT, it cannot be pulled up.

Select * from t1 (select*from tab where id>10) as t2 where t1.age>10 and t2.age<25;

select*from t1,tab as t2 where t1.age>10 and t2.age<25 and t2.id>10;

Specific steps:

1. Merge from and from and modify the corresponding parameters

2 , merge where with where, use and to connect

3. Modify the corresponding predicate (change = in in)

7. Rewrite the equivalent predicate:

1. BETWEEEN AND Rewrite it as >=, <= and so on. Actual measurement: 100,000 pieces of data, time before and after rewriting, 1.45s, 0.06s

　2. In converts multiple or. When the field is an index, both can use the index, or is more efficient than in

3. Name like 'abc%' is rewritten as name>='abc' and name<'abd';

Note: In the million-level data test, the like query before name is not indexed is faster than the latter query; after adding an index to the field, the latter query is a little faster, but there is not much difference, because both methods are used when querying To the index.

　. . . .

8. Condition simplification and optimization

1. Combine where, having (when there are no groupby and aggregate functions), and join-on conditions as much as possible

2. Delete unnecessary parentheses, reduce the or and and tree layers of syntax, and reduce CPU consumption

　3. Constant transfer. a=b and b=2 is converted to a=2 and b=2. Try not to use variables a=b or a=@var

4. Eliminate useless SQL conditions

5. Try not to calculate expressions on the right side of the where equal sign; do not use fields in where Calculate expressions and use functions

　6. Identity transformation and inequality transformation. Example: Testing millions of data a>b and b>10 becomes a>b and a>10 and b>10 with significant optimization

9, external connection optimization

About to convert external connections For inner joins

Advantages:

1. The optimization processor handles outer joins in more steps than inner joins and is time-consuming

2. After the outer joins are eliminated, the optimizer selects multiple tables There are more choices for the connection sequence, you can choose the best

　3. You can use the table with the strictest filtering conditions as the outer surface (the front of the connection sequence is the outer loop layer of the multi-layer loop body),

It can reduce unnecessary I/O overhead and speed up algorithm execution.

The difference between on a.id=b.id and where a.id=b.id, on means the table is connected, and where means data comparison

Note: The premise must be that the result is NULL avoidance (that is, the condition is restricted to no NULL data rows, semantically speaking, it is an inner connection)

Optimization principles:

Streamline queries, eliminate connections, equivalent conversions, and remove redundant table object connections

For example: the primary key/unique key is used as the connection condition, and the intermediate table column is only used as the equivalent condition, the intermediate table connection can be removed

10. Other query optimization

1. The following will be Causes the index query to be abandoned and full-text scanning is used

　1.1. Use the != or <> operator in the where clause. Note: Primary key support. Non-primary keys do not support

1.2. Avoid using or used, so the specific situation should be analyzed on a case-by-case basis.

Similar optimization:

select * from tab name='aa' or name='bb';

Select * from tab name='aa'

　　　union all

　　　select * from tab name='bb';

　　　 Actual measurement:

　　 1. One hundred thousand data test, Without any index, the above query is twice as fast as the query below.

2. In the 300,000 data test, when aa and bb are indexed separately, the following query speed is a little faster than or.

　1.3. Avoid using not in

　　Not in generally cannot use indexes; primary key fields can

　　1.4. Try to avoid using the judgment of null in where

　1.5. like cannot be preceded by a percent sign like '%.com'

Solution:

1. If you must use % prefix and the data length is not large, such as URL, you can flip the data and save it Enter the database and check again. LIKE REVERSE'%.com';

　　1.6. When using an index field as a condition, if it is a compound index, the field name with the leftmost prefix of the index should be used

2. Replace exists with in

Select num from a where num in(select num from b)

Select num from a where exists(select 1 from b where num =a.num)

With one million pieces of data, it takes 6.65s and 4.18s to filter 59417 pieces of data. No other optimizations were done, just replacing exists with in.

　3. The field definition is a string. There are no quotation marks when querying, and no index will be used. Full-text scanning will be performed.

[The following is an excerpt from Luantanqin’s blog post http://www.cnblogs.com/lingiu/p/3414134.html. I have not conducted the corresponding test]

4. Try to use it as much as possible table variables instead of temporary tables

5. Avoid frequently creating and deleting temporary tables to reduce the consumption of system table resources

6. If a temporary table is used, be sure to add it at the end of the stored procedure To delete all temporary tables explicitly, first truncate table, and then drop table, this can avoid long-term locking of system tables

　7. Try to avoid using cursors, because cursors are less efficient. If the cursor operation is If the data exceeds 10,000 rows, then you should consider rewriting

8. Large data volume. If the data volume is too large, you should consider whether the corresponding requirements are reasonable.

　9. Try to avoid large transaction operations and improve system concurrency.

The above is the detailed content of Detailed introduction to MySQL query optimization. For more information, please follow other related articles on the PHP Chinese website!