If the interviewer asks you: From what dimensions will you optimize MySQL performance? How would you answer?
The so-called performance optimization generally targets the optimization of MySQL queries. Since we are optimizing the query, we naturally need to first know what links the query operation goes through, and then think about which links can be optimized.
I use a picture to show the basic steps that the query operation needs to go through.
The following introduces some strategies for MySQL optimization from 5 perspectives.
Processing connections is the first step in the relationship between the MySQL client and the MySQL server. The first If you can't even walk well, let's not talk about the subsequent story.
Since the connection is a matter for both parties, we naturally optimize it from both the server side and the client side.
What the server needs to do is to accept as many client connections as possible. Maybe you have encounterederror 1040: Too many Error in connections
? It's because the server's mind is not broad enough, and the layout is too small!
We can solve the problem of insufficient connections from two aspects:
1. Increase the number of available connections and modify the environment variablemax_connections
, By default, the maximum number of connections on the server is151
mysql> show variables like 'max_connections'; +-----------------+-------+ | Variable_name | Value | +-----------------+-------+ | max_connections | 151 | +-----------------+-------+ 1 row in set (0.01 sec)
2. Release inactive connections in a timely manner. The system default client timeout is 28800 seconds (8 hours). We can adjust this value a little smaller
mysql> show variables like 'wait_timeout'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | wait_timeout | 28800 | +---------------+-------+ 1 row in set (0.01 sec)
MySQL has a lot of configuration parameters, and most parameters provide default values. The default values are carefully designed by the MySQL author and can fully satisfy everyone. Due to the needs of some situations, it is not recommended to modify it rashly without knowing the meaning of the parameters.
What the client can do is to minimize the number of times it establishes a connection with the server. The established connections can be used as much as possible. Use it. Don't create a new connection every time you execute a SQL statement. The resources of both the server and the client will be overwhelmed.
The solution is to useConnection poolto reuse connections.
Common database connection pools includeDBCP
,C3P0
, Alibaba’sDruid
,Hikari
, the first two are used There are very few, and the latter two are currently at their peak.
But it should be noted that the bigger the connection pool, the better. For example, the default maximum connection pool size ofDruid
is 8, and the default maximum connection pool size ofHikari
is 10 , blindly increasing the size of the connection pool may reduce system execution efficiency. Why?
For each connection, the server will create a separate thread to process it. The more connections there are, the more threads the server will create. When the number of threads exceeds the number of CPUs, the CPU must allocate time slices to perform context switching of threads. Frequent context switching will cause a lot of performance overhead.
Hikari officialgives a recommended value formula forPostgreSQL
database connection pool size,Number of CPU cores*2 1
. Assuming that the number of CPU cores of the server is 4, just set the connection pool to 9. This formula is also applicable to other databases to a certain extent, and you can brag about it during interviews.
It is inevitable that some slow queries will appear in the system. These queries Either the amount of data is large, or the query is complex (many associated tables or complex calculations), causing the query to occupy the connection for a long time.
If the effectiveness of this kind of data is not particularly strong (it does not change every moment, such as daily reports), we can put this kind of data into the cache system. During the cache validity period of the data, Get data directly from the cache system, which can reduce the pressure on the database and improve query efficiency.
In the early stages of the project, the database usually runs on a On the server, all read and write requests from users will directly affect this database server. After all, the amount of concurrency that a single server can bear is limited.
To address this problem, we can use multiple database servers at the same time, set one of them as the team leader, called themaster
node, and the remaining nodes as team members, calledslave
. Users write data only to themaster
node, and read requests are distributed to variousslave
nodes. This solution is calledread-write separation. Give the small group composed of the group leader and group members a name,cluster.
Note: Many developers are dissatisfied with the offensive words
master-slave
(because they think it will be associated with racial discrimination, black slaves, etc.), so a movement to change the name was launched.Affected by this, MySQL will gradually stop using terms such as
master
andslave
, and instead usesource
andreplica
Replacement, just understand it when everyone encounters it.
One problem that must be faced when using a cluster is how to maintain data consistency between multiple nodes. After all, the write request is only sent to themaster
node. Only the data of themaster
node is the latest data. How to synchronize the write operation to themaster
node to all nodes? What about theslave
node?
Master-slave replicationTechnology is here! I briefly introduced the binlog log in my previous article, so I moved it directly.
binlog
is the core component that implements the MySQL master-slave replication function.master
The node will record all write operations into the binlog.slave
The node will have a dedicated I/O thread to read the binlog of themaster
node and write The operation is synchronized to the currentslave
node.
This cluster architecture has a very good effect on reducing the pressure on the main database server. However, as business data increases, if the data volume of a certain table If the query performance of a single table increases sharply, the query performance of a single table will drop significantly. This problem cannot be solved even if the separation of reading and writing is done. After all, all nodes store exactly the same data. The query performance of a single table is poor. Naturally, the performance of all nodes is poor. Difference.
At this time, we can disperse the data of a single node to multiple nodes for storage, which issub-database and sub-table.
The meaning of nodes in sub-database and sub-table is relatively broad. If the database is used as a node, it is a sub-database; if the leaflet is The table serves as a node, that is, a sub-table.
Everyone knows that sub-databases and tables are divided into vertical sub-databases, vertical sub-tables, horizontal sub-databases and horizontal sub-tables, but every time they fail to remember these concepts, I will explain them in detail to help. Everyone understands.
Based on the single database, make several vertical cuts and split according to business logic into different databases, this isvertical sub-database.
Vertical table partitioning is in a single table Basically, make a vertical cut (or several cuts) to split multiple words in a table into several small tables. This operation needs to be judged based on the specific business. Usually, frequently used fields (hot fields) are divided into Table, fields that are not used frequently or are not used immediately (cold fields) are divided into one table to improve query speed.
Take the picture above as an example: Usually the product details are relatively long, and when viewing the product list, it is often not necessary to display the product details immediately (usually click the details button will be displayed), but will display the more important information of the product (price, etc.). According to this business logic, we made the original product table into vertical sub-tables.
Save the data of a single table to multiple data tables according to certain rules (called sharding rules in jargon), horizontally Give the data table a knife (or several knifes), and it will behorizontal table.
水平分库就是对单个数据库水平切一刀,往往伴随着水平分表。
水平分,主要是为了解决存储的瓶颈;垂直分,主要是为了减轻并发压力。
通常情况下,用户的请求会直接访问数据库,如果同一时刻在线用户数量非常庞大,极有可能压垮数据库(参考明星出轨或公布恋情时微博的状态)。
这种情况下可以通过使用消息队列降低数据库的压力,不管同时有多少个用户请求,先存入消息队列,然后系统有条不紊地从消息队列中消费请求。
处理完连接、优化完缓存等架构的事情,SQL查询语句来到了解析器和优化器的地盘了。在这一步如果出了任何问题,那就只能是SQL语句的问题了。
只要你的语法不出问题,解析器就不会有问题。此外,为了防止你写的SQL运行效率低,优化器会自动做一些优化,但如果实在是太烂,优化器也救不了你了,只能眼睁睁地看着你的SQL查询沦为慢查询。
慢查询就是执行地很慢的查询(这句话说得跟废话似的。。。),只有知道MySQL中有哪些慢查询我们才能针对性地进行优化。
因为开启慢查询日志是有性能代价的,因此MySQL默认是关闭慢查询日志功能,使用以下命令查看当前慢查询状态
mysql> show variables like 'slow_query%'; +---------------------+--------------------------------------+ | Variable_name | Value | +---------------------+--------------------------------------+ | slow_query_log | OFF | | slow_query_log_file | /var/lib/mysql/9e74f9251f6c-slow.log | +---------------------+--------------------------------------+ 2 rows in set (0.00 sec)
slow_query_log
表示当前慢查询日志是否开启,slow_query_log_file
表示慢查询日志的保存位置。
除了上面两个变量,我们还需要确定“慢”的指标是什么,即执行超过多长时间才算是慢查询,默认是10S
,如果改成0
的话就是记录所有的SQL。
mysql> show variables like '%long_query%'; +-----------------+-----------+ | Variable_name | Value | +-----------------+-----------+ | long_query_time | 10.000000 | +-----------------+-----------+ 1 row in set (0.00 sec)
有两种打开慢日志的方式
1、修改配置文件my.cnf
此种修改方式系统重启后依然有效
# 是否开启慢查询日志 slow_query_log=ON # long_query_time=2 slow_query_log_file=/var/lib/mysql/slow.log
2、动态修改参数(重启后失效)
mysql> set @@global.slow_query_log=1; Query OK, 0 rows affected (0.06 sec) mysql> set @@global.long_query_time=2; Query OK, 0 rows affected (0.00 sec)
MySQL不仅为我们保存了慢日志文件,还为我们提供了慢日志查询的工具mysqldumpslow
,为了演示这个工具,我们先构造一条慢查询:
mysql> SELECT sleep(5);
然后我们查询用时最多的1条慢查询:
[root@iZ2zejfuakcnnq2pgqyzowZ ~]# mysqldumpslow -s t -t 1 -g 'select' /var/lib/mysql/9e74f9251f6c-slow.log Reading mysql slow query log from /var/lib/mysql/9e74f9251f6c-slow.log Count: 1 Time=10.00s (10s) Lock=0.00s (0s) Rows=1.0 (1), root[root]@localhost SELECT sleep(N)
其中,
更多关于mysqldumpslow
的使用方式,可以查阅官方文档,或者执行mysqldumpslow --help
寻求帮助。
我们可以运行show full processlist
查看MySQL中运行的所有线程,查看其状态和运行时间,找到不顺眼的,直接kill。
其中,
使用SHOW STATUS
查看MySQL服务器的运行状态,有session
和global
两种作用域,一般使用like+通配符
进行过滤。
-- 查看select的次数 mysql> SHOW GLOBAL STATUS LIKE 'com_select'; +---------------+--------+ | Variable_name | Value | +---------------+--------+ | Com_select | 168241 | +---------------+--------+ 1 row in set (0.05 sec)
SHOW ENGINE
用来展示存储引擎的当前运行信息,包括事务持有的表锁、行锁信息;事务的锁等待情况;线程信号量等待;文件IO请求;Buffer pool统计信息等等数据。
例如:
SHOW ENGINE INNODB STATUS;
上面这条语句可以展示innodb存储引擎的当前运行的各种信息,大家可以据此找到MySQL当前的问题,限于篇幅不在此意义说明其中信息的含义,大家只要知道MySQL提供了这样一个监控工具就行了,等到需要的时候再来用就好。
通过慢查询日志我们可以知道哪些SQL语句执行慢了,可是为什么慢?慢在哪里呢?
MySQL提供了一个执行计划的查询命令EXPLAIN
,通过此命令我们可以查看SQL执行的计划,所谓执行计划就是:优化器会不会优化我们自己书写的SQL语句(比如外连接改内连接查询,子查询优化为连接查询...)、优化器针对此条SQL的执行对哪些索引进行了成本估算,并最终决定采用哪个索引(或者最终选择不用索引,而是全表扫描)、优化器对单表执行的策略是什么,等等等等。
EXPLAIN在MySQL5.6.3之后也可以针对UPDATE、DELETE和INSERT语句进行分析,但是通常情况下我们还是用在SELECT查询上。
这篇文章主要是从宏观上多个角度介绍MySQL的优化策略,因此这里不详细说明EXPLAIN
的细节,之后单独成篇。
SQL优化指的是SQL本身语法没有问题,但是有实现相同目的的更好的写法。比如:
针对最后一条举个简单的例子,下面两条语句能实现同样的目的,但是第二条的执行效率比第一条执行效率要高得多(存储引擎使用的是InnoDB),大家感受一下:
-- 1. 大偏移量的查询 mysql> SELECT * FROM user_innodb LIMIT 9000000,10; Empty set (8.18 sec) -- 2.先过滤ID(因为ID使用的是索引),再limit mysql> SELECT * FROM user_innodb WHERE id > 9000000 LIMIT 10; Empty set (0.02 sec)
为慢查询创建适当的索引是个非常常见并且非常有效的方法,但是索引是否会被高效使用又是另一门学问了。
推荐阅读:《如何用好MySQL索引?你必须了解这些事!》,感兴趣的读者可以看一下。
//m.sbmmt.com/mysql-tutorials-493147.html
一般情况下,我们会选择MySQL默认的存储引擎存储引擎InnoDB
,但是当对数据库性能要求精益求精的时候,存储引擎的选择也成为一个关键的影响因素。
建议根据不同的业务选择不同的存储引擎,例如:
MyISAM
;Memory
;InnoDB
;字段优化的最终原则是:使用可以正确存储数据的最小的数据类型。
MySQL提供了6种整数类型,分别是
不同的存储类型的最大存储范围不同,占用的存储的空间自然也不同。
例如,是否被删除的标识,建议选用tinyint
,而不是bigint
。
你是不是直接把所有字符串的字段都设置为varchar
格式了?甚至怕不够,还会直接设置成varchar(1024)
的长度?
如果不确定字段的长度,肯定是要选择varchar
,但是varchar
需要额外的空间来记录该字段目前占用的长度;因此如果字段的长度是固定的,尽量选用char
,这会给你节约不少的内存空间。
Try to set non-null fields toNOT NULL
, and provide a default value, or use a special value instead ofNULL
.
BecauseNULL
type storage and optimization will have poor performance problems, the specific reasons will not be discussed here.
This is also a principle mentioned in the "Alibaba Development Manual". There are three reasons:
Reduces readability, and you have to check the database code while checking the code;
Hands over the calculation work For the program, the database only does the storage work and does this well;
The work of data integrity verification should be completed by the developer rather than relying on external Key, once you use foreign keys, you will find that it becomes extremely difficult to delete some junk data during testing.
Do not store large files directly, but store the access address of large files.
Large field splittingIn fact, it is the vertical table partitioning mentioned before. Split infrequently used fields or fields with large amounts of data to avoid too many columns and too large amounts of data, especially if you are used to writingSELECT *
. The problem will be seriously magnified!
Field redundancyIn principle, it does not conform to the database design paradigm, but it is very conducive to fast retrieval. For example, when the customer ID is stored in the contract table, the customer name can be stored redundantly, so that there is no need to obtain the user name based on the customer ID when querying. Therefore, it is also a better optimization technique to make a certain degree of redundancy for business logic.
Strictly speaking, business optimization is no longer a means of MySQL tuning, but business optimization can very effectively alleviate the problem. Database access pressure. A typical example in this regard is Taobao. Here are a few simple examples to give you some ideas:
In the past, shopping started on the night of Double 11. Recently, In the past few years, the pre-sale front for Double 11 has become longer and longer, starting more than half a month in advance, and various deposit red envelope models have emerged in endlessly. This method is calledPre-sale diversion. This can divert customer service requests, and you don’t have to wait until the early morning of Double Eleven to place orders collectively;
In the early morning of Double Eleven, you may want to check orders outside of that day, but But the query failed; even the chicken rations in Alipay were delayed. This is adowngrade strategy, which gathers computing resources for unimportant services to ensure the current core business;
During Double Eleven, Alipay strongly recommends using Huabei for payment instead of bank card payment. Although part of the consideration is to improve the stickiness of the software, on the other hand, using Yu'e Bao actually uses Alipay The internal server has fast access speed, but using a bank card requires calling the bank interface, which is much slower in comparison.
The summary of MySQL optimization ends here. There are many details not mentioned, which makes me feel that this article is not perfect. However, there are too many knowledge points to cover in detail. It is impossible to write them all down at once. Let’s write them down later.
[Related recommendations:mysql video tutorial]