One of my understanding of MySQL: infrastructure-Mysql Tutorial-php.cn

TodayMySQL Tutorial column will introduce to you the basic architecture that I understand.

One of my understanding of MySQL: infrastructure

As a serious CRUD engineer, interaction with the database plays a large part in daily work, such as adding, deleting, modifying, and processing historical data in daily iterations. , optimize SQL performance, etc. As the amount of project data increases, the deep pits I buried in order to keep up with the project progress are slowly revealing their power. This also forces me to learn MySQL comprehensively and in-depth, instead of just staying on the basic CRUD. .

The first article in the MySQL series mainly introduces the infrastructure of MySQL and the functions of each component, including the bin log of the server layer and the redo log unique to InnoDB.

1. Introduction to MySQL architecture

According to the ranking of the most popular database management systems released by DB-Engines, MySQL is firmly in the second place.

As one of the most popular relational database management systems, MySQL uses a C/S architecture, that is, Client & Server architecture. For example, if a developer uses Navicat to connect to MySQL, then the former is the client and the latter is the server.

At the same time, MySQL is also a single-process, multi-threaded database. This is easy to understand. The running MySQL instance is the "single process", and there will be many threads in this process, such as the main thread Master Thread, IO Thread, etc. These threads are used to handle different tasks.

2. MySQL components

As mentioned earlier, MySQL uses a C/S architecture. Users connect to the MySQL server through the client, and then submit SQL statements to the server, and then the server will The execution results are returned to the client.

In this section, we mainly focus on the logical composition of the MySQL server. Let’s look at a picture first.

As you can see from the above figure, in the interaction with the client, the MySQL server passes through the connector, query cache, analyzer, optimizer, executor and These parts of the storage engine.

The following uses a simple query statement to describe the various components of the MySQL server and their functions.

2.1 Connector

Before the client submits a query statement, it needs to establish a connection with the server. So the first thing to come is the connector. The function of the connector is to be responsible for establishing and managing the connection with the client, and at the same time querying the user's permissions.

It should be noted that:

The connector only obtains the user's permissions and does not perform verification. The verification is performed on the query cache or executor.
Once a connection is established and the user's permissions are obtained, the user permissions will be refreshed only when a new connection is established.
For clients that have not sent requests for a long time, the connector will automatically disconnect. The "long time" here is determined by the wait_timeout parameter, whose default value is 8 hours.

2.2 Query Cache

After establishing a connection through the connector and obtaining user permissions, the user can submit query statements.

The first thing to pass is the query cache part. As you can guess from its name, the function of the query cache is to query whether MySQL has executed the query statement submitted by the client. If this If the SQL has been executed before, and the user has permission to execute the statement on the table, the results of the previous execution will be returned directly.

So at some point, executing a SQL statement multiple times cannot get its average execution time. Because of the query cache, subsequent execution times are often shorter than the first execution time.

If you don’t want to use caching, you can use the update statement to update the table after each query. Of course, this is a very troublesome and silly method. MySQL also provides the corresponding configuration item - query_cache_type. You can set query_cache_type to 0 in the my.cnf file to turn off the query cache.

It should be noted that:

The query cache part is stored in the form of key-value, where key is the query statement and value is the query result.
When the data table is updated, all query caches for this table will become invalid, so generally speaking, the hit rate of the query cache is very low.
In the version MySQL 8.0, the query cache function has been removed.

2.3 Analyzer

The MySQL version I use is 5.7.21, so the query statement submitted by the client will go to the query cache. If there is no hit, it will continue to go down. , come to the analyzer.

The analyzer will perform lexical analysis (parsing the statement) and syntax analysis (determining whether the statement conforms to MySQL's grammatical rules) on the submitted statement, so the role of the analyzer is to parse the SQL statement and check its legality.

It should be noted that:

When MySQL checks the validity of the SQL statement, it will only prompt an error in the first place that does not comply with the MySQL syntax rules, and will not change the SQL statement. All grammatical errors are displayed.

For example:

select * form user_info limit 1;复制代码

Copy after login

There are two errors in the above SQL statement. The first is a misspelling of from, and the second is that the user_info table does not exist. After execution, MySQL will only remind you of an error. The following shows the result information of executing SQL three times.

第一次的执行信息：
1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'form user_info limit 1' at line 1, Time: 0.000000s

修改为from后第二次的执行信息：1146 - Table 'windfall.user_info' doesn't exist, Time: 0.000000s

修改为 user 表后第三次的执行信息：
OK, Time: 0.000000s复制代码

Copy after login

2.4 Optimizer

After verifying the legality of the SQL statement, MySQL already knows what the statement submitted by the user is for, but before it is actually executed, it still needs to go through a very " Metaphysics" optimizer.

The function of the optimizer is to generate the optimal execution plan for the SQL statement.

The reason why the optimizer is said to be "metaphysical" is because in the process of optimizing SQL statements, it may generate execution plans that are unexpected by the user (index selection, multi-table association connection sequence, implicit function conversion, etc.). Of course, the optimizer sometimes "selects the wrong" index, which is related to factors such as data volume and index statistics.

It should be noted that:

If you need to optimize a SQL in the production environment, please try to restore the table locally with the same data volume as the production environment, and then optimize according to the execution plan .
When writing query statements, be sure to consider the leftmost matching principle of the index (the leftmost matching principle will be discussed in the index chapter).

Regarding the workflow of the MySQL optimizer, you can read this blog: This is how the MySQL optimizer originally works

The execution plan of MySQL is also a skill that must be mastered , this blog is very detailed and worth reading: I don’t know how to read the Explain execution plan. I advise you not to write in your resume that you are familiar with SQL optimization

2.5 Executor

The optimizer generates MySQL. I think After the optimal execution plan, we finally come to the executor. The role of the executor is of course to execute SQL statements.

But before execution, permission verification must be done first to verify whether the user has query permissions on the table. Then according to the engine type defined by the table, use the interface provided by the corresponding engine to perform conditional query on the table, and finally return all the data rows of the table that meet the conditions to the client as a result set, so that the execution of the entire SQL is over. .

It should be noted that:

Before the executor executes the SQL statement, a verification will be performed: to determine whether the user has operation permissions on the table.

2.6 Storage Engine

MySQL supports many storage engines, such as: InnoDB, MyISAM, Memory, etc.

2.6.1 InnoDB

InnoDB is the most commonly used MySQL storage engine today and is also the default storage engine after MySQL 5.5.

InnoDB supports transactions, MVCC (Multiple Version Concurrency Control), foreign keys, row-level locks and auto-increment columns. However, InnoDB does not support full-text indexing, and it takes up more data space.

2.6.2 MyISAM

MyISAM is the default storage engine for MySQL 5.1 and earlier, supporting full-text indexing, compression, spatial functions, and table-level locks.

MyISAM's data is stored in a compact format so it takes up less space. It has high insertion and query speeds, but MyISAM does not support transactions and cannot be safely recovered after a crash.

2.6.3 Memory

All data in Memory is saved in memory. Since no disk I/O is required, its speed is an order of magnitude faster than MyISAM and InnoDB. But if the database is shut down or restarted, the Memory engine data will disappear.

Memory supports Hash index, but because it uses table-level locks, the performance of concurrent writes is relatively low.

It is worth mentioning that temporary tables in MySQL are generally saved in Memory tables. If the amount of data in the intermediate table is too large or contains fields of BLOB type or TEXT type, MyISAM tables will be used.

Regarding the storage engine, since I have relatively little contact with it, I will sort it out after reading "MySQL Technology Insider: InnoDB Storage Engine". I will just briefly mention it here.

3. Log module

The execution process mentioned above mainly describes the query statement. If it is an update statement, it also involves the MySQL log module.

The logical query statements and update statements from the client to the executor are the same, except that when reaching the executor layer, the update statement will interact with the MySQL log module. This is The difference between query statements and update statements.

3.1 物理日志 redo log

3.1.1 redo log 中记录的内容

对于 InnoDB 存储引擎来说，它有一个特有的日志模块——物理日志（重做日志）redo log，它是 InnoDB 存储引擎的日志，它所记录的是数据页的物理修改。

举个例子，现在有一张 user 表，有一条主键 id=1，age=18 的数据，然后用户提交了下面这条 SQL，执行器准备执行。

update user set age=age+1 where id=1;复制代码

Copy after login

对于这条 SQL，在 redo log 中记录的内容大致是：将 user 表中主键 id=1 行的 age 字段值修改为19。

3.1.2 WAL

MySQL 的更新持久化逻辑运用到了 WAL(Write-Ahead Logging，写前日志记录) 的思想：先写日志，再写磁盘。

需要注意的是这里的写日志也是写到磁盘中，但由于日志是顺序写入的，所以速度很快。而如果没有 redo log，直接更新磁盘中的数据，那么首先需要找到那条记录，然后再把新的值更新进入，由于查询和读写I/O，就相对会慢一些。

最后，当 InnoDB 引擎空闲的时候，它会去执行 redo log 中的逻辑，将数据持久化到磁盘中。

3.1.3 redo log 日志文件

redo log 日志文件大小是固定的，我把它理解为一个One of my understanding of MySQL: infrastructure，链表的每个节点都可以存放日志，在这个链表中有两个指针：write（黑）和 read（白）。

最开始这两个指针都指向同一个节点，且节点日志元素都为空，表示此时 redo log 为空。当用户开始提交更新语句，write 节点开始往前移动，假设移动到3的位置。而此时的情况就是 redo log 中有1-3这三个日志元素需要被持久化到磁盘中，当 InnoDB 空闲时，read 指针往前移动，就代表着将 redo log 持久化到磁盘。

但这里有一种特殊情况，就是 InnoDB 一直没有空闲，write 指针一直在写入日志，直到它写到5的位置，再往前写又回到了最开始1的位置（也就是上图的位置，但不同的是链表节点中都存在日志数据）。

此时发现1的位置已经有日志数据了，同时 read 指针也在。那么这时候 write 指针就会暂停写入，InnoDB 引擎开始催动 read 指针移动，把 redo log 清空掉一部分之后再让 write 指针写入日志文件。

3.1.4 redo log 的作用

我们已经知道，redo log 中记录的是数据页的物理修改，所以 redo log 能够保证在数据库发生异常重启时，记录尚未写入磁盘，但是在重启后可以通过 redo log 来“redo”，从而不会发生记录丢失的情况，保证了事务的持久性。

这一能力也被称作 crash-safe。

3.2 归档日志 bin log

前面说到 redo log 是 InnoDB 特有的日志，而 bin log 则是属于 MySQL Server 层的日志，在默认的 Statement Level 下它记录的是更新语句的原始逻辑，即 SQL 本身。

另外需要注意的是：

bin log 的日志文件大小并不固定，它是“追加写入”的模式，写完一个文件后会切换到下一个文件写入。
bin log 没有 crash-safe 的能力。
bin log 是在事务最终提交前写入的，而 redo log 是在事务执行中不断写入的。

3.2.1 bin log 的作用

与 redo log 不同的是，bin log 常用于恢复数据，比如说主从复制，从节点根据父节点的 bin log 来进行数据同步，实现主从同步。

3.3 两阶段提交

为了让 redo log 和 bin log 的状态保持一致，MySQL 使用两阶段提交的方式来写入 redo log 日志。

在执行器调用 InnoDB 引擎的接口将写入更新数据时，InnoDB 引擎会将本次更新记录到 redo log 中，同时将 redo log 的状态标记为 prepare，表示可以提交事务。

随后执行器生成本次操作的 bin log 数据，并写入 bin log 的日志文件中。

最后执行器调用 InnoDB 的提交事务接口，存储引擎把刚写入的 redo log 记录状态修改为 commit，本次更新结束。

在这个过程中有三个步骤 add redo log and mark as prepare -> add bin log -> commit，即：

Write the redo log log and mark it as prepare
Write the bin log
Commit the transaction

If in the second step, also That is, the system crashes or restarts before writing the bin log. After startup, since there is no record in the bin log, the records in the redo log will be rolled back to before the execution of this update statement.

If the system crashes or restarts before the third step, that is, before submission, even if there is no commit but the record in the redo log is in prepare status and there is a complete record in the bin log, it will be automatically committed after the restart. It will not be rolled back.

4. Summary

This article mainly introduces the infrastructure of MySQL and the functions of each component. Finally, it introduces the bin log of the MySQL Server layer and the redo log unique to InnoDB.

5. Review the past and learn the new

The following questions are to question the content described in this article and consolidate knowledge. As the saying goes, "Reviewing the past and learning the new can become a teacher."

If the field in the query statement does not exist, the field is ambiguous, or the keyword is spelled incorrectly, which part reports the error?
If the user does not have query permission on the table, which part reports the error?
Why is MySQL’s query cache invalid?
How is a select query statement executed?
What are the commonly used storage engines in MySQL?
What are the logging modules of MySQL? What role do they play?
What should I do if the redo log is full?
How to understand the two-phase submission of redo log?
What is the difference between redo log and bin log?

More related free learning recommendations: mysql tutorial(Video)

The above is the detailed content of One of my understanding of MySQL: infrastructure. For more information, please follow other related articles on the PHP Chinese website!