Home>Article>Database> MYSQL_Introduction to multi-version concurrency control, storage engine, and indexing

MYSQL_Introduction to multi-version concurrency control, storage engine, and indexing

php是最好的语言 Original: 2018-08-02 14:18:19 1759browse

Multi-version concurrency control

Most transactional storage engines of mysql do not implement simple row-level locks. Based on the consideration of improving concurrency performance, they generally implement multi-version concurrency control at the same time.

It can be considered that MVCC is a variant of row-level locking, but it avoids locking operations in many cases because the overhead is lower.

InnoDB's MVCC is implemented through two hidden columns saved at the end of each row record. Of these two columns, one saves the creation time of the row, and the other saves the expiration time (or deletion time) of the row. ), of course, what is stored is not the actual time value, but the system version. Every time a new transaction is started, the system version number is automatically incremented. The system version number at the start of the transaction will be used as the version number of the transaction, which is used to compare the version numbers of each row queried.

Under the REPEATABLE READ isolation level, the implementation of MVCC:

SELECT

InnoDB’s search version is earlier than The data row of the current transaction version number. This ensures that the rows read by the transaction either already exist before the transaction starts, or have been inserted or modified by the transaction itself.
The deleted version of the row is either undefined or greater than the current transaction version number. This ensures that the row read by the transaction has not been deleted before the transaction starts.

INSERT

InnoDB saves the current system version number as the row version number for each newly inserted row.

DELETE

InnoDB saves the current system version number as the row deletion identifier for each deleted row.

UPDATE

InnoDB inserts a new record of a flight and saves the current system version number as the row version number. At the same time Save the current system version number to the original line as the line delete version number.

MVCC only works at two isolation levels: REPEATABLE READ and READ COMMITED. The other two isolation levels are incompatible with MVCC. Because READ UNCOMMITED always reads the latest data row, not the data row that conforms to the current transaction version. SERIALIZABLE will lock all rows of data read.

Storage Engine

InnoDB Storage Engine

InnoDB is the default transactional engine of MYSQL, and it is also the most important and most used. Extensive storage engine. Unless there is a very special reason to use another storage engine, the InnoDB engine should be given priority.

InnoDB uses MVCC to support high concurrency and implements four standard isolation levels. The default level is REPEATABLE READ (repeatable read), and the gap lock MVCC policy prevents the implementation of phantom reads. The gap lock allows InnoDB to not only lock the rows of the query design, but also lock the gaps in the index to prevent phantom rows. insert.

Gap lock: When we use range conditions instead of equality conditions to retrieve data and request shared or exclusive locks, InnoDB will lock the index entries of existing data records that meet the conditions; for key values in the condition Records that are within the range but do not exist are called "gaps". InnoDB will also lock this "gap". This locking mechanism is the so-called gap lock (Next-Key lock).
Reference: Gap Lock (Next-Key Lock)

The main index is a clustered index, which saves data in the index to avoid direct reading from the disk, thus greatly improving query performance.

InnoDB has made many internal optimizations, including predictable read-ahead when reading data from disk, an adaptive hash index that can automatically create a hash index in memory to speed up operations, and an ability to speed up insertions. Operations insert buffer etc.

MyISAM Storage Engine

In mysql5.1 and previous versions, MyISAM is the default storage engine. MyISAM provides a large number of features, including full-text indexing, compression, spatial functions, etc., but it does not support transactions and row-level locks, and there is an undoubted flaw that it cannot be safely recovered after a crash.

For read-only data, or the table is relatively small and can tolerate repair operations, the MyISAM engine can still be used.

When creating a MyISAM table, if the DELAY_KEY_WRITE option is specified, when each modification is completed, the modified index data will not be written to disk immediately, but will be written to the key buffer in memory. The corresponding index block is written to disk only when the key buffer is cleared or the table is closed. This method can greatly improve write performance, but it will cause index damage when the database or host crashes, requiring repair operations.

Comparison

Transaction: InnoDB supports transactions, MyISAM does not support transactions.
Lock granularity: InnoDB supports table-level locks and row-level locks, while MyISAM only supports table-level locks.
Foreign keys: InnoDB supports foreign keys.
Backup: InnoDB supports hot backup, but tools are required.
Crash recovery: The probability of damage after MyISAM crashes is much higher than that of InnoDB, and the recovery speed is also slower.
Other features: MyISAM supports full-text indexing, compression, spatial functions and other features.

Backup type

Cold backup: MySQL service needs to be turned off, and read and write requests are not allowed;
Warm backup: The service is online, but only supports read requests and does not allow write requests;
Hot backup : While backing up, the business will not be affected.

Index

Index (also called "key") is used by the storage engine to quickly find records A data structure in .

B-Tree index

Most mysql engines support this index.

Although the term "B-Tree" is used, different storage engines may use different storage structures. The NDB cluster storage engine actually uses T-Tree internally, while InnoDB uses B Tree.

B-Tree index can speed up access to data because the storage engine does not need to perform a full table scan to obtain the required data. Instead, it searches from the root node of the index, so the search speed will be much faster.

B-Tree organizes and stores index columns sequentially, which is very suitable for searching range data. Because the index tree is ordered, in addition to user search, it can also be used for sorting and grouping.

You can specify multiple columns as index columns, and multiple index columns together form the index key. B-Tree index is suitable for full key value, key value range or key prefix search, where key prefix search is only applicable to search based on the leftmost prefix. The search must start from the leftmost column of the index.

Data structure of B-Tree index

B-Tree

In order to describe B-Tree, first define a piece of data The record is a tuple [key, data]. Key is the key value of the record. For different data records, the key is different from each other. Data is the data in the data record except key.

All nodes have the same depth, which means that the B-Tree is balanced.
The keys in a node are arranged non-decreasingly from left to right.
If the left and right adjacent keys of a pointer are keyi and keyi 1 respectively, and are not null, then all the keys of the node pointed to by the pointer are greater than or equal to keyi and less than or equal to keyi 1.

Search algorithm: First perform a binary search on the root node. If found, return the data of the corresponding node. Otherwise, search recursively at the node pointed to by the pointer in the corresponding interval.

Since inserting and deleting new data records will destroy the properties of the B-Tree, when inserting and deleting, the tree needs to be split, merged, rotated, etc. to maintain the properties of the B-Tree.

MYSQL_Introduction to multi-version concurrency control, storage engine, and indexing

B Tree

Compared with B-Tree, B Tree has the following characteristics:

The upper limit of the pointer of each node is 2d instead of 2d 1 (d is the degree of B-Tree).
Internal nodes do not store data, only keys; external nodes do not store pointers.

MYSQL_Introduction to multi-version concurrency control, storage engine, and indexing

B Tree with sequential access pointers

is generally used in database systems or file systems The B Tree structures are optimized on the basis of the classic B Tree, adding sequential access pointers.

MYSQL_Introduction to multi-version concurrency control, storage engine, and indexing

The purpose of this optimization is to provide performance for interval access. For example, in the figure, if you want to query all records with keys 18 to 49.

Advantages

Balanced trees such as red-black trees can also be used to implement indexes, but file systems and database systems generally use B-Tree as the index structure. The main ones are as follows Two reasons:

Better retrieval times: The time complexity of retrieving data in a balanced tree is equal to the tree height h, and the tree height is roughly O(h) = O(logN), where d is the out-degree of each node. The out-degree of the red-black tree is 2, while the out-degree of the B-Tree is generally very large. The tree height h of the red-black tree is obviously much higher than that of the B-Tree, so the number of searches is more. B-Tree is more suitable for external memory indexes than B-Tree because the data field is removed from the nodes in B-Tree, so it can have a larger out-degree and the retrieval efficiency will be higher.
Utilize computer read-ahead features: In order to reduce disk I/O, disks are often not read strictly on demand, but are read ahead every time. The theoretical basis for this is the well-known locality principle in computer science: when a piece of data is used, nearby data is usually used immediately. During the read-ahead process, the disk reads sequentially. Sequential reading does not require disk seeking and only requires a short rotation time, so the speed is very fast. The operating system generally divides the memory and disk into solid-sized blocks, each block is called a page, and the memory and disk exchange data in units of pages. The database system sets the size of a node in the index to the size of the page, so that a node can be completely loaded in one I/O, and adjacent nodes can also be preloaded using the read-ahead feature.

Reference: Data structure and algorithm principles behind MySQL index

Hash index

The InnoDB engine has a special function called " "Adaptive hash index", when an index value is used very frequently, a hash index will be created on top of the B Tree index, so that the B Tree index has some advantages of the hash index, such as fast hashing Hope to find.

The hash index can be searched in O(1) time, but it loses order. It has the following limitations:

The hash index only contains hashes The value follows the row pointer and does not store the field value, so you cannot use the value in the index to avoid deleting all rows.
cannot be used for sorting and grouping.
Only supports exact search and cannot be used for partial search and range search.
When a hash conflict occurs, the storage engine must traverse all row pointers in the linked list.

Spatial Data Index (R-Tree)

MyISAM table supports spatial index and can be used as geographic data storage. Spatial indexes index data from all dimensions, and queries can be combined based on any dimension.

Must use Mysql's GIS-related functions such as MBRONTAINS() to maintain data.

Full-text index

Full-text index is a special type of index that looks for keywords in the text instead of directly comparing values in the index. The search condition uses MATCH AGAINST instead of plain WHERE.

Full-text index is generally implemented using an inverted sorted index, which records the mapping of the document where the keyword expires.

MyISAM storage engine supports full-text indexing, and the InnoDB storage engine also supports full-text indexing in Mysql 5.6.4 version.

Advantages of index

It greatly reduces the number of data rows that the server needs to scan.
Helps the server avoid sorting and creating temporary tables (B Tree indexes are ordered and can be used for Order by and group by operations).
Change random I/O into sequential I/O (B Tree index is ordered, that is, adjacent data is stored together).

MySQL database InnoDB storage engine multi-version control (MVCC) implementation principle analysis

Introduction to MySQL storage engine

The above is the detailed content of MYSQL_Introduction to multi-version concurrency control, storage engine, and indexing. For more information, please follow other related articles on the PHP Chinese website!

Statement：

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Previous article：MyBatis reverse engineering analysis and simple tutorial (with code) Next article：MyBatis reverse engineering analysis and simple tutorial (with code)

See more

MYSQL_Introduction to multi-version concurrency control, storage engine, and indexing

Multi-version concurrency control

Storage Engine

InnoDB Storage Engine

MyISAM Storage Engine

Comparison

B-Tree index

Data structure of B-Tree index

B-Tree

B Tree

B Tree with sequential access pointers

Advantages

Hash index

Spatial Data Index (R-Tree)

Full-text index

Advantages of index

Related articles