
Introducing the MySQL large table optimization solution

Reposted by coldplay.xixi on 2021-01-28 09:28:00


Background

We run Alibaba Cloud RDS for MySQL (MySQL 5.7), and our business table grows by more than 10 million rows every month. As the data volume keeps increasing, slow queries on large tables have started to appear. During business peaks, slow queries on the main business table take tens of seconds, which seriously affects the business.

Solution Overview


1. Database design and index optimization

MySQL is very flexible, so its performance depends heavily on the developer's table design and index optimization skills. Here are some optimization suggestions:

Store time values as Unix timestamps in an INT column and index that column to speed up queries.
Define columns as NOT NULL where possible: NULL values are harder to query and optimize, and they take extra space in indexes.
Use TINYINT instead of ENUM.
To store exact decimal values, always use DECIMAL rather than FLOAT or DOUBLE.
Set column lengths strictly according to business needs; do not make them larger than necessary.
Avoid the TEXT type. If you must use it, move large, rarely used columns into a separate table.
MySQL limits the length of index columns. In InnoDB, an index column (or column prefix) is limited to 767 bytes with the REDUNDANT and COMPACT row formats and to 3072 bytes with DYNAMIC and COMPRESSED (DYNAMIC is the default row format from MySQL 5.7.9 on), and the combined length of all columns in one index cannot exceed 3072 bytes. In MySQL 8.0 a single index can therefore cover 1024 characters of 3-byte utf8, or 768 characters of utf8mb4.
If a large table needs DDL changes, contact the DBA.
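A few of the suggestions above come down to simple arithmetic. The Python sketch below is only an illustration, not MySQL code, and the function names are hypothetical. It converts a datetime to the integer seconds an INT column would hold, shows that a signed INT runs out in January 2038, shows that binary floating point (which is what FLOAT and DOUBLE use) cannot hold 0.1 + 0.2 exactly while a decimal type can, and computes how many characters fit under the 767-byte and 3072-byte index limits for 3-byte and 4-byte character sets.

```python
from datetime import datetime, timezone
from decimal import Decimal


def to_int_timestamp(dt: datetime) -> int:
    """Convert a timezone-aware datetime to Unix seconds, as stored in an INT column."""
    return int(dt.timestamp())


def max_index_chars(limit_bytes: int, bytes_per_char: int) -> int:
    """Longest character prefix that fits under an index byte limit."""
    return limit_bytes // bytes_per_char


# A signed INT tops out at 2^31 - 1 seconds, i.e. 2038-01-19 03:14:07 UTC.
# INT UNSIGNED (2^32 - 1) pushes the limit out to the year 2106.
INT_MAX = 2**31 - 1
last_signed = datetime.fromtimestamp(INT_MAX, tz=timezone.utc)

# Binary floating point cannot represent 0.1 or 0.2 exactly; Decimal can.
float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")

if __name__ == "__main__":
    print(to_int_timestamp(datetime(2021, 1, 1, tzinfo=timezone.utc)))
    print(last_signed.isoformat())
    print(float_sum == 0.3, decimal_sum == Decimal("0.3"))
    for limit in (767, 3072):
        # characters for 3-byte utf8 and 4-byte utf8mb4
        print(limit, max_index_chars(limit, 3), max_index_chars(limit, 4))
```

The 2038 cutoff is worth keeping in mind when choosing between INT and INT UNSIGNED for timestamp columns.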

Leftmost index matching rule

As the name suggests, the leftmost column takes priority. When creating a composite index, put the column used most often in WHERE clauses (according to business needs) on the far left. A key question for a composite index is column order: if both c1 and c2 appear in the WHERE clause, should the index be (c1, c2) or (c2, c1)? The rule is that the fewer repeated values a column has, the further forward it should go. For example, if 95% of a column's values are distinct, that column can generally be placed first.
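"Fewer repeated values" is usually measured as selectivity: distinct values divided by total rows. In MySQL it can be estimated with a query such as `SELECT COUNT(DISTINCT c1) / COUNT(*) FROM t`. The Python sketch below computes the same ratio on in-memory sample rows and orders candidate columns from most to least selective. The column names `status` and `user_id` are hypothetical, and the sketch ignores query frequency, which the paragraph above also names as a criterion.

```python
from typing import Dict, List, Sequence


def selectivity(rows: Sequence[Dict[str, object]], column: str) -> float:
    """Distinct values divided by row count; 1.0 means every value is unique."""
    if not rows:
        return 0.0
    return len({row[column] for row in rows}) / len(rows)


def suggest_column_order(rows: Sequence[Dict[str, object]], columns: List[str]) -> List[str]:
    """Order candidate index columns from most to least selective (stable on ties)."""
    return sorted(columns, key=lambda c: selectivity(rows, c), reverse=True)


if __name__ == "__main__":
    # Hypothetical sample: a two-valued status flag and a unique user id.
    sample = [{"status": i % 2, "user_id": i} for i in range(100)]
    print(suggest_column_order(sample, ["status", "user_id"]))
```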

Composite index index(a, b, c):
where a=3: uses a
where a=3 and b=5: uses a, b
where a=3 and b=5 and c=4: uses a, b, c
where b=3 (or where c=4): no index is used
where a=3 and c=4: uses only a
where a=3 and b>10 and c=7: uses a, b
where a=3 and b like 'xx%' and c=7: uses a, b
This is effectively equivalent to creating three indexes: key(a), key(a,b), key(a,b,c).
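The rule in the list above can be modeled as a walk over the index columns in order: an equality condition lets the walk continue, a range condition (including `LIKE 'prefix%'`) uses its column and then stops, and a missing column stops the walk. The Python sketch below is a simplified model of index key access for AND-only WHERE clauses; the operator labels such as `like_prefix` are hypothetical, and it ignores optimizer features such as index condition pushdown or MySQL 8.0's skip scan, which can still make use of later columns.

```python
from typing import Dict, List

EQUALITY = "="
# Hypothetical labels for predicates that produce a range scan on one column.
RANGE_OPS = {">", ">=", "<", "<=", "between", "like_prefix"}


def used_index_columns(index: List[str], conditions: Dict[str, str]) -> List[str]:
    """Return the index columns an AND-only WHERE clause can use for the index seek.

    `conditions` maps column name -> operator label, e.g. {"a": "=", "b": ">"}.
    """
    used: List[str] = []
    for column in index:
        op = conditions.get(column)
        if op == EQUALITY:
            used.append(column)   # equality: the next index column is still usable
        elif op in RANGE_OPS:
            used.append(column)   # range: this column is used, later ones are not
            break
        else:
            break                 # missing column or unsupported predicate ends the prefix
    return used


if __name__ == "__main__":
    idx = ["a", "b", "c"]
    print(used_index_columns(idx, {"a": "=", "b": ">", "c": "="}))
    print(used_index_columns(idx, {"a": "=", "c": "="}))
```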

2. Switch the database to PolarDB with read-write separation

PolarDB is a next-generation relational cloud database developed in-house by Alibaba Cloud. It is 100% compatible with MySQL, supports up to 100 TB of storage, and a single cluster can scale out to 16 nodes, which suits the diverse database workloads of enterprises. PolarDB separates storage from compute: all compute nodes share one copy of the data, and the service provides minute-level scaling up and down, second-level failure recovery, global data consistency, and free data backup and disaster recovery.

Cluster architecture with separated compute and storage
PolarDB uses a multi-node cluster architecture: each cluster has one Writer node (the primary node) and multiple Reader nodes (read-only nodes). All nodes share the underlying storage (PolarStore) through a distributed file system (PolarFileSystem).
Read-write separation
When the application connects through the cluster endpoint, PolarDB serves it through an internal proxy layer (Proxy): every request from the application goes to the proxy first and then to a database node. Besides security authentication and protection, the proxy parses SQL, sends write operations (transactions, UPDATE, INSERT, DELETE, DDL, and so on) to the primary node, and distributes read operations (such as SELECT) evenly across the read-only nodes. This gives automatic read-write separation; to the application it is as simple as using a single database.
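The routing behavior described above can be sketched as a small classifier. The Python class below is a hypothetical toy, not PolarDB's proxy, whose real rules (session consistency, hints, load balancing policies) are more involved. It sends writes, DDL, locking reads, and everything inside an explicit transaction to the writer, and rotates plain SELECTs across the readers.

```python
import itertools
from typing import List

WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP", "TRUNCATE"}


class ReadWriteRouter:
    """Hypothetical toy proxy: writes and transactions go to the writer, SELECTs rotate over readers."""

    def __init__(self, writer: str, readers: List[str]):
        self.writer = writer
        self._readers = itertools.cycle(readers) if readers else None
        self.in_transaction = False

    def route(self, sql: str) -> str:
        text = sql.strip().upper()
        if not text:
            raise ValueError("empty statement")
        keyword = text.split(None, 1)[0]
        if keyword in ("BEGIN", "START"):          # BEGIN / START TRANSACTION
            self.in_transaction = True
            return self.writer
        if keyword in ("COMMIT", "ROLLBACK"):
            self.in_transaction = False
            return self.writer
        if self.in_transaction or keyword in WRITE_KEYWORDS:
            return self.writer
        if keyword == "SELECT" and "FOR UPDATE" not in text and self._readers is not None:
            return next(self._readers)             # plain read: round-robin over readers
        return self.writer                         # anything else goes to the writer to be safe


if __name__ == "__main__":
    router = ReadWriteRouter("writer", ["reader1", "reader2"])
    for stmt in ("SELECT 1", "UPDATE t SET x = 1", "SELECT 2"):
        print(stmt, "->", router.route(stmt))
```

Sending unrecognized statements to the writer trades some read capacity for safety, since the writer can always execute them correctly.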

In mixed online and offline scenarios, different services can use different connection endpoints that map to different nodes, so they do not affect each other.


Sysbench performance stress test report:

[Figure: Sysbench results, PolarDB with 4 cores and 16 GB, 2 nodes]

3. Migrate the historical data of the split tables to the MySQL 8.0 X-Engine storage engine

After the split, the business table keeps 3 months of data (based on the company's needs), and historical data is moved month by month into history tables that use the X-Engine storage engine. Why choose X-Engine, and what are its advantages?

Lower cost: X-Engine's storage cost is about half that of InnoDB.

Tiered storage improves QPS: X-Engine uses a hierarchical storage structure that keeps hot data and cold data at different levels, and the levels holding cold data are compressed by default.
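The monthly split described at the start of this section needs a rule that decides which rows stay in the business table and which monthly history table receives the rest. The Python sketch below is one possible rule, assuming the 3-month window means the current calendar month plus the two before it; the `orders_his_` prefix is a hypothetical naming scheme, and the sketch only picks the target table, it does not move any data.

```python
from datetime import date
from typing import Optional


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from earlier's month to later's month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def archive_table_for(row_date: date, today: date, retention_months: int = 3,
                      prefix: str = "orders_his_") -> Optional[str]:
    """Return the monthly history table for a row, or None if it stays in the business table."""
    if months_between(row_date, today) < retention_months:
        return None
    return f"{prefix}{row_date:%Y%m}"  # hypothetical name, e.g. orders_his_202010


if __name__ == "__main__":
    today = date(2021, 1, 28)
    for d in (date(2021, 1, 5), date(2020, 11, 30), date(2020, 10, 1)):
        print(d, "->", archive_table_for(d, today))
```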

X-Engine is an OLTP (Online Transaction Processing) database storage engine developed in-house by Alibaba Cloud's database product division.
X-Engine is seamlessly compatible with MySQL (thanks to MySQL's pluggable storage engine architecture) and uses a tiered storage architecture. Its goal is to store very large volumes of data, provide high-concurrency transaction processing, and reduce storage costs. In most large-data scenarios, access is uneven and frequently accessed hot data is actually a very small share. X-Engine therefore divides data into multiple levels by access frequency, designs a storage structure suited to the access pattern of each level, and writes each level to an appropriate storage device.

X-Engine uses an LSM-tree as the architectural basis for tiered storage, redesigned as follows:

Hot data and data updates are kept in memory, and in-memory database techniques (lock-free index structures, append-only writes) improve transaction processing performance.

A pipelined transaction processing mechanism runs several stages of transaction processing in parallel, greatly improving throughput.

Data with low access frequency is gradually evicted or merged into the persistent storage layer and stored on multi-tier storage devices (NVM/SSD/HDD).

Many optimizations have been made to compaction, which has a large impact on performance:

The data storage granularity is made finer, and because update hotspots tend to be concentrated, data is reused as much as possible during merges.

The shape of the LSM-tree is finely controlled, which reduces I/O and computation costs and effectively limits space growth during merges.

Finer-grained access control and caching mechanisms are also used to optimize read performance.
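The LSM-tree ideas above (an in-memory layer for hot writes, immutable sorted runs for persisted data, and compaction that merges runs) can be illustrated with a very small model. The Python class below is a hypothetical toy and not X-Engine's implementation: it has no write-ahead log, deletes, compression, or multiple levels, and its "runs" are in-memory dictionaries standing in for on-disk sorted files.

```python
from typing import Dict, List, Optional


class ToyLSM:
    """Minimal LSM-tree model: a memtable flushed into immutable sorted runs, merged by compaction."""

    def __init__(self, memtable_limit: int = 4):
        self.memtable: Dict[str, str] = {}
        self.runs: List[Dict[str, str]] = []   # newest run first
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value             # writes only touch memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        if self.memtable:
            # A sorted copy stands in for an on-disk sorted run.
            self.runs.insert(0, dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:               # hot data is checked first
            return self.memtable[key]
        for run in self.runs:                  # then persisted runs, newest to oldest
            if key in run:
                return run[key]
        return None

    def compact(self) -> None:
        merged: Dict[str, str] = {}
        for run in reversed(self.runs):        # oldest first, so newer values overwrite older ones
            merged.update(run)
        self.runs = [dict(sorted(merged.items()))] if merged else []


if __name__ == "__main__":
    db = ToyLSM(memtable_limit=2)
    for k, v in [("a", "1"), ("b", "1"), ("a", "2"), ("c", "1")]:
        db.put(k, v)
    print(len(db.runs), db.get("a"))
    db.compact()
    print(len(db.runs), db.runs[0])
```

Compaction is where the outdated version of `a` is finally discarded, which is why its cost and scheduling matter so much for an LSM-based engine.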

4. Parallel query in Alibaba Cloud PolarDB for MySQL 8.0
After splitting the tables, the data volume is still very large. The split only reduced the size of the business tables and did not fully solve the slow query problem. For the remaining slow queries we need PolarDB's parallel query optimization.
PolarDB for MySQL 8.0 introduces a parallel query framework. When the amount of data a query touches reaches a certain threshold, the framework starts automatically and reduces query time several-fold.
The data is split across different threads at the storage layer, and the threads compute in parallel. Their results are pipelined back to the main thread, which performs a simple merge and returns the result to the user, improving query efficiency.
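The split, compute, and merge flow described above follows the general shape of a parallel aggregation. The Python sketch below is only an illustration of that shape and not PolarDB's executor: it partitions a list, has worker threads compute a partial (count, sum) for values above a threshold, and lets the calling thread merge the partial results. The `degree` parameter plays a role similar to a parallel degree setting but is a hypothetical name, and because of CPython's global interpreter lock the threads here show the structure rather than a real CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def partial_aggregate(chunk: Sequence[int], threshold: int) -> Tuple[int, int]:
    """Worker step: filter one partition and return (count, sum) of matching values."""
    matched = [v for v in chunk if v > threshold]
    return len(matched), sum(matched)


def parallel_count_sum(values: Sequence[int], threshold: int, degree: int = 4) -> Tuple[int, int]:
    """Split values into about `degree` partitions, aggregate each in a worker, then merge."""
    if degree < 1:
        raise ValueError("degree must be >= 1")
    size = max(1, -(-len(values) // degree))  # ceiling division
    chunks: List[Sequence[int]] = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=degree) as pool:
        partials = list(pool.map(lambda c: partial_aggregate(c, threshold), chunks))
    # Main-thread step: combine the partial results into the final answer.
    return sum(c for c, _ in partials), sum(s for _, s in partials)


if __name__ == "__main__":
    data = list(range(1000))
    print(parallel_count_sum(data, 500, degree=4))
```

Merging partial counts and sums is cheap, which is why the final step in the main thread can stay simple.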
Parallel query takes advantage of the parallel processing capability of multi-core CPUs. Taking an 8-core, 32 GB configuration as an example, the schematic diagram is as follows.
Parallel query suits most SELECT statements, such as queries on large tables, multi-table joins, and computation-heavy queries. For very short queries the benefit is small.
Parallel query can be controlled per statement with hint syntax. For example, if parallel query is disabled system-wide by default but you need to speed up a frequently executed slow query, you can add a hint to that specific statement:
SELECT /*+PARALLEL(x)*/ … FROM …;  -- x > 0
SELECT /*+ SET_VAR(max_parallel_degree=n) */ * FROM …;  -- n > 0
Query test: the database has 16 cores and 32 GB of memory, and a single table holds more than 30 million rows.
The query took 4326 ms before parallel query was enabled and 525 ms after, about 8.24 times faster.
5. Interactive analytics with Hologres
Although parallel query has made slow queries on large tables faster, we still cannot meet certain requirements such as real-time reports and real-time dashboards, and for these we can only rely on big data processing.
Here we recommend Alibaba Cloud's interactive analytics service Hologres (https://help.aliyun.com/product/113622.html).
6. Postscript
Optimizing tables with tens of millions of rows must be driven by the business scenario and weighed against cost. Horizontally sharding and scaling out the database right from the start poses huge challenges for operations and for the business, and in many cases the results are disappointing. First check whether the database design, index optimization, and table-splitting strategy are in place, then choose the technology that fits the business needs.


Statement: This article is reprinted from csdn.net.
