Overview
After years of development, MySQL has become the most popular database: it is widely used in the Internet industry and is gradually making its way into traditional industries as well. Its popularity stems partly from its excellent high-concurrency transaction processing and partly from its rich ecosystem. MySQL handles the short queries of OLTP workloads well, but its ability to handle large, complex queries is limited. Most directly, a single SQL statement can only ever use one CPU core, so such queries cannot exploit the multi-core capacity of the host. MySQL has not stood still, however. Version 8.0.14 introduces a parallel query feature for the first time, which speeds up statements of the CHECK TABLE and SELECT COUNT(*) kind severalfold. Although the applicable scenarios are still limited for now, the future development is worth looking forward to.
Usage method
Parallel scanning is enabled by configuring the parameter innodb_parallel_read_threads, which sets the number of concurrent read threads (the default is 4). As a simple experiment, I imported 200 million rows with sysbench and set innodb_parallel_read_threads to 1, 2, 4, 8, 16, 32, and 64 in turn to test the effect of parallel execution. The test statement is select count(*) from sbtest1;

The horizontal axis is the configured number of concurrent threads and the vertical axis is the statement execution time. Judging from the results, the overall parallel performance is good: scanning 200 million records drops from 18s with a single thread to 1s with 32 threads. Beyond that, no matter how much more concurrency is added, the data volume is fixed, the overhead of managing more threads outweighs the gains from concurrency, and the execution time cannot be shortened further.
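The plateau can be illustrated with a toy cost model (my own sketch, not MySQL code): if each extra thread adds a fixed coordination cost, total time first falls with the thread count and then rises again. The 18s figure mirrors the experiment above; the overhead constant is made up.

```python
def scan_time(total_work_s, threads, per_thread_overhead_s=0.05):
    """Illustrative model: parallel work shrinks with thread count,
    while per-thread management cost grows linearly."""
    return total_work_s / threads + per_thread_overhead_s * threads

# With these assumed numbers, 32 threads beats 1 thread by a wide
# margin, but 64 threads is already slower than 32 again.
t1, t32, t64 = (scan_time(18.0, n) for n in (1, 32, 64))
```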
MySQL Parallel Execution
In fact, MySQL's parallel execution is still at a very early stage, as shown in the figure below. On the left is the old form: MySQL processing a single SQL statement serially. In the middle is what the current version provides: parallel scanning inside the InnoDB engine. On the right is where MySQL is headed: the optimizer generates a parallel plan based on system load and the SQL statement, then hands the partitioned plan to the executor for parallel execution. Parallel execution is not just parallel scanning; it also includes parallel aggregation, parallel join, parallel grouping, and parallel sorting. The current version makes no supporting changes to the upper-level optimizer and executor, so the discussion below focuses on how the InnoDB engine implements parallel scanning, mainly covering partitioning, parallel scanning, read-ahead, and the adapter class that interacts with the executor.

Partitioning
One of the core steps of parallel scanning is partitioning: dividing the data to be scanned into multiple parts so that multiple threads can scan in parallel. The InnoDB engine is an index-organized table; data is stored on disk as a B+ tree whose node unit is a page (block), while hot pages are cached in the buffer pool and evicted via an LRU algorithm. Partitioning starts from the root page and walks down level by level; when the number of branches at some level exceeds the configured number of threads, the splitting stops. In the implementation, partitioning is actually performed twice. The first pass partitions by the branches of the root page: the leftmost leaf-node record of each branch becomes that partition's lower-left bound, and the same record is taken as the upper-right bound of the adjacent previous branch. The B+ tree is thus divided into several subtrees, each of which is one scan partition. After this first pass, the number of partitions may not fully utilize the cores. For example, with 3 parallel scan threads configured and 4 partitions produced, after the first 3 partitions finish in parallel, the fourth partition can be scanned by at most one thread, so multi-core resources are not fully used.
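The boundary rule can be sketched as follows (a hypothetical paraphrase, not InnoDB code): given the leftmost leaf key under each root branch, each branch's first key is its own lower bound and simultaneously the upper bound of the previous branch, so adjacent ranges share boundaries and together cover the whole index.

```python
def create_ranges(branch_first_keys):
    """branch_first_keys: the first (leftmost leaf) key under each root
    branch, in ascending order. Returns half-open (low, high) scan ranges,
    one per subtree; None means unbounded (scan from/to the index edge)."""
    ranges = []
    for i, key in enumerate(branch_first_keys):
        low = key if i > 0 else None       # leftmost subtree starts at -inf
        high = (branch_first_keys[i + 1]
                if i + 1 < len(branch_first_keys) else None)
        ranges.append((low, high))
    return ranges
```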
Secondary partitioning
To solve this problem, version 8.0.17 introduced secondary partitioning: the fourth partition is split further, so its sub-partitions can be scanned concurrently. The minimum granularity of concurrent scanning in the InnoDB engine is the page level. The rule for deciding on secondary partitioning is: after the first pass, if the number of partitions is greater than the number of threads, the partitions beyond the largest multiple of the thread count are split a second time; if the number of partitions is less than or equal to the number of threads and the B+ tree is very deep, then all partitions are split a second time.
The relevant code is as follows:
split_point = 0;

if (ranges.size() > max_threads()) {
  /* The last batch of ranges is partitioned a second time. */
  split_point = (ranges.size() / max_threads()) * max_threads();
} else if (m_depth < SPLIT_THRESHOLD) {
  /* If the tree is not very deep then don't split. For smaller tables
  it is more expensive to split because we end up traversing more blocks. */
  split_point = max_threads();
} else {
  /* If the B+ tree is very deep (three or more levels, i.e. a large data
  volume), then all ranges are partitioned a second time. */
}
Whether in the first or the second pass, the boundary logic of a partition is the same: the leftmost leaf-node record of each partition is its lower-left bound, and the same record is taken as the upper-right bound of the adjacent previous partition. This guarantees enough partitions, fine enough granularity, and sufficient parallelism. The figure below shows secondary partitioning with 3 concurrent scan threads configured.
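The split-point decision can be restated as a runnable Python paraphrase (the SPLIT_THRESHOLD value below is an assumption standing in for the real constant):

```python
SPLIT_THRESHOLD = 3  # assumed stand-in for the real constant

def choose_split_point(num_ranges, max_threads, tree_depth):
    """Index from which ranges get a second-pass split.
    Ranges with index >= the returned value are split again;
    returning 0 means every range is split."""
    if num_ranges > max_threads:
        # Only the last, incomplete batch of ranges is split again.
        return (num_ranges // max_threads) * max_threads
    elif tree_depth < SPLIT_THRESHOLD:
        # Shallow tree: splitting costs more than it saves, split nothing
        # (no range index ever reaches max_threads here).
        return max_threads
    else:
        # Deep tree, large data volume: split every range a second time.
        return 0
```

With 3 threads and 4 first-pass partitions, the split point is 3, so only the trailing fourth partition is split again, which matches the example in the text.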
The relevant code is as follows:

create_ranges(size_t depth, size_t level)

Primary partitioning:
parallel_check_table
  add_scan
    partition(scan_range, level=0) /* start at root page */
      create_ranges(scan_range, depth=0, level=0)
    create_contexts(range, index >= split_point)

Secondary partitioning:
split()
  partition(scan_range, level=1)
    create_ranges(depth=0, level)

Parallel scanning
After partitioning, each partition's scan task is placed in a lock-free queue. Parallel worker threads take tasks from the queue and execute them; if a task carries the split attribute, the worker splits it a second time and puts the sub-tasks back into the queue. The process involves two core interfaces: the worker-thread interface and the record-traversal interface. The former fetches tasks from the queue, executes them, and maintains statistics counters; the latter locates the appropriate records based on visibility and processes them through a callback function injected by the upper layer, for example for counting.
Parallel_reader::worker(size_t thread_id)
{
  1. Take a ctx task from the ctx queue.
  2. Based on the ctx's split attribute, decide whether the partition needs further splitting (split()).
  3. Traverse all records in the partition (traverse()).
  4. When a partition task finishes, update the m_n_completed count.
  5. When m_n_completed reaches the number of ctxs, wake up all worker threads and finish.
  6. Return error information according to the traverse interface.
}
Parallel_reader::Ctx::traverse()
{
  1. Set up a pcursor according to the range.
  2. Search the btree and position the cursor at the start of the range.
  3. Check visibility (check_visibility).
  4. If the record is visible, process it with the callback function (e.g. counting).
  5. Move on to the next record; when the last record of a page is reached, trigger the read-ahead mechanism (submit_read_ahead).
  6. Finish once the end of the range is passed.
}
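The worker loop above can be sketched with standard queue-and-threads machinery (a toy model with a hypothetical halving `split`, not the InnoDB implementation): workers pull range tasks from a shared queue, and a task flagged for splitting is broken up and re-queued instead of scanned.

```python
import queue
import threading

def split(rng):
    """Hypothetical second-pass split: cut a (lo, hi) page range in half."""
    lo, hi = rng
    mid = (lo + hi) // 2
    return [(lo, mid), (mid, hi)]

def parallel_scan(tasks, n_threads, handle):
    """tasks: list of ((lo, hi), needs_split). Mirrors the worker loop:
    split-flagged tasks are re-queued as sub-ranges; the rest are scanned
    by calling handle() on them."""
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                rng, needs_split = q.get(timeout=0.5)
            except queue.Empty:
                return                       # no work left, thread exits
            if needs_split:
                for sub in split(rng):       # re-queue the sub-ranges
                    q.put((sub, False))
            else:
                with lock:
                    results.append(handle(rng))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```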
Version 8.0.17 also introduced a read-ahead mechanism to prevent an I/O bottleneck from dragging down parallel performance. The number of read-ahead threads is currently not configurable and is hard-coded to 2 in the code. Each read-ahead unit is an extent (InnoDB files are managed through a three-level structure of segments, extents, and pages; an extent is a group of consecutive pages). Depending on the configured page size, an extent may be 1M or 2M; for the common 16K page size, 1M (64 pages) is read ahead each time. While scanning, a worker thread first checks whether the next adjacent page is the first page of an extent; if so, it submits a read-ahead task. Read-ahead tasks are also buffered in a lock-free queue, with worker threads as producers and read-ahead workers as consumers. Since partitions never overlap in pages, read-ahead tasks are never duplicated.
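The trigger condition can be written out explicitly (an assumed paraphrase of the check, using the 16K-page numbers from the text):

```python
PAGE_SIZE = 16 * 1024                          # common 16K page config
EXTENT_BYTES = 1024 * 1024                     # 1M extent at 16K pages
PAGES_PER_EXTENT = EXTENT_BYTES // PAGE_SIZE   # 64 pages per extent

def should_submit_read_ahead(next_page_no):
    """A scan thread submits a read-ahead task when the next page it will
    visit is the first page of an extent."""
    return next_page_no % PAGES_PER_EXTENT == 0
```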
Executor interaction (adapter)
MySQL actually encapsulates an adapter class, Parallel_reader_adapter, for the upper layer to use, laying the groundwork for richer parallel execution later. First, this class solves the record-format problem by converting records scanned at the engine layer into MySQL format. This decouples the layers: the executor does not need to know the engine-layer format and processes everything in MySQL format. The whole process is a pipeline. MySQL records are batched through a buffer: worker threads keep reading records from the engine layer while the upper layer keeps consuming them, and the buffer smooths out the difference between reading and processing speed so the pipeline keeps flowing. The default buffer size is 2M, and the number of MySQL records it can hold is derived from the table's row length. The core logic lives in the process_rows interface, whose flow is as follows:
process_rows
{
  1. Convert the engine record into a MySQL record.
  2. Get this thread's buffer information (how many MySQL records have been converted, how many have been sent to the upper layer).
  3. Fill the MySQL record into the buffer and increment the m_n_read counter.
  4. Invoke the callback function to process it (e.g. counting, aggregation, sorting), and increment the m_n_send counter.
}
For the caller, it is necessary to set the table's metadata and inject a record-processing callback function to handle work such as aggregation, sorting, and grouping. The callbacks are controlled through m_init_fn, m_load_fn, and m_end_fn.
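The buffering side of the adapter can be sketched like this (a toy stand-in for Parallel_reader_adapter, with the load callback playing the m_load_fn role; the class and field names below are my own, not MySQL's):

```python
BUFFER_BYTES = 2 * 1024 * 1024  # default 2M batch buffer from the text

class ParallelReaderAdapter:
    """Converts 'engine rows' to 'MySQL rows', batches them in a buffer
    sized by row length, and drains the buffer through a caller-supplied
    load callback, mimicking the process_rows pipeline."""

    def __init__(self, row_len, load_fn):
        self.capacity = BUFFER_BYTES // row_len  # rows the buffer can hold
        self.load_fn = load_fn
        self.buffer = []
        self.n_read = 0     # rows converted (the m_n_read role)
        self.n_send = 0     # rows handed to the upper layer (m_n_send role)

    def process_row(self, engine_row):
        mysql_row = tuple(engine_row)            # stand-in format conversion
        self.buffer.append(mysql_row)
        self.n_read += 1
        if len(self.buffer) == self.capacity:    # buffer full: drain it
            self.flush()

    def flush(self):
        if self.buffer:
            self.load_fn(self.buffer)            # upper-layer callback
            self.n_send += len(self.buffer)
            self.buffer = []
```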
Summary
MySQL 8.0 introduced parallel queries. Although still rudimentary, the feature already shows the potential of parallel execution in MySQL: as the experiment shows, with parallel execution enabled, SQL statements make full use of multiple cores and response time drops sharply. I believe that in the near future 8.0 will support more parallel operators, including parallel aggregation, parallel join, parallel grouping, and parallel sorting.
The above is the detailed content of Detailed explanation of MySQL8.0 InnoDB parallel execution. For more information, please follow other related articles on the PHP Chinese website!