Detailed introduction to partition tables in MySQL

不言
2019-01-19 10:35:05

This article gives a detailed introduction to partitioned tables in MySQL, which may serve as a useful reference.

To the user, a partitioned table is a single logical table, but underneath it is made up of multiple physical sub-tables. The code that implements partitioning is essentially a wrapper around the handler objects of a set of underlying tables. Requests against the partitioned table are translated through those handler objects into calls to the storage engine interface.

Purpose

When creating a table, MySQL lets you define which data each partition stores by using the PARTITION BY clause. When executing a query, the optimizer uses the partition definitions to filter out partitions that cannot contain the requested data, so the query does not need to scan every partition; it only reads the partitions that may hold the required rows.
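The filtering described above can be observed with EXPLAIN. The following is a minimal sketch, assuming a hypothetical table `orders_by_year` (not from the original article) and MySQL 5.7 or later, where plain EXPLAIN output includes a `partitions` column. On MySQL 5.1 to 5.6, `EXPLAIN PARTITIONS SELECT ...` serves the same purpose.

```sql
-- Hypothetical demo table; names and columns are illustrative only.
CREATE TABLE orders_by_year (
    order_id   INT NOT NULL,
    order_date DATETIME NOT NULL,
    amount     DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p_2010 VALUES LESS THAN (2011),
    PARTITION p_2011 VALUES LESS THAN (2012),
    PARTITION p_rest VALUES LESS THAN MAXVALUE
);

-- The WHERE clause references the partitioning column directly, so the
-- optimizer can prune: the "partitions" column of the plan should list
-- only the partitions that may hold rows from 2011.
EXPLAIN
SELECT SUM(amount)
FROM orders_by_year
WHERE order_date >= '2011-01-01' AND order_date < '2012-01-01';

-- Wrapping the column in a function hides it from the pruner, even though
-- the same function appears in the partition definition. Expect the plan
-- to list every partition here.
EXPLAIN
SELECT SUM(amount)
FROM orders_by_year
WHERE YEAR(order_date) = 2011;
```

Comparing the two plans is a quick way to confirm whether a given WHERE clause actually lets the optimizer skip partitions.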

One of the main purposes of partitioning is to store data in separate tables at a coarse granularity. This keeps related data together, and it also makes it very convenient to delete the data of an entire partition in one batch operation.
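As a rough illustration of batch deletion, the sketch below assumes a hypothetical `access_log` table partitioned by month. The table, column, and partition names are invented for this example.

```sql
-- Hypothetical log table partitioned by month; names are illustrative only.
CREATE TABLE access_log (
    logged_at DATETIME NOT NULL,
    message   VARCHAR(255) NOT NULL
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(logged_at)) (
    PARTITION p_2018_11 VALUES LESS THAN (TO_DAYS('2018-12-01')),
    PARTITION p_2018_12 VALUES LESS THAN (TO_DAYS('2019-01-01')),
    PARTITION p_2019_01 VALUES LESS THAN (TO_DAYS('2019-02-01'))
);

-- Remove all of November 2018 at once: the partition and its rows are
-- dropped together, without a row-by-row DELETE.
ALTER TABLE access_log DROP PARTITION p_2018_11;

-- Alternatively, keep the partition definition but empty it
-- (TRUNCATE PARTITION is available from MySQL 5.5).
ALTER TABLE access_log TRUNCATE PARTITION p_2018_12;
```

Both statements remove data permanently, so they are usually paired with a backup or archival step.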

Partitioning can play a big role in the following scenarios:

  • The table is too large to fit entirely in memory, or only the last part of the table holds hot data and the rest is historical data

  • Partitioned table data is easier to maintain

  • Data in a partitioned table can be distributed across different physical devices

  • Partition tables can be used to avoid certain bottlenecks

  • If necessary, independent partitions can be backed up and restored

The partition table itself also has some limitations, the following points are particularly important:

  • A table can have at most 1024 partitions (MySQL 5.6.7 and later raise the limit to 8192)

  • In MySQL 5.1, the partition expression must be an integer column or an expression that returns an integer. In MySQL 5.5, columns can be used directly for partitioning in some scenarios

  • Foreign key constraints cannot be used in partitioned tables

  • If the table has a primary key or unique indexes, each of them must include all columns used in the partitioning expression
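The primary key restriction is easiest to see with two table definitions. This is a sketch with hypothetical tables `events_bad` and `events_ok`. The first statement is expected to fail, so run the two separately.

```sql
-- Expected to be rejected: the primary key (id) does not include the
-- partitioning column created_on, so MySQL reports that a PRIMARY KEY must
-- include all columns in the table's partitioning function.
CREATE TABLE events_bad (
    id         INT NOT NULL,
    created_on DATE NOT NULL,
    PRIMARY KEY (id)
)
PARTITION BY RANGE (YEAR(created_on)) (
    PARTITION p_old VALUES LESS THAN (2019),
    PARTITION p_new VALUES LESS THAN MAXVALUE
);

-- Accepted: the primary key now covers every column used in the
-- partitioning expression.
CREATE TABLE events_ok (
    id         INT NOT NULL,
    created_on DATE NOT NULL,
    PRIMARY KEY (id, created_on)
)
PARTITION BY RANGE (YEAR(created_on)) (
    PARTITION p_old VALUES LESS THAN (2019),
    PARTITION p_new VALUES LESS THAN MAXVALUE
);
```

One consequence of this design is that `id` alone is no longer guaranteed unique by the key, which is worth weighing before partitioning an existing table.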

Principle of partitioned table

The storage engine manages each underlying table of a partitioned table exactly as it manages an ordinary table (all underlying tables must use the same storage engine). An index on a partitioned table is simply an identical index added to each underlying table. From the storage engine's point of view, an underlying table is no different from an ordinary table, and the engine does not need to know whether a table is standalone or part of a partitioned table.

Operations on a partitioned table follow this logic:

SELECT query

When querying a partitioned table, the partition layer first opens and locks all underlying tables. The optimizer then determines whether some partitions can be filtered out, and calls the corresponding storage engine interface to access the data of each remaining partition.

INSERT operation

When writing a record, the partition layer first opens and locks all underlying tables, then determines which partition should receive the record, and writes the record to the corresponding underlying table.

DELETE operation

When deleting a record, the partition layer first opens and locks all underlying tables, then determines which partition holds the data, and finally performs the delete on the corresponding underlying table.

UPDATE operation

When updating a record, the partition layer first opens and locks all underlying tables. MySQL determines which partition holds the record to be updated, reads the data and updates it, then determines which partition the updated data belongs in, and finally writes it to that underlying table and deletes the original data from the underlying table where it was stored.

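These operations support partition filtering: when the WHERE condition matches the partition expression, partitions that cannot contain the affected rows are skipped.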
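The INSERT, DELETE, and UPDATE behavior described above can be traced on a small table. This is a sketch only, assuming a hypothetical `order_moves` table and MySQL 5.6 or later (needed for the `PARTITION (...)` selection syntax used in the final checks).

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE order_moves (
    order_id   INT NOT NULL,
    order_date DATETIME NOT NULL
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p_2010 VALUES LESS THAN (2011),
    PARTITION p_2011 VALUES LESS THAN (2012),
    PARTITION p_rest VALUES LESS THAN MAXVALUE
);

-- INSERT: each row is routed to exactly one partition by YEAR(order_date).
INSERT INTO order_moves (order_id, order_date) VALUES
    (1, '2010-06-15 00:00:00'),
    (2, '2011-03-01 00:00:00'),
    (3, '2011-09-30 00:00:00');

-- DELETE: the WHERE clause matches the partitioning column, so only the
-- partition for 2010 needs to be examined.
DELETE FROM order_moves
WHERE order_date >= '2010-01-01' AND order_date < '2011-01-01';

-- UPDATE: changing the partitioning column moves the row. It is written to
-- p_rest and removed from p_2011.
UPDATE order_moves SET order_date = '2012-03-01 00:00:00' WHERE order_id = 2;

-- Check where the rows ended up.
SELECT COUNT(*) FROM order_moves PARTITION (p_2010);  -- 0
SELECT COUNT(*) FROM order_moves PARTITION (p_2011);  -- 1 (order_id 3)
SELECT COUNT(*) FROM order_moves PARTITION (p_rest);  -- 1 (order_id 2)
```

Note that the UPDATE filters on `order_id`, which is not the partitioning column, so that statement cannot be pruned and has to look in every partition.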

Although each operation will "first open and lock all underlying tables", this does not mean that the partitioned table holds a lock on the entire table during processing. If the storage engine implements row-level locking itself, as InnoDB does, the corresponding table locks are released at the partition layer. This locking and unlocking process is similar to that of a query on an ordinary InnoDB table.

Types of partition tables

MySQL supports several types of partitioning. The most common is range partitioning, in which each partition stores the records that fall within a certain range. The partition expression can be a column or an expression containing columns.

For example, the following table stores each year's sales in different partitions:

CREATE TABLE sales(
    order_date DATETIME NOT NULL,
    ....
)ENGINE=InnoDB PARTITION BY RANGE(YEAR(order_date))(
    PARTITION p_2010 VALUES LESS THAN (2010),
    PARTITION p_2011 VALUES LESS THAN (2011),
    PARTITION p_2012 VALUES LESS THAN (2012),
    PARTITION p_catchall VALUES LESS THAN MAXVALUE
);

Various functions can be used in the PARTITION BY clause, with one requirement: the expression must return a deterministic integer, and it must not be a constant.

MySQL also supports KEY, HASH, and LIST partitioning, among others.
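For comparison with the range example, here is a sketch of the other partitioning types. All table and column names are hypothetical. The last table uses RANGE COLUMNS, which MySQL 5.5 and later offer for partitioning directly on a column value without an integer-returning expression.

```sql
-- HASH: rows are spread over 4 partitions by MOD(customer_id, 4).
CREATE TABLE customers_hash (
    customer_id INT NOT NULL,
    name        VARCHAR(100) NOT NULL
)
PARTITION BY HASH (customer_id)
PARTITIONS 4;

-- KEY: similar to HASH, but the server supplies the hashing function,
-- so the column does not have to be an integer.
CREATE TABLE sessions_key (
    session_token CHAR(32) NOT NULL,
    created_at    DATETIME NOT NULL
)
PARTITION BY KEY (session_token)
PARTITIONS 4;

-- LIST: each partition holds an explicit set of values.
CREATE TABLE stores_list (
    store_id  INT NOT NULL,
    region_id INT NOT NULL
)
PARTITION BY LIST (region_id) (
    PARTITION p_north VALUES IN (1, 2),
    PARTITION p_south VALUES IN (3, 4)
);

-- RANGE COLUMNS (MySQL 5.5+): partition on the DATE column itself.
CREATE TABLE sales_by_date (
    order_date DATE NOT NULL,
    amount     DECIMAL(10,2) NOT NULL
)
PARTITION BY RANGE COLUMNS (order_date) (
    PARTITION p_2010 VALUES LESS THAN ('2011-01-01'),
    PARTITION p_2011 VALUES LESS THAN ('2012-01-01'),
    PARTITION p_rest VALUES LESS THAN (MAXVALUE)
);
```

With LIST partitioning, inserting a `region_id` that appears in no partition's value list fails, so the lists need to cover every value the application can produce.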

How to use partitioned tables

If we want to query the records for a certain period from a very large table, how should we query it, and how can we make the query more efficient?

Because the amount of data is very large, we certainly cannot scan the entire table for every query. Given the space and maintenance cost of indexes, we may not want to use indexes either. Even if we do use them, we will find that the data is not clustered in the desired way and is heavily fragmented, so a single query can end up generating thousands of random I/Os. In fact, when the amount of data is extremely large, B-Tree indexes are no longer effective.

So we can choose coarser-grained but cheaper ways to retrieve data, such as keeping only a small piece of metadata as an index over a large block of data.

This is exactly what partitioning does. Partitioning can be understood as a primitive form of index. Partitions need no additional data structure to record which rows each partition holds, because they do not have to pinpoint the location of each row, so the cost is very low. A simple expression is enough to describe what data is stored in each partition.

To keep large volumes of data scalable, there are generally two strategies:

  1. Scan the data in full without any index: as long as the WHERE condition restricts the required data to a few partitions, this is very efficient. This strategy assumes that the data does not need to fit entirely in memory and that all the required data is read from disk. Because memory is relatively small, data is pushed out of memory quickly, so the cache provides no benefit. This strategy is suitable when large amounts of data are accessed in a regular way.

  2. Index the data and isolate hot spots: if the data has obvious "hot spots" and the rest is rarely accessed, you can put the hot data in a separate partition so that this partition can be cached in memory. Queries then only need to access one small partition, can use indexes, and can use the cache effectively.
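The two strategies can be sketched on a single hypothetical `page_views` table that keeps recent rows in their own partition. The names, the cut-off date, and the index are assumptions made for illustration, not recommendations from the original article.

```sql
-- Hypothetical table: one large history partition and one small hot partition.
CREATE TABLE page_views (
    view_id   BIGINT NOT NULL,
    viewed_on DATE NOT NULL,
    url       VARCHAR(100) NOT NULL,
    PRIMARY KEY (view_id, viewed_on),
    KEY idx_url (url)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(viewed_on)) (
    PARTITION p_history VALUES LESS THAN (TO_DAYS('2019-01-01')),
    PARTITION p_hot     VALUES LESS THAN MAXVALUE
);

-- Strategy 2 (index plus hot partition): the date condition limits the
-- query to p_hot, where idx_url can be used and is small enough to stay
-- cached.
SELECT COUNT(*)
FROM page_views
WHERE viewed_on >= '2019-01-01'
  AND url = '/index.html';

-- Strategy 1 (scan without relying on an index): a reporting query that
-- reads a large share of rows, with the WHERE clause limiting the scan to
-- the history partition.
SELECT url, COUNT(*) AS views
FROM page_views
WHERE viewed_on < '2019-01-01'
GROUP BY url;
```

In practice the history side would usually be split into more partitions (for example, one per month) so that reporting scans touch only the periods they need.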

Under what circumstances will problems occur

The two partitioning strategies above rest on two very important assumptions: queries can filter out a large number of unneeded partitions, and partitioning itself does not add much extra cost.

It turns out that these two assumptions are problematic in some scenarios:

  • Partition columns and index columns do not match: if the indexed columns do not match the partitioning columns, queries cannot perform partition filtering.

  • Choosing the partition can be costly: different partitioning types are implemented differently, so their performance varies. Particularly with range partitioning, finding which partition a qualifying row belongs to can be expensive, because the server needs to scan the list of all partition definitions to find the correct answer.

  • Opening and locking all underlying tables can be costly: when a query accesses a partitioned table, MySQL needs to open and lock all underlying tables. This is another overhead of partitioned tables.

  • Maintaining partitions can be costly: some partition maintenance operations are very fast, such as adding or dropping partitions. Others, such as reorganizing partitions or similar ALTER statements, can be very expensive because they need to copy data.
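The first and last points in the list above can be illustrated with a hypothetical `customer_orders` table. This is a sketch under the assumption of MySQL 5.7 or later for the `partitions` column in EXPLAIN. All names are invented for the example.

```sql
-- Hypothetical table: indexed on customer_id but partitioned by order year.
CREATE TABLE customer_orders (
    order_id    INT NOT NULL,
    customer_id INT NOT NULL,
    order_date  DATETIME NOT NULL,
    KEY idx_customer (customer_id)
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p_2010 VALUES LESS THAN (2011),
    PARTITION p_2011 VALUES LESS THAN (2012),
    PARTITION p_rest VALUES LESS THAN MAXVALUE
);

-- Mismatch: the filter is on the indexed column only, so no partition can
-- be skipped and idx_customer has to be probed in every partition.
EXPLAIN
SELECT * FROM customer_orders
WHERE customer_id = 42;

-- Adding a condition on the partitioning column lets the optimizer prune
-- before the index is used.
EXPLAIN
SELECT * FROM customer_orders
WHERE customer_id = 42
  AND order_date >= '2011-01-01' AND order_date < '2012-01-01';

-- Cheap maintenance: dropping a partition discards it as a unit.
ALTER TABLE customer_orders DROP PARTITION p_2010;

-- Potentially expensive maintenance: splitting the catch-all partition
-- requires copying the rows it currently holds into the new partitions.
ALTER TABLE customer_orders REORGANIZE PARTITION p_rest INTO (
    PARTITION p_2012   VALUES LESS THAN (2013),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);
```

Splitting the catch-all partition while it is still empty or nearly empty keeps the REORGANIZE step cheap, which is one reason to create future partitions ahead of time.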

Statement:
This article is reproduced from cnblogs.com.