What are the disadvantages of mysql clustered index?

一个新手

Sep 19, 2017 am 09:35 AM


A clustered index is not a separate index type but a way of storing data (a storage structure rather than a data structure). The details depend on the implementation, but InnoDB's clustered index actually stores the B-tree index and the data rows in the same structure.

When a table has a clustered index, its data rows are actually stored in the leaf pages of that index. "Clustered" means that rows with adjacent key values are stored close together. Because a row cannot be stored in two different places at once, a table can have only one clustered index. Because storage engines are responsible for implementing indexes, not all storage engines support clustered indexes. The following mainly covers InnoDB, but the principles apply to any engine that supports clustered indexes:

The leaf pages contain all the data of each row, but the node (non-leaf) pages contain only the indexed columns, since the values in those node pages are taken from the index columns.

InnoDB clusters data by primary key. If no primary key is defined, InnoDB uses the first unique index whose columns are all NOT NULL instead. If there is no such index, InnoDB implicitly defines a hidden 6-byte row ID and clusters on that. InnoDB clusters records together only within a page; pages containing adjacent key values may be far apart on disk.
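The selection rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not InnoDB's actual code: the `Index` class, the `choose_clustering_key` function and the example index names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Index:
    """Hypothetical description of one index on a table."""
    name: str
    columns: List[str]
    unique: bool = False
    primary: bool = False


def choose_clustering_key(indexes: List[Index], nullable_columns: Set[str]) -> str:
    """Return the name of the index the table would be clustered on."""
    # 1. An explicit primary key always wins.
    for idx in indexes:
        if idx.primary:
            return idx.name
    # 2. Otherwise, the first unique index whose columns are all NOT NULL.
    for idx in indexes:
        if idx.unique and not any(col in nullable_columns for col in idx.columns):
            return idx.name
    # 3. Otherwise, fall back to the hidden 6-byte row ID.
    return "<hidden 6-byte row ID>"


indexes = [
    Index("uk_email", ["email"], unique=True),        # email is nullable here
    Index("uk_username", ["username"], unique=True),  # username is NOT NULL
    Index("idx_created", ["created_at"]),
]
print(choose_clustering_key(indexes, nullable_columns={"email"}))
```

In this example the table has no primary key and `uk_email` covers a nullable column, so the sketch picks `uk_username`.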

Note: Clustering on the primary key can help performance, but it can also cause serious performance problems, so think carefully about clustering, especially when the table's storage engine is changed from InnoDB to another engine.

Clustering data has some important advantages:

A: Related data can be kept together. For example, when implementing a mailbox, you can cluster by user ID so that all of a user's emails can be retrieved by reading only a few data pages from disk. Without clustering, each email may require its own disk I/O.

B: Data access is faster. A clustered index holds both the index and the data in the same B-tree, so retrieving rows from a clustered index is usually faster than a comparable lookup in a non-clustered index.

C: Queries that use covering indexes can use the primary key values stored in the leaf nodes.
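The mailbox example in point A can be made concrete with a small model. The sketch below is a simplification under stated assumptions: a "page" is just a run of 50 consecutive rows (`ROWS_PER_PAGE` is an arbitrary, hypothetical capacity), and emails arrive round-robin across users, which is close to the worst case for an unclustered layout.

```python
ROWS_PER_PAGE = 50  # hypothetical page capacity, chosen only for illustration


def pages_touched(rows, user_id):
    """Count the distinct pages that hold rows for user_id.

    `rows` is a list of (user_id, message_id) tuples in physical storage order.
    """
    return len({
        position // ROWS_PER_PAGE
        for position, (uid, _message_id) in enumerate(rows)
        if uid == user_id
    })


# 100 users with 40 emails each, arriving interleaved (round-robin) over time.
arrival_order = [(uid, msg) for msg in range(40) for uid in range(100)]

unclustered_layout = arrival_order          # rows stored in arrival order
clustered_layout = sorted(arrival_order)    # rows stored in (user_id, message_id) order

unclustered_pages = pages_touched(unclustered_layout, user_id=7)
clustered_pages = pages_touched(clustered_layout, user_id=7)
print(unclustered_pages, clustered_pages)  # prints "40 2"
```

With this layout, reading one user's 40 emails touches 40 pages when rows sit in arrival order, but only 2 pages when they are clustered by user ID.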

Disadvantages of clustered indexes:

A: Clustering gives the largest improvement for I/O-bound workloads. If the data fits entirely in memory, access order does not matter much, so clustering offers little benefit.

B: Insert speed depends heavily on insertion order. Inserting rows in primary key order is the fastest way to load data into an InnoDB table. If the data was not loaded in primary key order, it may be a good idea to reorganize the table with OPTIMIZE TABLE after loading.

C: Updating the clustered index columns is expensive, because it forces InnoDB to move each updated row to a new location.

D: Tables built on a clustered index are subject to page splits when new rows are inserted, or when a row's primary key is updated so that the row must be moved. A page split happens when a row's key value dictates that the row must be placed in a page that is already full: the storage engine splits the page into two to make room for the row. Page splits can cause a table to use more space on disk.

E: Clustered tables can be slower for full table scans, especially if rows are sparsely packed or stored nonsequentially because of page splits.

F: Secondary indexes can be larger than you might expect, because their leaf nodes contain the primary key columns of the rows they reference.

G: Secondary index access requires two index lookups instead of one.

This is because a secondary index leaf node stores the row's primary key value rather than a pointer to the row's physical location. To find a row through a secondary index, the storage engine first finds the leaf node in the secondary index to obtain the primary key value, and then uses that value to find the row in the clustered index. This is double work: two B-tree lookups instead of one. In InnoDB, the adaptive hash index can help reduce this penalty.
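The two-step lookup can be shown with a toy model. Plain Python dictionaries stand in for the two B-trees here, so this only illustrates the access pattern, not the real storage format; the table contents and the `find_by_email` helper are hypothetical.

```python
# Clustered index: primary key -> full row (the row itself lives in the leaf).
clustered_index = {
    1: {"id": 1, "email": "ann@example.com", "name": "Ann"},
    2: {"id": 2, "email": "bob@example.com", "name": "Bob"},
}

# Secondary index on email: indexed value -> primary key value,
# not a pointer to the row's physical location.
secondary_index_email = {row["email"]: pk for pk, row in clustered_index.items()}


def find_by_email(email):
    """Return (row, number_of_index_lookups) for a lookup through the secondary index."""
    lookups = 1
    pk = secondary_index_email.get(email)   # lookup 1: secondary index -> primary key
    if pk is None:
        return None, lookups
    lookups += 1
    row = clustered_index[pk]               # lookup 2: clustered index -> full row
    return row, lookups


row, lookups = find_by_email("bob@example.com")
print(row["name"], lookups)  # prints "Bob 2"
```

A lookup by primary key would need only the second step, which is why access through the clustered index is the cheaper path.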

Comparison of the physical data layout of InnoDB and MyISAM:

MyISAM:

MyISAM stores rows on disk in the order in which they were inserted. In MyISAM there is no structural difference between the primary key index and a secondary index: the primary key index is simply a unique, non-nullable index named PRIMARY.

InnoDB:

Because InnoDB supports clustered indexes, it stores the same data very differently. InnoDB's clustered index contains the whole table, not just an index: in InnoDB the clustered index is the table, so no separate row storage is needed as in MyISAM. Each leaf node of the clustered index contains the primary key value, the transaction ID and rollback pointer used for transactions and MVCC, and the values of all remaining columns. If the primary key is defined on a column prefix, InnoDB still stores the full column along with the remaining columns.

InnoDB's secondary indexes also differ from MyISAM's, and they are very different from the clustered index. The leaf nodes of an InnoDB secondary index store primary key values rather than row pointers, and use those values as the "pointers" to rows. This strategy reduces the work needed to maintain secondary indexes when rows move or a data page is split. Using the primary key value as the pointer makes the secondary index larger, but in return InnoDB can move a row without updating the pointers to it in the secondary indexes.
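A back-of-the-envelope calculation shows how the primary key width feeds into secondary index size. The figures below are illustrative assumptions, not measurements: a 4-byte indexed column, an 8-byte BIGINT key versus a UUID stored as a 36-character string in a single-byte character set, and no allowance for per-record headers or free space inside pages.

```python
def secondary_leaf_bytes(row_count, key_bytes, pk_bytes):
    """Rough payload of a secondary index's leaf level.

    Each leaf entry carries the indexed value plus the primary key value.
    Record headers and page free space are deliberately ignored.
    """
    return row_count * (key_bytes + pk_bytes)


ROWS = 10_000_000
KEY_BYTES = 4  # e.g. an INT column being indexed

with_bigint_pk = secondary_leaf_bytes(ROWS, KEY_BYTES, pk_bytes=8)
with_uuid_pk = secondary_leaf_bytes(ROWS, KEY_BYTES, pk_bytes=36)

print(with_bigint_pk // 10**6, "MB vs", with_uuid_pk // 10**6, "MB")  # prints "120 MB vs 400 MB"
```

Under these assumptions the same secondary index carries more than three times as much payload with the wide key, and the cost repeats for every secondary index on the table.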

Insert rows into InnoDB tables in primary key order. If you are using InnoDB and have no particular data to cluster, you can define a surrogate key as the primary key, meaning a primary key whose value is unrelated to the application's data. The simplest way is to use an AUTO_INCREMENT column, which ensures that rows are inserted in sequential order and gives better performance for joins on the primary key.

Avoid using a UUID as the clustered index key. Performance will suffer badly, because it makes inserts into the clustered index effectively random and leaves the data with no useful clustering. Inserting rows with a UUID primary key not only takes longer, it also produces larger indexes. This is partly because the primary key column is longer, and partly, no doubt, because of page splits and the resulting fragmentation.

When primary key values are sequential, InnoDB stores each record immediately after the previous one. When a page reaches its maximum fill factor (InnoDB's default maximum fill factor is 15/16 of the page size, leaving some space free for later modifications), the next record goes into a new page. Once data has been loaded sequentially, the primary key pages are packed nearly full with in-order records, which is the desired result (secondary index pages may look different, however).

With a UUID primary key, the value of a newly inserted row is not necessarily greater than the previous one, so InnoDB cannot always place the new row at the end of the index. It has to find the appropriate place for the row, usually somewhere in the middle of the existing data, and make room for it. This causes a lot of extra work and leads to a suboptimal data layout. Some drawbacks of using a UUID as the primary key:

A: The target page may already have been flushed to disk and evicted from the cache, or may never have been loaded into the cache, in which case InnoDB has to find it and read it from disk before it can insert the new row. This causes a lot of random I/O.

B: Because writes arrive out of order, InnoDB has to split pages frequently to make room for new rows. A page split moves a lot of data around, and a single insert then modifies at least three pages instead of one.

C: Because of the frequent page splits, pages become sparsely and irregularly filled, so the data ends up fragmented.

After loading such random values into a clustered index, you may need to run OPTIMIZE TABLE to rebuild the table and fill the pages optimally. With InnoDB, insert data in primary key order whenever possible, and use a clustering key whose value increases monotonically for each new row.
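The difference between sequential and random key order can be reproduced with a small simulation. This is a deliberately simplified model, not InnoDB's algorithm: pages are sorted Python lists with a hypothetical capacity of 16 records, pages are filled completely rather than to 15/16, a full page is split in half, and shuffled integers stand in for random UUID values.

```python
import bisect
import random

PAGE_CAPACITY = 16  # hypothetical number of records per page


def load(keys):
    """Insert unique keys one at a time; return (page_count, split_count, average_fill)."""
    pages = [[]]  # each page is a sorted list of keys; pages are ordered by key range
    splits = 0
    for key in keys:
        # Locate the page whose key range should hold this key.
        first_keys = [page[0] for page in pages[1:]]
        i = bisect.bisect_right(first_keys, key)
        page = pages[i]
        if len(page) < PAGE_CAPACITY:
            bisect.insort(page, key)
            continue
        if i == len(pages) - 1 and key > page[-1]:
            # Appending past the end of the last page: simply start a new page.
            pages.append([key])
            continue
        # The target page is full and the key belongs inside it: split it in half.
        middle = len(page) // 2
        left, right = page[:middle], page[middle:]
        pages[i:i + 1] = [left, right]
        splits += 1
        bisect.insort(left if key < right[0] else right, key)
    average_fill = sum(len(p) for p in pages) / (len(pages) * PAGE_CAPACITY)
    return len(pages), splits, average_fill


sequential_keys = list(range(2000))       # like an AUTO_INCREMENT column
random_keys = list(range(2000))
random.Random(42).shuffle(random_keys)    # stand-in for random UUID order

sequential_result = load(sequential_keys)
random_result = load(random_keys)
print("sequential:", sequential_result)
print("random:    ", random_result)
```

In this model the sequential load uses 125 completely full pages and never splits, while the random load splits repeatedly and ends with more pages that are only partly filled, which is the fragmentation described in point C.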

Note: When can a sequential primary key be worse?

Under high-concurrency workloads, inserting in primary key order in InnoDB can create noticeable contention. The upper end of the primary key becomes a hot spot: because all inserts happen there, concurrent inserts may contend for gap locks. Another hot spot may be the AUTO_INCREMENT locking mechanism. If you run into these problems, you may need to redesign the table or application, or change the innodb_autoinc_lock_mode setting.
