Home  >  Article  >  Database  >  What are the basic issues with MySQL?

What are the basic issues with MySQL?

WBOY
WBOYforward
2023-04-17 15:10:03810browse

What are the basic issues with MySQL?

General Chapter

1. Let’s talk about the three major paradigms of databases?

First normal form: field atomicity, second normal form: unique rows, with primary key columns, third normal form: each column is related to the primary key column.

In actual applications, a small number of redundant fields will be used to reduce the number of related tables and improve query efficiency.

2. Only one piece of data is queried, but the execution is very slow. What are the common reasons?

  • The MySQL database itself is blocked, for example: insufficient system or network resources

  • SQL statements are blocked, for example : Table lock, row lock, etc., causing the storage engine not to execute the corresponding SQL statement

  • It is indeed that the index is used improperly and the index is not used

  • Due to the characteristics of the data in the table, the index is removed, but the number of table returns is huge

3, count(*), count(0), count(id) implementation methods The difference?

  • For count(*), count(constant), count(primary key) form For the count function, the optimizer can select the index with the smallest scan cost to execute the query, thereby improving efficiency. Their execution processes are the same.

  • For count (non-index column), the optimizer chooses a full table scan, which means that it can only sequentially scan the leaf nodes of the clustered index.

  • count (secondary index column)Only the index containing the columns we specify can be selected to execute the query, which may result in the execution cost of the index selected by the optimizer. Not the smallest.

What are the basic issues with MySQL?

4. What should I do if I accidentally delete data?

1) If the amount of data is relatively large, use physical backup xtrabackup. Regularly perform full backups of the database, and you can also perform incremental backups.

2) If the amount of data is small, use mysqldump or mysqldumper, and then use binlog to recover or set up a master-slave method to recover the data. You can recover from the following points:

  • DML misoperation statement: You can use flashback to first parse the binlog event and then reverse it.

  • DDL statement misoperation: Data can only be restored through full backup and binlog application. Once the amount of data is relatively large, the recovery time will be particularly long.

  • rm Deletion: Use backup across computer rooms, or preferably across cities.

5. The difference between drop, truncate and delete

  • The DELETE statement executes the deletion process from the table each time Delete a row and save the row deletion as a transaction record in the log for rollback operation.

  • TRUNCATE TABLE deletes all data from the table at once and does not record individual deletion operation records in the log. Deleted rows cannot be recovered. And the deletion trigger related to the table will not be activated during the deletion process, and the execution speed is fast.

  • The drop statement releases all the space occupied by the table.

#6. Why does MySQL large table query not burst the memory?

  • MySQL is "sending while reading", which means that if the client receives slowly, the MySQL server will not be able to send the results due to this transaction. The execution time becomes longer.

  • The server does not need to save a complete result set. The processes of getting and sending data are all operated through a next_buffer.

  • Memory data pages are managed in Buffer Pool (BP).

  • InnoDB manages Buffer Pool using an improved LRU algorithm, which is implemented using a linked list. In InnoDB implementation, the entire LRU linked list is divided into young area and old area according to the ratio of 5:3 to ensure that hot data will not be washed away when cold data is loaded in large batches.

7. How to deal with deep paging (extremely large paging)?

  • Use id optimization: first find the last paging The maximum ID, and then use the index on the id to query, similar to select * from user where id>1000000 limit 100.

  • Optimize with covering index: When a MySQL query completely hits the index, it is called a covering index, which is very fast because the query only needs to search on the index and can be returned directly afterwards without Then go back to the table to get the data. Therefore, we can first find out the ID of the index, and then get the data based on the Id.

  • Limit the number of pages if business permits

8. How do you optimize SQL in daily development?

  • Add appropriate indexes: Create an index for the fields used as query conditions and order by, consider multiple query fields to establish a combined index, and pay attention to the order of the combined index fields. Place the columns most commonly used as restrictive conditions on the far left, in descending order. The indexes should not be too many, generally within 5.

  • Optimize table structure: Numeric fields are better than string types. Smaller data types are usually better. Try to use NOT NULL

  • Optimization Query statement: Analyze the SQl execution plan, whether the index is hit, etc. If the SQL is very complex, optimize the SQL structure. If the amount of table data is too large, consider splitting the table

9, MySQL What is the difference between concurrent connections and concurrent queries?

  • In the result of executing show processlist, I saw thousands of connections, which refers to concurrent connections.

  • The statement "currently executing" is a concurrent query.

  • The number of concurrent connections affects memory.

  • Concurrent queries that are too high are detrimental to the CPU. A machine has a limited number of CPU cores and if all threads rush in, the cost of context switching will be too high.

  • It should be noted that after a thread enters the lock wait, the concurrent thread count is reduced by one, so threads waiting for row locks or gap locks are not included in the count range. That is to say, the thread waiting for the lock does not consume the CPU, thereby preventing the entire system from locking up.

#10. How does MySQL operate internally when updating a field value to its original value?

  • When the same data is used, the update will not be performed.

  • However, the log processing methods are different for different binlog formats:

    • 1) When based on row mode, server The layer matches the record to be updated and finds that the new value is consistent with the old value. It returns directly without updating and does not record the binlog.

    • 2) When based on statement or mixed format, MySQL executes the update statement and records the update statement to binlog.

#11. What is the difference between datetime and timestamp?

  • The date range of datetime is 1001-9999; the time range of timestamp is 1970-2038

  • datetime storage Time has nothing to do with time zone; timestamp storage time is related to time zone, and the displayed value also depends on time zone

  • The storage space of datetime is 8 bytes; the storage space of timestamp is 4 bytes

  • The default value of datetime is null; the default value of timestamp field is not null (not null), and the default value is the current time (current_timestamp)

12. What are the isolation levels of transactions?

  • "Read Uncommitted" is the lowest level and cannot be guaranteed under any circumstances

  • "Read Uncommitted" ( Read Committed) can avoid the occurrence of dirty reads

  • "Repeatable Read" can avoid the occurrence of dirty reads and non-repeatable reads

  • "Serializable" can avoid the occurrence of dirty reads, non-repeatable reads, and phantom reads

  • The default transaction isolation level of Mysql is "Repeatable Read" )

13. There are two kill commands in MySQL

  • kill query thread id, which means to terminate this thread The statement being executed in

  • kill connection thread id, where connection can be defaulted, means disconnecting this thread

Index

1. What are the index categories?

  • According to the content of the leaf node, the index type is divided into primary key index and non-primary key index.

  • The leaf node of the primary key index stores the entire row of data. In InnoDB, the primary key index is also called a clustered index.

  • The leaf node content of a non-primary key index is the value of the primary key. In InnoDB, non-primary key indexes are also called secondary indexes.

#2. What is the difference between clustered index and non-clustered index?

  • Clustered index: The clustered index is an index created with the primary key. The clustered index stores the data in the table in the leaf nodes.

    What are the basic issues with MySQL?

  • ##Non-clustered index: Indices created with non-primary keys store primary keys and index columns in leaf nodes. Use non-clustered indexes. When the clustered index queries the data, get the primary key on the leaf and then find the data you want to find. (The process of getting the primary key and then searching for it is called table return).

    What are the basic issues with MySQL?

  • Covering index: Assuming that the columns being queried happen to be the columns corresponding to the index, there is no need to go back to the table to check. Then this index column is called a covering index.

3. Why does InnoDB design B-tree instead of B-Tree, Hash, binary tree, and red-black tree?

  • Hash index can handle the addition, deletion, modification and query of a single data row at O(1) speed, but when faced with range queries or sorting, it will cause a full table scan. result.

  • B-tree can store data in non-leaf nodes. Since all nodes may contain target data, we always have to traverse the subtree downwards from the root node to find the ones that meet the conditions. Data rows, this feature brings a large amount of random I/O, causing performance degradation.

  • All data rows of the B tree are stored in leaf nodes, and these leaf nodes can be connected in order through "pointers". When we traverse the data in the B tree as shown below You can directly jump between multiple child nodes, which can save a lot of disk I/O time.

  • Binary tree: The height of the tree is uneven and cannot be self-balancing. The search efficiency is related to the data (the height of the tree), and the IO cost is high.

  • Red-black tree: The height of the tree increases as the amount of data increases, and the IO cost is high.

#4. Let’s talk about clustered index and non-clustered index?

  • In InnoDB, the leaf node of the index B Tree stores the entire row of data is the primary key index, also called a clustered index, that is, the data storage and index are placed When you get to a piece, you find the index and you find the data.

  • The leaf nodes of the index B Tree store the value of the primary key, which is a non-primary key index, also known as a non-clustered index and a secondary index.

  • The first index is generally sequential IO, and the operation of returning to the table is random IO. The more times we need to return to the table, that is, the more times we need random IO, the more we tend to use full table scans.

5. Will non-clustered indexes definitely return table queries?

  • Not necessarily, this involves whether all the fields required by the query statement hit the index. If all the fields hit the index, then there is no need to perform a query back to the table. An index contains (covers) the values ​​of all fields that need to be queried, and is called a "covering index".

#6. Talk about the leftmost prefix principle of MySQL?

  • The leftmost prefix principle is leftmost priority. When creating a multi-column index, according to business needs, the most frequently used column in the where clause is placed on the leftmost side.

  • MySQL will keep matching to the right until it encounters a range query (>, <, between, like) and stops matching, such as a = 1 and b = 2 and c > 3 and d = 4 If you create an index in the order of (a, b, c, d), the index of d will not be used. If you create an index of (a, b, d, c), you can use both a and b. , the order of d can be adjusted arbitrarily.

  • = and in can be out of order, such as a = 1 and b = 2 and c = 3. The (a, b, c) index can be created in any order, and the MySQL query optimizer will Help you optimize it into a form that can be recognized by the index.

7. What is index pushdown?

  • When the leftmost prefix principle is met, the leftmost prefix can be used to locate records in the index.

  • Before MySQL 5.6, you could only return tables one by one starting from the ID. Find the data row on the primary key index, and then compare the field values.

  • The index pushdown optimization (index condition pushdown) introduced in MySQL 5.6 can be used to judge the fields included in the index first during the index traversal process and directly filter out the fields that do not meet the conditions. records to reduce the number of table returns.

#8. Why does Innodb use auto-incrementing id as the primary key?

  • If the table uses an auto-increment primary key, then every time a new record is inserted, the record will be added to the subsequent position of the current index node in sequence. When the page is full, A new page will be opened automatically. If a non-auto-increasing primary key is used (such as ID number or student number, etc.), since the value of the primary key inserted each time is approximately random, each new record must be inserted somewhere in the middle of the existing index page, frequently. Moving and paging operations caused a large amount of fragmentation and resulted in an index structure that was not compact enough. Subsequently, OPTIMIZE TABLE (optimize table) had to be used to rebuild the table and optimize the filled pages.

#9. What is the implementation principle of transaction ACID characteristics?

  • "Atomicity": It is implemented using undo log. If an error occurs during transaction execution or the user performs rollback, the system returns the status of the transaction start through the undo log. .

  • "Persistence": Use redo log to achieve this. As long as the redo log is persisted, when the system crashes, the data can be recovered through the redo log.

  • "Isolation": Transactions are isolated from each other through locks and MVCC.

  • "Consistency": Achieve consistency through rollback, recovery, and isolation in concurrent situations.

10. What is the difference between MyISAM and InnoDB in how they implement B-tree indexes?

  • InnoDB storage engine: the leaf nodes of the B-tree index save the data itself;

  • MyISAM storage engine: the leaves of the B-tree index The physical address where the node saves the data;

  • InnoDB, its data file itself is an index file. Compared with MyISAM, the index file and data file are separated, and its table data file itself is by B An index structure organized by Tree. The node data field of the tree saves complete data records. The key of this index is the primary key of the data table. Therefore, the InnoDB table data file itself is the primary index. This is called "clustered index" or clustering. index, and the rest of the indexes are used as auxiliary indexes. The data field of the auxiliary index stores the value of the corresponding record's primary key instead of the address. This is also different from MyISAM.

11. What are the categories of indexes?

  • According to the content of the leaf node, the index type is divided into primary key index and non-primary key index.

  • The leaf node of the primary key index stores the entire row of data. In InnoDB, the primary key index is also called a clustered index.

  • The leaf node content of a non-primary key index is the value of the primary key. In InnoDB, non-primary key indexes are also called secondary indexes.

#12. What scenarios can cause index failure?

Background: The fast positioning capability provided by B-tree comes from the orderliness of sibling nodes on the same layer. Therefore, if this orderliness is destroyed, it will most likely fail. The details are as follows: This situation:

  • Use left or left fuzzy matching on the index: that is, like %xx or like %xx%. Both of these methods will cause index failure. The reason is that the query results may be "Chen Lin, Zhang Lin, Zhou Lin" and so on, so we don't know which index value to start comparing with, so we can only query through full table scan.

  • Use functions for indexes/Expression calculations for indexes: Because the index saves the original value of the index field, rather than the value calculated by the function, there is no way to use the index. .

  • Implicit type conversion for the index: equivalent to using a new function

  • OR in the WHERE clause: means two as long as Just satisfy one, so it makes no sense if only one conditional column is an index column. As long as the conditional column is not an index column, a full table scan will be performed.

Proposal

1. There is a system that is not divided into databases and tables. How to design it so that the system can dynamically switch to databases and tables?

  • Stop expansion (not recommended)

  • Double-write migration plan: Design the expanded table structure plan, and then perform single-write migration The library and sub-library implement dual writing. After observing for a week that there is no problem, turn off the read traffic of the single library. After observing for a period of time, after it continues to be stable, turn off the write traffic of the single library and smoothly switch to the sub-database and tables.

#2. How to design a sub-database and table scheme that can dynamically expand and reduce capacity?

Principles

1. What are the steps to execute a MySQL statement?

What are the basic issues with MySQL?

  • The steps for the Server layer to execute sql in sequence are:

  • Client request- > Connector (verify user identity and grant permissions) -> Query cache (return directly if cache exists, perform subsequent operations if it does not exist) -> Analyzer (perform lexical analysis and syntax analysis of SQL) -> Optimizer (mainly selects the optimal execution plan method for executing SQL optimization) -> Executor (during execution, it will first check whether the user has execution permission, and only then can use the interface provided by this engine) -> Go to the engine layer Get the data to return (if the query cache is turned on, the query results will be cached).

#2. What is the internal principle of order by sorting?

  • #MySQL will allocate a memory (sort_buffer) for each thread for sorting. The memory size is sort_buffer_size.

  • If the amount of data to be sorted is less than sort_buffer_size, the sorting will be completed in memory.

  • If the amount of sorted data is large and cannot be stored in memory, temporary files on disk will be used to assist sorting, also known as external sorting.

  • When using external sorting, MySQL will divide it into several separate temporary files to store the sorted data, and then merge these files into one large file.

3. MVCC implementation principle?

  • MVCC (Multiversion concurrency control) is a way to retain multiple versions of the same data, thereby achieving concurrency control. When querying, find the data of the corresponding version through the read view and version chain.

  • Function: Improve concurrency performance. For high-concurrency scenarios, MVCC is less expensive than row-level locks.

  • The implementation of MVCC relies on the version chain, which is implemented through three hidden fields of the table.

    • #1) DB_TRX_ID: Current transaction id, the time sequence of the transaction is judged by the size of the transaction id.

    • 2) DB_ROLL_PRT: The rollback pointer points to the previous version of the current row record. Through this pointer, multiple versions of the data are connected together to form an undo log version chain.

    • 3) DB_ROLL_ID: primary key. If the data table does not have a primary key, InnoDB will automatically generate a primary key.

#4. What is change buffer and what is its function?

5. How does MySQL ensure that data is not lost?

  • As long as redolog and binlog ensure persistent disks, MySQL exceptions can be ensured Data recovery binlog writing mechanism after restart.

  • redolog ensures that lost data can be redone after a system exception, and binlog archives the data to ensure that lost data can be recovered.

  • Write redolog before transaction execution. During transaction execution, the log is first written to the binlog cache. When the transaction is submitted, the binlog cache is written to the binlog file. .

6. Why does the size of the table file remain unchanged after deleting the table?

  • After the data item is deleted, InnoDB marks page A and it will be marked as reusable.

  • delete command deletes the data of the entire table Woolen cloth? As a result, all data pages will be marked as reusable. But on disk, the file does not get smaller.

  • Tables that have undergone a large number of additions, deletions, and modifications may have holes. These holes also take up space, so if these holes can be removed, the purpose of shrinking the table space can be achieved.

  • Rebuilding the table can achieve this purpose. You can use the alter table A engine=InnoDB command to rebuild the table.

7. Comparison of the three formats of binlog

  • The primary key id of the operation row recorded in the binlog format of row and the primary key id of each row The real value of each field, so there will be no inconsistency in the primary and secondary operation data.

  • statement: the recorded source SQL statement

  • mixed: the first two are mixed, why do you need a file in mixed format, because some statements The binlog format may cause inconsistency between the primary and secondary servers, so the row format must be used. But the disadvantage of the row format is that it takes up a lot of space. MySQL has taken a compromise. MySQL itself will judge whether this SQL statement may cause inconsistency between the primary and secondary servers. If possible, use the row format, otherwise use the statement format.

8. MySQL locking rules

  • Principle 1: The basic unit of locking is next-key lock. next-key lock is an open and closed interval.

  • Principle 2: Only the objects accessed during the search process will be locked

  • Optimization 1: Equivalent query on the index, give unique When the index is locked, the next-key lock degenerates into a row lock.

  • Optimization 2: For equivalent queries on the index, when traversing to the right and the last value does not meet the equality condition, the next-key lock degenerates into a gap lock

  • A bug: a range query on a unique index will access the first value that does not meet the condition.

#9. What are dirty reads, non-repeatable reads, and phantom reads?

  • "Dirty reading": Dirty reading refers to reading uncommitted data from other transactions. Uncommitment means that the data may be rolled back, which means that it may not be used in the end. Will be stored in the database, that is, data that does not exist. Reading data that may not eventually exist is called dirty reading.

  • "Non-repeatable read": Non-repeatable read means that within a transaction, the data read at the beginning is inconsistent with the same batch of data read at any time before the end of the transaction. Case.

  • "Phantom reading": Phantom reading does not mean that the result sets obtained by two reads are different. The focus of phantom reading is the data status of the result obtained by a certain select operation. Unable to support subsequent business operations. To be more specific: select whether a certain record exists. If it does not exist, prepare to insert the record. However, when executing insert, it is found that the record already exists and cannot be inserted. At this time, a phantom read occurs.

10. What kind of locks does MySQL have? Wouldn’t locking like the above hinder concurrency efficiency?

  • In terms of lock categories, there are shared locks and exclusive locks.

    • #1) Shared lock: Also called read lock. When the user wants to read data, a shared lock is added to the data. Multiple shared locks can be added at the same time .

    • 2) Exclusive lock: Also called write lock. When the user wants to write data, add an exclusive lock to the data. Only one exclusive lock can be added, he and other Exclusive locks and shared locks are mutually exclusive.

  • The granularity of locks depends on the specific storage engine. InnoDB implements row-level locks, page-level locks, and table-level locks.

  • Their locking overhead increases from large to small, and their concurrency capabilities also increase from large to small.

Framework

1. What is the principle of Mysql master-slave replication?

What are the basic issues with MySQL?

  • Master’s update events (update, insert, delete) will be written to bin-log in order. When the Slave is connected to the Master, the Master machine will open the binlog dump thread for the Slave, and this thread will read the bin-log log.

  • After the Slave is connected to the Master, the Slave library has an I/O thread Read the bin-log log by requesting the binlog dump thread, and then write it to the slave library relay logIn the log.

  • Slave also has a SQL thread, which monitors the relay-log log content in real time for updates, parses the SQL statements in the file, and executes them in the Slave database.

2. What are the master-slave replication synchronization methods for Mysql?

  • Asynchronous replication: Mysql master-slave synchronization The default is asynchronous replication. That is, among the above three steps, only the first step is synchronous (that is, Mater writes the bin log log), that is, the master library can successfully return to the client after writing the binlog log, without waiting for the binlog log to be transferred to the slave library.

  • Synchronous replication: For synchronous replication, after the Master host sends the event to the Slave host, a wait will be triggered until all Slave nodes (if there are multiple Slave) returns information about successful data replication to the Master.

  • Semi-synchronous replication: For semi-synchronous replication, after the Master host sends the event to the Slave host, a wait will be triggered until one of the Slave nodes (if (There are multiple Slaves) returns information about successful data replication to the Master.

3. What causes the Mysql master-slave synchronization delay? How to optimize it?

  • If the master node executes a large transaction, it will have a greater impact on the master-slave delay

  • Network Delay, large log, too many slaves

  • Multi-thread writing on the master, only single-thread synchronization on the slave node

  • Machine performance issues , whether the slave node uses a "bad machine"

  • Lock conflict problems may also cause the slave's SQL thread to execute slowly

4. What causes the Mysql master-slave synchronization delay? How to optimize it?

  • Large transactions: Divide large transactions into small transactions and update data in batches

  • Reduce the number of Slave to no more than 5 to reduce the size of a single transaction

  • After Mysql 5.7, you can use multi-threaded replication and use the MGR replication architecture

  • In disk, raid If there is a problem with the card or scheduling strategy, a single IO delay may be very high. You can use the iostat command to check the IO situation of the DB data disk and then make further judgments

  • For lock problems, you can Check by grabbing the processlist and looking at the tables related to locks and transactions under information_schema.

6. What are bin log/redo log/undo log?

  • bin log is a file at the Mysql database level. It records all operations that modify the Mysql database. Select and show statements will not be recorded.

  • What is recorded in the redo log is the data to be updated. For example, if a piece of data is submitted successfully, it will not be synchronized to the disk immediately. Instead, it will be recorded in the redo log first and wait for the appropriate Refresh the disk when the opportunity arises, in order to achieve transaction durability.

  • undo log is used for data recall operations. It retains the content before the record is modified. Transaction rollback can be achieved through undo log, and MVCC can be implemented by tracing back to a specific version of data based on undo log.

The above is the detailed content of What are the basic issues with MySQL?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:yisu.com. If there is any infringement, please contact admin@php.cn delete