Inserting into a large MySQL table without an auto-increment primary key is very slow
P粉845862826
2023-08-29 20:31:33
<p>I've recently noticed a significant increase in the variance of the time required to complete simple INSERT statements. While these statements take about 11 milliseconds on average, they can sometimes take 10-30 seconds, and I've even noticed them taking more than 5 minutes to execute. </p>
<p>MySQL version is <code>8.0.24</code>, running on Windows Server 2016. As far as I know, the server's resources have never been overloaded. The server has ample CPU overhead available and 32GB of RAM allocated to it. </p>
<p>This is the table I'm using: </p>
<pre class="brush:php;toolbar:false;">CREATE TABLE `saved_segment` (
`recording_id` bigint unsigned NOT NULL,
`index` bigint unsigned NOT NULL,
`start_filetime` bigint unsigned NOT NULL,
`end_filetime` bigint unsigned NOT NULL,
`offset_and_size` bigint unsigned NOT NULL DEFAULT '18446744073709551615',
`storage_id` tinyint unsigned NOT NULL,
PRIMARY KEY (`recording_id`,`index`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci</pre>
<p>This table has no other indexes or foreign keys, and is not used as a reference to foreign keys in any other table. The entire table size is about 20GB and the number of rows is about 281M, which I feel is not too big. </p>
<p>The table is used almost exclusively in read-only mode, with up to 1000 reads per second. All of these reads occur in simple SELECT queries rather than complex transactions, and they effectively utilize the primary key index. There are currently very few, if any, concurrent writes to this table. This was done deliberately to see whether it would help with the slow inserts, but it didn't; before that change, there were always up to 10 concurrent inserts in progress. UPDATE or DELETE statements are never executed on this table. </p>
<p>The queries I have problems with are all constructed like the one below, and they never appear in a transaction. While the inserts are definitely not append-only in terms of the clustered primary key, each query almost always inserts between 1 and 20 adjacent rows into the table: </p>
<pre class="brush:php;toolbar:false;">INSERT IGNORE INTO saved_segment
(recording_id, `index`, start_filetime, end_filetime, offset_and_size, storage_id) VALUES
(19173, 631609, 133121662986640000, 133121663016640000, 20562291758298876, 10),
(19173, 631610, 133121663016640000, 133121663046640000, 20574308942546216, 10),
(19173, 631611, 133121663046640000, 133121663076640000, 20585348350688128, 10),
(19173, 631612, 133121663076640000, 133121663106640000, 20596854568114720, 10),
(19173, 631613, 133121663106640000, 133121663136640000, 20609723363860884, 10),
(19173, 631614, 133121663136640000, 133121663166640000, 20622106425668780, 10),
(19173, 631615, 133121663166640000, 133121663196640000, 20634653501528448, 10),
(19173, 631616, 133121663196640000, 133121663226640000, 20646967172721148, 10),
(19173, 631617, 133121663226640000, 133121663256640000, 20657773176227488, 10),
(19173, 631618, 133121663256640000, 133121663286640000, 20668825200822108, 10)</pre>
<p>This is the output of the EXPLAIN statement for the above query: </p>
<table class="s-table">
<thead>
<tr>
<th>id</th>
<th>select_type</th>
<th>table</th>
<th>partitions</th>
<th>type</th>
<th>possible_keys</th>
<th>key</th>
<th>key_len</th>
<th>ref</th>
<th>rows</th>
<th>filtered</th>
<th>Extra</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>INSERT</td>
<td>saved_segment</td>
<td>NULL</td>
<td>ALL</td>
<td>NULL</td>
<td>NULL</td>
<td>NULL</td>
<td>NULL</td>
<td>NULL</td>
<td>NULL</td>
<td>NULL</td>
</tr>
</tbody>
</table>
<p>These issues are relatively new and were not noticeable when the table was about twice as small. </p>
<p>I tried reducing the number of concurrent inserts into the table from about 10 to 1. I also removed a foreign key on one of the columns (<code>recording_id</code>) to speed up the inserts even more. <code>ANALYZE TABLE</code> and schema analysis did not yield any actionable information.</p>
<p>One solution I thought of was to remove the clustered primary key, add an auto-increment primary key, and add a regular index on the <code>(recording_id, index)</code> columns. In my opinion this would help make the inserts "append-only". I'm open to any and all suggestions, thanks in advance! </p>
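<p>To make that proposal concrete, the rewritten schema would look roughly like this (a sketch only; the surrogate key name <code>id</code> and the index name are my own choices): </p>
<pre class="brush:php;toolbar:false;">CREATE TABLE `saved_segment` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`recording_id` bigint unsigned NOT NULL,
`index` bigint unsigned NOT NULL,
`start_filetime` bigint unsigned NOT NULL,
`end_filetime` bigint unsigned NOT NULL,
`offset_and_size` bigint unsigned NOT NULL DEFAULT '18446744073709551615',
`storage_id` tinyint unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `uniq_rec_index` (`recording_id`,`index`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci</pre>
<p>Note that the secondary index would need to be UNIQUE for the existing INSERT IGNORE statements to keep deduplicating on <code>(recording_id, index)</code>, and maintaining a unique secondary index still requires a non-append lookup on every insert, which may limit the gains from making the clustered index append-only. </p>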
<p>Edit:
I'll address some of the points and questions raised in the comments and answers: </p>
<ul>
<li><code>autocommit</code> is set to <code>ON</code></li>
<li>The value of <code>innodb_buffer_pool_size</code> is <code>21474836480</code>, and the value of <code>innodb_buffer_pool_chunk_size</code> is <code>134217728</code></li>
<li>One comment raised concerns about contention between the read lock used for reads and the exclusive lock used for writes. The table is used somewhat like a cache, I don't need the reads to always reflect the latest state of the table if it means an increase in performance. However, the table should remain durable even in the event of server crashes and hardware failures. Could this be achieved with a more relaxed transaction isolation level? </li>
<li>The schema can definitely be optimized; <code>recording_id</code> could be a 4-byte integer, <code>end_filetime</code> could be changed to an elapsed value, and <code>start_filetime</code> could probably be smaller too. I'm afraid these changes would just postpone the problem until the table grows enough to compensate for the space saved. </li>
<li>Inserts into the table are always consecutive</li>
<li>The SELECTs performed on the table look like this: </li>
</ul>
<pre class="brush:php;toolbar:false;">SELECT TRUE
FROM saved_segment
WHERE recording_id = ? AND `index` = ?</pre>
<pre class="brush:php;toolbar:false;">SELECT `index`, start_filetime, end_filetime, offset_and_size, storage_id
FROM saved_segment
WHERE recording_id = ? AND
start_filetime >= ? AND
start_filetime <= ?
ORDER BY `index` ASC</pre>
<p>The second type of query could certainly be improved with an index, but I'm concerned that this would further degrade INSERT performance. </p>
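<p>For reference, the index that would serve the second type of query would look like this (a sketch; the index name is illustrative, and adding it to a table of this size will take a while): </p>
<pre class="brush:php;toolbar:false;">-- index name is illustrative
ALTER TABLE saved_segment
ADD INDEX idx_rec_start (recording_id, start_filetime);</pre>
<p>Whether it meaningfully degrades inserts depends on how much of the index's "hot" area fits in the buffer pool, since every insert must also update this secondary index. </p>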
<p>Another thing I forgot to mention is that a very similar table exists alongside this one. It is queried and inserted into in exactly the same way, and might contribute to further IO starvation. </p>
<p>Edit 2:
<code>SHOW TABLE STATUS</code> output for the table <code>saved_segment</code> and for the very similar table <code>saved_screenshot</code> (which has an additional index on a <code>bigint unsigned NOT NULL</code> column):</p>
<table class="s-table">
<thead>
<tr>
<th>Name</th>
<th>Engine</th>
<th>Version</th>
<th>Row_format</th>
<th>Rows</th>
<th>Avg_row_length</th>
<th>Data_length</th>
<th>Max_data_length</th>
<th>Index_length</th>
<th>Data_free</th>
<th>Auto_increment</th>
<th>Create_time</th>
<th>Update_time</th>
<th>Check_time</th>
<th>Collation</th>
<th>Checksum</th>
<th>Create_options</th>
<th>Comment</th>
</tr>
</thead>
<tbody>
<tr>
<td>saved_screenshot</td>
<td>InnoDB</td>
<td>10</td>
<td>Dynamic</td>
<td>483430208</td>
<td>61</td>
<td>29780606976</td>
<td>0</td>
<td>21380464640</td>
<td>6291456</td>
<td>NULL</td>
<td>2021-10-21 01:03:21</td>
<td>2022-11-07 16:51:45</td>
<td>NULL</td>
<td>utf8mb4_0900_ai_ci</td>
<td>NULL</td>
<td></td>
<td></td>
</tr>
<tr>
<td>saved_segment</td>
<td>InnoDB</td>
<td>10</td>
<td>Dynamic</td>
<td>281861164</td>
<td>73</td>
<td>20802699264</td>
<td>0</td>
<td>0</td>
<td>4194304</td>
<td>NULL</td>
<td>2022-11-02 09:03:05</td>
<td>2022-11-07 16:51:22</td>
<td>NULL</td>
<td>utf8mb4_0900_ai_ci</td>
<td>NULL</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
I'll go out on a limb with this answer.

Assumption: innodb_buffer_pool_size has a value somewhat less than 20GB, and the system has lately become I/O-bound because the "next" block required for the next SELECT is, more and more often, not cached in the buffer pool.

The simple solution is to get more RAM and raise that tunable. But the table will only grow to whatever next limit you purchase.

Instead, here are some partial solutions:

- Shrink the datatypes. recording_id could be INT UNSIGNED (4 bytes instead of 8), or even MEDIUMINT UNSIGNED (3 bytes). Note that ALTER TABLE will lock the table for a long time.
- The filetime columns look like timestamps; DATETIME and TIMESTAMP take 5 bytes each (instead of 8). Storing end_filetime as an elapsed value rather than an absolute time would shrink it further.

"Inserts into the table are always consecutive": yes, that appears to be the case. Putting (recording_id, start_filetime) in an INDEX, or better yet at the beginning of the PRIMARY KEY, will give you the most help for both of your queries.

If your first query is only used to control some other SQL, consider building it into that other SQL (for example, as an EXISTS subquery). Regarding the indexes:

- The first query (in either form) needs what you already have: PRIMARY KEY(recording_id, `index`).
- Your other query needs INDEX(recording_id, start_filetime).

So, add that index, or, better, use this combination:

    PRIMARY KEY(recording_id, start_filetime, `index`),
    INDEX(recording_id, `index`)

With this combination, both SELECTs are served well.
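
A sketch of that combined change (assuming the rebuild downtime is acceptable: this rewrites the whole ~20GB table and blocks writes for the duration):

    ALTER TABLE saved_segment
        DROP PRIMARY KEY,
        ADD PRIMARY KEY (recording_id, start_filetime, `index`),
        ADD UNIQUE INDEX (recording_id, `index`);

The secondary index is made UNIQUE here so that the existing INSERT IGNORE statements keep deduplicating on (recording_id, `index`); the new primary key is still unique because it contains those two columns.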