Retrieve the last record of each group using MySQL
P粉736935587
2023-08-20 11:48:53
<p>There is a table called <code>messages</code> that contains data like this: </p>
<pre class="brush:php;toolbar:false;">Id Name Other_Columns
--------------------------
1 A A_data_1
2 A A_data_2
3 A A_data_3
4 B B_data_1
5 B B_data_2
6 C C_data_1</pre>
<p>If I run the query <code>select * from messages group by name</code>, I get the following results: </p>
<pre class="brush:php;toolbar:false;">1 A A_data_1
4 B B_data_1
6 C C_data_1</pre>
<p>Which query will return the following results? </p>
<pre class="brush:php;toolbar:false;">3 A A_data_3
5 B B_data_2
6 C C_data_1</pre>
<p>That is, the last record in each group should be returned. </p>
<p>Currently, this is the query I use: </p>
<pre class="brush:php;toolbar:false;">SELECT
*
FROM (SELECT
*
FROM messages
ORDER BY id DESC) AS x
GROUP BY name</pre>
<p>But this seems inefficient. Are there any other ways to achieve the same result? </p>
UPD: 2017-03-31, MySQL version 5.7.5 has the ONLY_FULL_GROUP_BY switch enabled by default (therefore, non-deterministic GROUP BY queries are disabled). Additionally, they updated the GROUP BY implementation and the solution may no longer work as expected even with the switch disabled. An inspection is required.
Bill Karwin's solution works well when the number of items within the group is small, but the performance of the query becomes worse when the group is larger because the solution takes about
n*n/2 n/2timesIS NULLComparison.I tested on an InnoDB table containing
18684446rows and1182groups. This table contains test results for functional tests, and(test_id, request_id)is the primary key. Sotest_idis a group and I'm looking for the lastrequest_idfor eachtest_id.Bill's solution has been running on my Dell e4310 for a few hours now and I don't know when it will be complete, although it operates on a covering index (hence the
using indexshown in EXPLAIN ).I also have several solutions based on the same idea:
(group_id, item_value)pair in eachgroup_idis that of eachgroup_idThe last value, if we traverse the index in descending order, is the first value for eachgroup_id;3 ways MySQL uses indexes is a good article to learn some details.
Solution 1
This solution is very fast, taking about 0.8 seconds for my 18 million rows of data:
If you want to change the order to ascending order, put it into a subquery, return only the ID, and join it as a subquery with other columns:
SELECT test_id, request_id FROM ( SELECT test_id, MAX(request_id) AS request_id FROM testresults GROUP BY test_id DESC) as ids ORDER BY test_id;For my data, this solution takes about 1.2 seconds.
Solution 2
Here is another solution, for my table it takes about 19 seconds:
It also returns test results in descending order. It's slower because it does a full index scan, but it can give you an idea of how to output the N maximum rows for each group.
The disadvantage of this query is that its results cannot be cached by the query.
MySQL 8.0 now supports window functions, as are nearly all popular SQL implementations. Using this standard syntax, we can write a max-n-per-group query:
The MySQL manual demonstrates this method and other methods of finding the grouped largest row.
The following is the original answer I wrote for this question in 2009:
I wrote the solution like this:
Regarding performance, depending on the nature of the data, one of the solutions may be better. Therefore, you should test both queries and choose the better one based on the performance of your database.
For example, I have a copy of the StackOverflow August data dump. I will use it for benchmarking. There are 1,114,357 rows of data in the
Poststable. This is running MySQL 5.0.75 on my Macbook Pro 2.40GHz.I will write a query to find the latest posts for a given user ID (mine).
First used Eric's technique of using
GROUP BYin a subquery:SELECT p1.postid FROM Posts p1 INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid FROM Posts pi GROUP BY pi.owneruserid) p2 ON (p1.postid = p2.maxpostid) WHERE p1.owneruserid = 20860; 1行结果(1分17.89秒)Even
EXPLAINAnalysis takes more than 16 seconds:Now using
LEFT JOINUsing MY TECHNIQUEproduces the same query results:EXPLAINAnalysis shows that both tables can use their indexes:This is the DDL for my
Poststable:Commenter Note: If you want to run another benchmark using a different version of MySQL, a different data set, or a different table design, feel free to do it yourself. I've demonstrated the above technique. The purpose of Stack Overflow is to show you how to do software development work, not to do all the work for you.