Retrieve last record in each group - MySQL
P粉464088437
2023-08-24 15:06:23
<p>There is a table <code>messages</code> that contains data as shown below:</p>
<pre class="brush:php;toolbar:false;">Id  Name  Other_Columns
-----------------------
1   A     A_data_1
2   A     A_data_2
3   A     A_data_3
4   B     B_data_1
5   B     B_data_2
6   C     C_data_1</pre>
<p>If I run the query <code>select * from messages group by name</code>, I get the following result:</p>
<pre class="brush:php;toolbar:false;">1   A     A_data_1
4   B     B_data_1
6   C     C_data_1</pre>
<p>What query would return the following results? </p>
<pre class="brush:php;toolbar:false;">3   A     A_data_3
5   B     B_data_2
6   C     C_data_1</pre>
<p>That is, the last record in each group should be returned. </p>
<p>Currently, this is the query I use: </p>
<pre class="brush:php;toolbar:false;">SELECT *
FROM (SELECT *
      FROM messages
      ORDER BY id DESC) AS x
GROUP BY name</pre>
<p>But this seems inefficient. Are there any other ways to achieve the same result? </p>
UPD 2017-03-31: Starting with version 5.7.5, MySQL enables the ONLY_FULL_GROUP_BY switch by default (so non-deterministic GROUP BY queries are rejected). In addition, the GROUP BY implementation was updated, and the solution may not work as expected even with the switch disabled. This needs to be verified.
Bill Karwin's solution above works fine when the number of items within each group is rather small, but the performance of the query becomes bad when the groups are rather large, since for a group of n rows the solution requires about n*n/2 + n/2 IS NULL comparisons alone.
I ran my tests on an InnoDB table of 18684446 rows with 1182 groups. The table contains test results for functional tests and has (test_id, request_id) as the primary key. Thus, test_id is a group, and I was searching for the last request_id for each test_id.
Bill's solution has already been running for several hours on my Dell E4310, and I do not know when it is going to finish, even though it operates on a covering index (hence Using index in EXPLAIN).
I have a couple of other solutions based on the same idea: in a B-tree index, the largest (group_id, item_value) pair within each group_id is the last value for that group_id, that is, the first one for each group_id if we walk through the index in descending order.
3 Ways MySQL Uses Indexes is a great article to help you understand some of the details.
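To illustrate the index-ordering idea above, here is a minimal Python sketch, assuming a sorted list of (group_id, item_value) tuples stands in for the B-tree index; the function name last_per_group is hypothetical. Instead of reading every entry, it takes the largest remaining pair and then jumps with bisect to just before the first entry of that group, so it visits roughly one position per group:

```python
from bisect import bisect_left


def last_per_group(index):
    """Return the last (largest) pair of each group, groups in descending order.

    `index` is assumed to be a list of (group_id, item_value) tuples sorted
    ascending, standing in for a B-tree index on (group_id, item_value).
    """
    result = []
    hi = len(index)
    while hi > 0:
        # The largest remaining pair is the last entry of its group.
        group_id, item_value = index[hi - 1]
        result.append((group_id, item_value))
        # (group_id,) sorts before every (group_id, x), so this finds where the
        # group starts; everything before that position has a smaller group_id.
        hi = bisect_left(index, (group_id,), 0, hi)
    return result


# The messages example from the question, keyed on (name, id).
index = sorted([("A", 1), ("A", 2), ("A", 3), ("B", 4), ("B", 5), ("C", 6)])
print(last_per_group(index))  # [('C', 6), ('B', 5), ('A', 3)]
```

Whether MySQL actually performs this kind of jump for a given query depends on the optimizer, so it is worth confirming with EXPLAIN.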
Solution 1
This is incredibly fast, taking about 0.8 seconds on my 18M rows:
If you want the results in ascending order instead, put it in a subquery that returns only the ids, and use that subquery to join to the rest of the columns:
This takes about 1.2 seconds for my data.
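The Solution 1 queries themselves did not survive here. As a stand-in, the following is a minimal sketch of the same two-step idea (collect the maximum id per group, then join back to fetch the remaining columns), run against an in-memory SQLite database holding the messages table from the question rather than MySQL. The column name other_columns, the index name, and the helper names make_messages_db and latest_per_group are assumptions:

```python
import sqlite3


def make_messages_db():
    """Build the messages table from the question in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)")
    # Composite index on (name, id), analogous to the (test_id, request_id) key in the text.
    conn.execute("CREATE INDEX idx_messages_name_id ON messages (name, id)")
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?)",
        [(1, "A", "A_data_1"), (2, "A", "A_data_2"), (3, "A", "A_data_3"),
         (4, "B", "B_data_1"), (5, "B", "B_data_2"), (6, "C", "C_data_1")],
    )
    return conn


def latest_per_group(conn):
    # Step 1: the subquery returns only (name, max id) pairs.
    # Step 2: join back on the primary key to fetch the remaining columns.
    return conn.execute(
        """
        SELECT m.id, m.name, m.other_columns
        FROM (SELECT name, MAX(id) AS max_id FROM messages GROUP BY name) AS ids
        JOIN messages AS m ON m.id = ids.max_id
        ORDER BY m.name
        """
    ).fetchall()


conn = make_messages_db()
print(latest_per_group(conn))
# [(3, 'A', 'A_data_3'), (5, 'B', 'B_data_2'), (6, 'C', 'C_data_1')]
```

Keeping the subquery narrow (only the group key and the id) is what lets the join fetch each full row with a single primary-key lookup.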
Solution 2
Here's another solution that took about 19 seconds for my table:
It also returns the tests in descending order. It's much slower because it performs a full index scan, but it gives you an idea of how to output the N largest rows for each group.
The disadvantage of this query is that the query cache cannot cache its results.
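The Solution 2 query is also missing here. The idea it describes, walking the rows in descending (group, value) order and noticing when the group changes, can be sketched in Python as follows. This is an illustration of the technique, not the author's MySQL query; it assumes rows arrive as (group_id, item_value) pairs, and the function name top_n_per_group is hypothetical:

```python
def top_n_per_group(rows, n):
    """Return up to n largest item_values per group, groups in descending order.

    `rows` is assumed to be an iterable of (group_id, item_value) pairs.
    """
    result = []
    current_group = object()  # sentinel that never equals a real group_id
    taken = 0
    # Sorting in reverse plays the role of walking the index in descending order.
    for group_id, item_value in sorted(rows, reverse=True):
        if group_id != current_group:
            # New group: remember it and restart the per-group counter.
            current_group = group_id
            taken = 0
        if taken < n:
            result.append((group_id, item_value))
            taken += 1
    return result


rows = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1)]
print(top_n_per_group(rows, 1))  # [(3, 1), (2, 2), (1, 3)]
print(top_n_per_group(rows, 2))  # [(3, 1), (2, 2), (2, 1), (1, 3), (1, 2)]
```

Like the query described above, this visits every row, which is consistent with it being slower than Solution 1.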
MySQL 8.0 now supports window functions, like almost all popular SQL implementations. Using this standard syntax, we can write greatest-n-per-group queries:
This and other approaches to finding the groupwise-maximum rows are described in the MySQL manual.
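The MySQL 8.0 query itself is not preserved here. A minimal sketch of the standard ROW_NUMBER() approach might look like the following; it runs against an in-memory SQLite database (window functions require SQLite 3.25 or later) instead of MySQL, and the CTE name ranked_messages, the column other_columns, and the helper latest_n_per_group are illustrative. With n = 1 it returns the last record in each group:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(1, "A", "A_data_1"), (2, "A", "A_data_2"), (3, "A", "A_data_3"),
     (4, "B", "B_data_1"), (5, "B", "B_data_2"), (6, "C", "C_data_1")],
)


def latest_n_per_group(conn, n):
    # Number the rows within each name, newest id first, then keep the first n.
    return conn.execute(
        """
        WITH ranked_messages AS (
            SELECT id, name, other_columns,
                   ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
            FROM messages
        )
        SELECT id, name, other_columns
        FROM ranked_messages
        WHERE rn <= ?
        ORDER BY name, id DESC
        """,
        (n,),
    ).fetchall()


print(latest_n_per_group(conn, 1))
# [(3, 'A', 'A_data_3'), (5, 'B', 'B_data_2'), (6, 'C', 'C_data_1')]
```

The WITH ... ROW_NUMBER() OVER (PARTITION BY ...) syntax itself is also accepted by MySQL 8.0; only the ? placeholder is specific to the Python driver.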
The following is the original answer I wrote to this question in 2009:
I wrote the solution like this:
Regarding performance, either solution can be faster depending on the nature of your data, so you should test both queries and use the one that performs better on your database.
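The 2009 LEFT JOIN query is likewise not preserved. The sketch below shows the general shape of that LEFT JOIN / IS NULL technique (a row is the last in its group if no row with the same name has a larger id) next to a GROUP BY subquery, and runs both on the same synthetic data so the results can be compared and timed. It uses SQLite in place of MySQL, and the generator make_db and its parameters are hypothetical, so the timings say nothing about MySQL performance:

```python
import sqlite3
import time

LEFT_JOIN_QUERY = """
    SELECT m1.id, m1.name, m1.other_columns
    FROM messages AS m1
    LEFT JOIN messages AS m2
      ON m1.name = m2.name AND m1.id < m2.id
    WHERE m2.id IS NULL          -- no newer row exists in the same group
    ORDER BY m1.name
"""

GROUP_BY_SUBQUERY = """
    SELECT id, name, other_columns
    FROM messages
    WHERE id IN (SELECT MAX(id) FROM messages GROUP BY name)
    ORDER BY name
"""


def make_db(groups, per_group):
    """Hypothetical test data: `groups` names with `per_group` rows each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)")
    conn.execute("CREATE INDEX idx_messages_name_id ON messages (name, id)")
    rows = []
    next_id = 1
    for i in range(per_group):  # interleave groups so each group's ids are not contiguous
        for g in range(groups):
            name = f"g{g}"
            rows.append((next_id, name, f"{name}_data_{i + 1}"))
            next_id += 1
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
    return conn


def timed(conn, sql):
    start = time.perf_counter()
    result = conn.execute(sql).fetchall()
    return result, time.perf_counter() - start


conn = make_db(groups=200, per_group=50)
left_rows, left_secs = timed(conn, LEFT_JOIN_QUERY)
sub_rows, sub_secs = timed(conn, GROUP_BY_SUBQUERY)
assert left_rows == sub_rows  # both techniques must return the same rows
print(f"LEFT JOIN: {left_secs:.4f}s, GROUP BY subquery: {sub_secs:.4f}s")
```

Changing groups and per_group shifts the balance between the two queries, which is the point of testing both on your own data.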
For example, I have a copy of the StackOverflow August data dump, which I'll use for benchmarking. There are 1,114,357 rows in the Posts table. This is running on MySQL 5.0.75 on my MacBook Pro 2.40GHz.
I will write a query to find the latest posts for a given user ID (mine).
First, using the technique shown by @Eric with the GROUP BY in a subquery:
Even the EXPLAIN analysis takes over 16 seconds:
Now produce the same query result using my technique with LEFT JOIN:
The EXPLAIN analysis shows that both tables are able to use their indexes:
Here's the DDL for my Posts table:
Note to commenters: If you want to run another benchmark using a different version of MySQL, a different dataset, or a different table design, feel free to do it yourself. I've demonstrated the technique above. Stack Overflow is here to show you how to do software development work, not to do all the work for you.