SQL query to select only rows with maximum value
P粉752479467
2023-08-22 10:03:28
<p>I have a document table (here is a simplified version): </p>
<table class="s-table">
<thead>
<tr>
<th>id</th>
<th>rev</th>
<th>content</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>...</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td>...</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>...</td>
</tr>
<tr>
<td>1</td>
<td>3</td>
<td>...</td>
</tr>
</tbody>
</table>
<p>How to select one row per id and only select the largest rev? </p><p>
Based on the above data, the result should contain two rows: <code>[1, 3, ...]</code> and <code>[2, 1, ..]</code>. I'm using <strong><em>MySQL</em></strong>. </p>
<p>Currently, I'm using a check in a <code>while</code> loop to detect and overwrite old revs in the result set. But is this the only way to achieve results? Isn't there a solution for <strong>SQL</strong>? </p>
I prefer to use as little code as possible...
You can use
IN
to achieve Try this:In my opinion, this is simpler... easier to read and maintain.
At first glance...
You only need to use the
MAX
aggregate function in theGROUP BY
clause:Things are never that simple, are they?
I just noticed that you also need the
content
column.This is a very common problem in SQL: find the complete data corresponding to the maximum value in a column based on a certain grouping identifier. I've heard this question a lot in my career. In fact, I answered one of these questions during a technical interview at my current job.
This question is actually so common that the Stack Overflow community created a tag specifically to deal with this type of problem: greatest-n-per-group.
Basically, you have two ways to solve this problem:
Use simple
group-identifier, max-value-in-group
Subquery to connectIn this approach, you first find the
group-identifier, max-value-in-group
(already solved above) in a subquery. You then join your table with the subquery, usinggroup-identifier
andmax-value-in-group
for an equijoin:Use self-join for left join, adjust connection conditions and filtering conditions
In this approach, you left join the table to itself. Equijoin on
group-identifier
. Then, there are two clever steps:NULL
on the right (remember this is aLEFT JOIN
). We then filter the join results to only show rows withNULL
on the right.So, you end up with:
in conclusion
The results obtained by these two methods are exactly the same.
If you have two rows with the same
group-identifier
andmax-value-in-group
, both methods will include both rows in the result.Both methods are compatible with SQL ANSI, so no matter which RDBMS you are using, you can use both methods regardless of its "style".
Both methods are very efficient, but the specific effects may be different (RDBMS, database structure, index, etc.). So benchmark when choosing one of these methods. And make sure to choose the method that makes the most sense for you.