MySQL Performance Degradation with Subquery in WHERE IN
When attempting to identify and inspect duplicate rows in a MySQL database, a seemingly simple query resulted in unexpectedly slow performance. The initial query, which lists the values of the 'relevant_field' column that occur more than once, used the following structure:
SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1
This query executed quickly, but when a subsequent query was constructed to retrieve all rows in 'some_table' with 'relevant_field' values matching those in the first query, performance dropped dramatically.
SELECT * FROM some_table WHERE relevant_field IN ( SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1 )
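The slow performance comes from how MySQL executes the subquery, not from how it is written. As written, the subquery does not reference the outer query, so it is not correlated. However, the MySQL optimizer (notably in older releases, before 5.6) rewrites an IN (subquery) condition into a correlated form and treats the subquery as dependent on the outer row. The grouped subquery is then executed once for each row processed by the main query, resulting in excessive overhead.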
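One way to check whether this is happening is to run EXPLAIN on the slow form. The following is a minimal sketch, not the original poster's schema: the table name dup_demo, its column types, and the sample rows are assumptions chosen only so the statements can be run as-is.

```sql
-- Hypothetical demo table standing in for some_table; the types are assumed.
DROP TABLE IF EXISTS dup_demo;
CREATE TABLE dup_demo (
    id INT PRIMARY KEY,
    relevant_field VARCHAR(20)
);

-- 'a' and 'b' each appear twice, 'c' appears once.
INSERT INTO dup_demo (id, relevant_field) VALUES
    (1, 'a'), (2, 'b'), (3, 'a'), (4, 'c'), (5, 'b');

-- Ask the optimizer how it plans to execute the slow form of the query.
EXPLAIN
SELECT *
FROM dup_demo
WHERE relevant_field IN (
    SELECT relevant_field
    FROM dup_demo
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
);
```

If the select_type column for the inner query reads DEPENDENT SUBQUERY, the grouping is being re-evaluated per outer row. The exact plan depends on the server version and table size; newer releases may report a different strategy, such as SUBQUERY or MATERIALIZED.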
To mitigate this performance issue, wrap the grouped query in a derived table (a subquery in the FROM clause) and select from that. MySQL evaluates the derived table once and stores its result, so the grouping is no longer repeated for every outer row.
SELECT * FROM ( SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1 ) AS subquery
Using this derived table inside the IN clause gives the following query, which performs significantly better than the original:
SELECT * FROM some_table WHERE relevant_field IN ( SELECT * FROM ( SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1 ) AS subquery )
In this revised query the inner grouped query is a derived table, which MySQL materializes once. Each row of the outer query is then checked against that stored result, so the duplicate rows are retrieved without the per-row re-execution that slowed the original query.
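The same result can also be written as a join against the derived table, which some readers find easier to follow than the doubly nested IN. This is an alternative formulation rather than the method described above, and it is only a sketch: the dup_demo table, its column types, and the sample rows are assumptions for illustration.

```sql
-- Hypothetical demo table standing in for some_table; the types are assumed.
DROP TABLE IF EXISTS dup_demo;
CREATE TABLE dup_demo (
    id INT PRIMARY KEY,
    relevant_field VARCHAR(20)
);

INSERT INTO dup_demo (id, relevant_field) VALUES
    (1, 'a'), (2, 'b'), (3, 'a'), (4, 'c'), (5, 'b');

-- Join every row to the list of duplicated values, computed once as a derived table.
SELECT t.*
FROM dup_demo AS t
JOIN (
    SELECT relevant_field
    FROM dup_demo
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
) AS dups ON dups.relevant_field = t.relevant_field
ORDER BY t.relevant_field, t.id;
```

With the sample rows above, this returns the rows with id 1 and 3 (value 'a') followed by the rows with id 2 and 5 (value 'b'); the row with value 'c' is excluded because it appears only once. Each duplicated value appears exactly once in the derived table, so the join does not multiply the matching rows.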