select * from user group by (...) having count(*) > 1 is enough, no need to use in. Actually, you really think too much about efficiency. This is a design error. Spending time to fix it once and then modifying the table structure to prevent such problems from happening again is the root cause.
In addition, this table does not have an ID, so you cannot automatically delete duplicate data because you cannot write the conditions, so you can only use an intermediate table to delete it.
To sum up, don’t consider efficiency, just correct it manually.
SELECT * FROM user AS a
WHERE EXISTS (
SELECT 1 FROM user AS b
WHERE a.name = b.name
GROUP BY name
HAVING COUNT(1) > 1
)
Since the entire table GROUP BY is involved, the performance is relatively low. If the table structure can be modified, it is recommended to add a field to indicate whether the user name has the same name. This field can be judged and maintained when writing data.
There is no magic in the world of databases. The reason why indexes can speed up searches is because they are pre-processed. There is a cost to speeding up searches by extending the time to insert and update data each time.
Back to this question, you can add an index on name so that group by can be more efficient, but you still need a subquery according to your needs.
But I don’t think adding an index to the name is a healthy approach. If it were me, I would do a scheduled task to solve this problem: scan users with the same name and record the ID in a certain table A. Then when it is displayed on the page, just take out table A and associate it with the user table. Another reason for doing this is that this work does not have to be very strict, and it is okay to have some deviations from reality.
Assume your table name is user:
select * from user group by (...) having count(*) > 1
is enough, no need to use in.Actually, you really think too much about efficiency. This is a design error. Spending time to fix it once and then modifying the table structure to prevent such problems from happening again is the root cause.
In addition, this table does not have an ID, so you cannot automatically delete duplicate data because you cannot write the conditions, so you can only use an intermediate table to delete it.
To sum up, don’t consider efficiency, just correct it manually.
Since the entire table
GROUP BY
is involved, the performance is relatively low.If the table structure can be modified, it is recommended to add a field to indicate whether the user name has the same name. This field can be judged and maintained when writing data.
There is no magic in the world of databases. The reason why indexes can speed up searches is because they are pre-processed. There is a cost to speeding up searches by extending the time to insert and update data each time.
Back to this question, you can add an index on name so that group by can be more efficient, but you still need a subquery according to your needs.
But I don’t think adding an index to the name is a healthy approach. If it were me, I would do a scheduled task to solve this problem: scan users with the same name and record the ID in a certain table A. Then when it is displayed on the page, just take out table A and associate it with the user table. Another reason for doing this is that this work does not have to be very strict, and it is okay to have some deviations from reality.