Query the duplicate rows:
select * from TABLE_NAME where (movie_id, star_id) in (select movie_id, star_id from TABLE_NAME group by movie_id, star_id having count(*) > 1) and id not in (select min(id) from TABLE_NAME group by movie_id, star_id having count(*) > 1)
Delete the duplicate rows:
delete from TABLE_NAME where (movie_id, star_id) in (select movie_id, star_id from TABLE_NAME group by movie_id, star_id having count(*) > 1) and id not in (select min(id) from TABLE_NAME group by movie_id, star_id having count(*) > 1)
These statements keep the row with the smallest id in each (movie_id, star_id) group.
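To try the keep-the-smallest-id idea safely before touching real data, here is a minimal sketch using Python's built-in sqlite3 module. The table name movie_star and the sample rows are made up for illustration, and the WHERE clause is a simplified form (id not in the per-group minimum) that should select the same rows as the statements above; treat it as a sketch, not the exact statement for your database.

```python
import sqlite3

# In-memory database with a hypothetical table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movie_star ("
    "id INTEGER PRIMARY KEY, movie_id INTEGER, star_id INTEGER)"
)
conn.executemany(
    "INSERT INTO movie_star (id, movie_id, star_id) VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 10), (3, 2, 20), (4, 1, 10), (5, 3, 30), (6, 3, 30)],
)

# Rows that are NOT the smallest id of their (movie_id, star_id) group
# are the extra copies.
extras_sql = (
    "SELECT id FROM movie_star WHERE id NOT IN "
    "(SELECT MIN(id) FROM movie_star GROUP BY movie_id, star_id) ORDER BY id"
)
extra_ids = [row[0] for row in conn.execute(extras_sql)]

# Delete the extra copies, keeping the smallest id per group.
conn.execute(
    "DELETE FROM movie_star WHERE id NOT IN "
    "(SELECT MIN(id) FROM movie_star GROUP BY movie_id, star_id)"
)
remaining = conn.execute(
    "SELECT id, movie_id, star_id FROM movie_star ORDER BY id"
).fetchall()

print(extra_ids)   # ids of the extra copies found by the query
print(remaining)   # rows left after the delete
```

Running the select first and checking its result is a cheap way to confirm what the delete will remove. Note that MySQL rejects a delete whose subquery reads the same table (error 1093) unless the subquery is wrapped in a derived table, so the statement may need adjusting there.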
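I have a setup similar to yours, but under high concurrency: about 3,000 rows are inserted per minute.
I put the data into memcached first,
and every incoming row is checked against it before it is written.
If there is no match, the row is inserted.
If there is a match, I compare the timestamps and then decide how to handle it.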
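The check-before-insert flow described above could look roughly like the sketch below. Everything here is an assumption for illustration: a plain dict stands in for memcached, another dict stands in for the database table, the function name ingest is made up, and "handle it" is taken to mean "keep the newer timestamp", which the post does not actually specify.

```python
from typing import Dict, Tuple

Key = Tuple[int, int]  # (movie_id, star_id)

# Stand-ins for the real services (assumptions for this sketch):
cache: Dict[Key, int] = {}  # plays the role of memcached: key -> last timestamp
table: Dict[Key, int] = {}  # plays the role of the database table


def ingest(movie_id: int, star_id: int, timestamp: int) -> str:
    """Check the cache before writing; returns what was done with the row."""
    key = (movie_id, star_id)
    cached = cache.get(key)
    if cached is None:
        # No match in the cache: insert the row and remember it.
        table[key] = timestamp
        cache[key] = timestamp
        return "inserted"
    if timestamp > cached:
        # Match found and the incoming row is newer: assumed policy is to overwrite.
        table[key] = timestamp
        cache[key] = timestamp
        return "updated"
    # Match found and the incoming row is not newer: drop it.
    return "skipped"


results = [
    ingest(1, 10, 100),
    ingest(1, 10, 90),
    ingest(1, 10, 120),
    ingest(2, 20, 50),
]
print(results)
print(table)
```

This sketch is single-threaded. With several writers, the get-then-set step is a race, so a real version would need an atomic operation on the cache side (memcached's add command only succeeds if the key does not exist yet) or a unique index on (movie_id, star_id) as a backstop.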
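SELECT DISTINCT movie_id, star_id FROM xxx
Replace xxx with your table name. The id here is the primary-key id; if your table does not have one, use time instead, but then duplicate rows whose time is also identical may not be found.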
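A small sketch of both points, again with sqlite3 and a made-up table movie_star that has a time column but no id. The second query is only one possible way to use time in place of id (it treats every row later than the group's earliest time as an extra copy), so take it as an illustration of the caveat rather than the query the answer had in mind.

```python
import sqlite3

# Hypothetical table without a primary-key id; rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movie_star (movie_id INTEGER, star_id INTEGER, time INTEGER)"
)
conn.executemany(
    "INSERT INTO movie_star VALUES (?, ?, ?)",
    [(1, 10, 100), (1, 10, 200), (2, 20, 300), (2, 20, 300)],
)

# DISTINCT collapses the duplicates into one row per (movie_id, star_id) pair.
distinct_pairs = conn.execute(
    "SELECT DISTINCT movie_id, star_id FROM movie_star "
    "ORDER BY movie_id, star_id"
).fetchall()

# Using time instead of id: any row later than its group's earliest time
# is treated as an extra copy.
extras_by_time = conn.execute(
    """
    SELECT movie_id, star_id, time FROM movie_star AS t
    WHERE time > (SELECT MIN(time) FROM movie_star AS m
                  WHERE m.movie_id = t.movie_id AND m.star_id = t.star_id)
    ORDER BY movie_id, star_id, time
    """
).fetchall()

print(distinct_pairs)
print(extras_by_time)
```

The pair (2, 20) is stored twice with the same time, so the time-based query cannot tell the two copies apart and reports neither of them; that is the limitation mentioned above.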