group-by - mysql group by 优化的问题
ringa_lee
ringa_lee 2017-04-17 16:31:14
0
1
630

我有一张接近八千万条数据的彩虹表,然后里面有二十多万条的重复数据,我想使用GROUP BY 进行分组,但是使用了GROUP BY语句以后,mysql会生出一张零时表,而零时表,把我的硬盘全部沾满了,所以,求个办法,能帮去去处这重复的数据?

我的sql:

sql:

SELECT COUNT(id) FROM password WHERE length = 4 GROUP BY ciphertext

ringa_lee
ringa_lee

ringa_lee

reply all(1)
迷茫

It is not recommended to use a SQL statement to solve this kind of problem, because the database cannot afford it. It can be implemented using a program (such as writing a Python script). The steps are as follows:

  1. Copy the table structure of table A (80 million rows) to generate an empty table B;

  2. Make a unique index on the ciphertext column of B;

  3. Traverse each row of table A and insert it into table B. You can use REPLACEINSERT INTO ... ON DUPLICATE KEY UPDATE ...

  4. Verify the data in table B. If the result is correct, delete table A and rename table B to table A.

Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template