mysql - How can I optimize Hamming distance queries in the database?

Question

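My project needs a similar-image search feature. I looked up some dHash algorithms on Baidu and generated hash values 16 characters long, which I query in MySQL like this: {code...} In testing with 150,000 rows, the search takes a very long time. The second run is somewhat faster. Is there any way to optimize this?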

巴扎黑 · Answer

Thank you for the invitation.

Sorry to say, I have never done this myself, though I have heard of Hamming distance. 150,000 rows is not a lot, but because the query runs MySQL functions on the column, there is no way for it to use an index.

I did a quick search and found other people with the same problem.

You can refer to:
"MySQL image Hamming distance calculation: nearly 400,000 XOR operations, how to solve it"
"Hamming distance on binary strings in SQL"
You can also try MySQL's MEMORY tables; see "Introduction to the features and use of MySQL memory tables".

大家讲道理 · Answer

Sorry, I have no experience with "Hamming distance".

If there are only 150,000 rows, you can load the data into memory, cache it, and do the calculation in your program. After all, SQL is not well suited to this kind of work.
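A rough sketch of that in-memory approach, written in Python only for illustration. The rows, ids, and function names are hypothetical and not taken from the question; it assumes the hashes are 16-character hex strings loaded once with something like SELECT id, hash.

```python
from typing import Dict, List, Tuple


def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Number of differing bits between two hex-encoded hashes."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def find_similar(cache: Dict[int, int], query_hash: str,
                 max_distance: int) -> List[Tuple[int, int]]:
    """Linear scan over cached hashes; returns (image_id, distance) pairs."""
    query = int(query_hash, 16)
    matches = []
    for image_id, value in cache.items():
        distance = bin(value ^ query).count("1")
        if distance <= max_distance:
            matches.append((image_id, distance))
    return matches


# Hypothetical rows, standing in for the result of the one-time load query.
rows = [
    (1, "4c8e3366c275650f"),
    (2, "4c8e3366c275650e"),  # differs from row 1 in exactly one bit
    (3, "b371cc993d8a9af0"),  # bitwise complement of row 1
]

# Parse each hex string once at load time so lookups only do XOR + bit count.
cache = {image_id: int(hex_hash, 16) for image_id, hex_hash in rows}

print(find_similar(cache, "4c8e3366c275650f", max_distance=5))
# [(1, 0), (2, 1)]
```

The scan is still linear in the number of rows, but the hex parsing happens once at load time rather than on every query.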

Based on your SQL, there really isn’t much room for optimization. The only things I can think of are the following:

Use char instead of varchar for the hash column
Drop ORDER BY hamming_distance ASC and sort in the program instead
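For the second point, sorting in the program might look something like the sketch below. It assumes the query returns unsorted (id, hash) rows and that only the closest few matches are needed. The rows and the function name are made up for illustration.

```python
import heapq


def nearest(rows, query_hash, k):
    """Return the k closest rows as (distance, image_id), nearest first."""
    query = int(query_hash, 16)
    # Generator of (distance, id) pairs; nothing is fully sorted up front.
    scored = (
        (bin(int(hex_hash, 16) ^ query).count("1"), image_id)
        for image_id, hex_hash in rows
    )
    return heapq.nsmallest(k, scored)


# Hypothetical unsorted rows, as if fetched without ORDER BY.
rows = [
    (1, "00000000000000ff"),  # 8 bits set
    (2, "0000000000000001"),  # 1 bit set
    (3, "0000000000000000"),  # 0 bits set
    (4, "000000000000000f"),  # 4 bits set
]

print(nearest(rows, "0000000000000000", k=2))
# [(0, 3), (1, 2)]
```

heapq.nsmallest keeps only the k best candidates while it scans, so it avoids sorting every distance when just a handful of results are wanted.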

I recommend posting the DDL statement and the EXPLAIN output so they can be analyzed together. The biggest performance bottleneck, though, is most likely BIT_COUNT(CONV(hash, 16, 10) ^ CONV('4c8e3366c275650f', 16, 10)) as hamming_distance: it has to be evaluated for every row, so no index can help and the query is effectively a full table scan.