mysql - 如何在数据库里优化 汉明距离 查询?
PHP中文网
PHP中文网 2017-04-17 14:54:25
0
2
1796

项目里需要一个搜索相似图片的功能,百度了一些dhash的算法,生成了16个长度的hash值,在mysql里这样查询:

 SELECT pk, hash, 
 BIT_COUNT(   CONV(hash, 16, 10) ^ CONV('4c8e3366c275650f', 16, 10) ) as hamming_distance   
 FROM image_hashes   HAVING hamming_distance < 4   ORDER BY hamming_distance ASC; 

经测试,15W条数据,搜索需要很长时间。第二次速度会快些,有啥办法可以优化?

PHP中文网
PHP中文网

认证0级讲师

reply all(2)
巴扎黑

Thank you for the invitation.

Sorry to say, I have never done this, but I have heard of Hamming distance before. 15k data, not a lot, but using the mysql function, there is no way to create an index. . .

I probably searched and found the same problem.

  1. You can refer to mysql picture Hamming distance calculation, nearly 40w XOR calculation, how to break

  2. Hamming distance on binary strings in SQL

  3. You can try mysql’s memory table. Introduction to the features and use of MySQL memory table

大家讲道理

Sorry, I have no experience with "Hamming distance".

If there is only 15W data, you can load the data into memory and cache it, and then perform calculations in the program. After all, SQL is not good at doing this kind of thing.

Based on your SQL, there really isn’t much room for optimization. The only things I can think of are the following:

    Use
  • in the

    hashchar column instead of varchar

  • Discard ORDER BY hamming_distance ASC and sort by program instead

It is recommended to send the DDL statement and EXPLAIN results for analysis together, but the biggest performance checkpoint should be BIT_COUNT(CONV(hash, 16, 10) ^ CONV('4c8e3366c275650f', 16, 10)) as hamming_distance, which is actually equivalent to a full table scan.

Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template