How to improve grouping speed
P粉155832941
P粉155832941 2023-09-05 10:24:40
0
1
421

I have a large table with over 3B rows and my query now takes 3 days to execute.

This is my query:

insert into frequencywords (word, frequency, filename) select word, count(*) as frequency, max(filename) from allwordstemp group by word

Basically, my query is to group by word from allwordstemp table, and I want to know the filename when frequency = 1, that's why I added max(filename), Because filename needs to be included in the aggregate function, such as max. If frequency > 1, I don't need the value of filename either. Both tables have 2 indexes on word and filename.

allwordstemp table (filename is an id file):

CREATE TABLE `allwordstemp` ( `word` varchar(45) DEFAULT NULL, `filename` int(11) DEFAULT NULL, KEY `idx_allwordstemp_word` (`word`), KEY `idx_allwordstemp_filename` (`filename`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

frequencywords table:

CREATE TABLE `frequencywords` ( `word` varchar(45) DEFAULT NULL, `frequency` int(11) DEFAULT NULL, `filename` int(11) DEFAULT NULL, KEY `idx_frequencywords_word` (`word`), KEY `idx_frequencywords_frequency` (`frequency`), KEY `idx_frequencywords_filename` (`filename`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Explain selection:

 ---- ------------- -------------- -- ---------- ------- --------------- ------------------ ----- --------- ------ ------------ ---------- -------- ----- | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | ---- ------------- -------------- ------------ ------- --------------- ----------------------- ---------- --- ------------------------------------- | 1 | SIMPLE | allwordstemp | NULL | index | NULL | idx_allwordstemp_word | 138 | NULL | 3487864881 | 100.00 | Using index | ---- ------------- -------------- ------------ ------- --------------- ----------------------- ---------- --- --- ---------------- ---------- ------------- 

How can I make my query faster?

P粉155832941
P粉155832941

reply all (1)
P粉741678385

Your filter appears to be based on frequency, not words or filenames. So I would first index all the words by frequency.

Then, assuming frequency is an integer, I would add a WHERE clause like this:

select word, max(filename) from allwordstemp where frequency = 1 group by word

The above will give you a list of all words that appear only once in various file names.

Hope this helps, and all the best!

    Latest Downloads
    More>
    Web Effects
    Website Source Code
    Website Materials
    Front End Template
    About us Disclaimer Sitemap
    php.cn:Public welfare online PHP training,Help PHP learners grow quickly!