Indexing with Higher Cardinality Columns First When Involving a Range
Consider the following table:
CREATE TABLE `files` ( `did` int(10) unsigned NOT NULL DEFAULT '0', `filename` varbinary(200) NOT NULL, `ext` varbinary(5) DEFAULT NULL, `fsize` double DEFAULT NULL, `filetime` datetime DEFAULT NULL, PRIMARY KEY (`did`,`filename`), KEY `fe` (`filetime`,`ext`), -- This? KEY `ef` (`ext`,`filetime`) -- or This? ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
With a million rows and a high cardinality for filetime and a lower cardinality for ext, the question arises as to which index is more advantageous: fe or ef.
Analysis with Force Index and EXPLAIN
Using FORCE INDEX to test both indexes reveals a clear difference in performance:
-- Forcing the range on filetime first mysql> EXPLAIN SELECT COUNT(*), AVG(fsize) FROM files FORCE INDEX(fe) WHERE ext = 'gif' AND filetime >= '2015-01-01' AND filetime < '2015-01-01' + INTERVAL 1 MONTH;
-- Forcing the low-cardinality ext first mysql> EXPLAIN SELECT COUNT(*), AVG(fsize) FROM files FORCE INDEX(ef) WHERE ext = 'gif' AND filetime >= '2015-01-01' AND filetime < '2015-01-01' + INTERVAL 1 MONTH;
The EXPLAIN output indicates that ef is significantly faster, using less rows to retrieve the results.
Analysis with the Optimizer Trace
The Optimizer trace confirms the superiority of ef:
"potential_range_indices": [ ... { "index": "fe", "usable": true, ... }, { "index": "ef", "usable": true, ... } ], "analyzing_range_alternatives": { "range_scan_alternatives": [ { "index": "fe", "ranges": [ "2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00" ], "cost": 20022, -- Higher cost }, { "index": "ef", "ranges": [ "gif <= ext <= gif AND 2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00" ], "cost": 646.61, -- Lower cost } ], }
The trace reveals that ef can use both columns of the index, resulting in a more efficient search. Additionally, it highlights that the Optimizer will only examine the first 'range' column, making the cardinality of ext irrelevant.
Conclusions
Based on the analysis, it is clear that when dealing with a range query involving multiple indexed columns, the order of the columns should be:
This approach ensures that the index is used most effectively, resulting in optimal query performance.
The above is the detailed content of Should Higher Cardinality Columns Come First in Multi-Column Indexes with Range Queries?. For more information, please follow other related articles on the PHP Chinese website!