Index Optimization for Range Queries
A common rule of thumb is that columns with higher cardinality make for more efficient indexes in MySQL. Range queries, however, are an exception to that rule.
Problem Statement
Consider a table with the following structure:
CREATE TABLE `files` (
  `did` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `filename` VARBINARY(200) NOT NULL,
  `ext` VARBINARY(5) DEFAULT NULL,
  `fsize` DOUBLE DEFAULT NULL,
  `filetime` DATETIME DEFAULT NULL,
  PRIMARY KEY (`did`,`filename`),
  KEY `fe` (`filetime`,`ext`),  -- Option 1
  KEY `ef` (`ext`,`filetime`)   -- Option 2
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Every filetime value is distinct, while ext takes only a small number of values (that is, filetime has high cardinality and ext has low cardinality). A query filters on both columns with the following conditions:
WHERE ext = '...' AND filetime BETWEEN ... AND ...
Which index, fe or ef, is the better choice for this query?
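Before comparing the two indexes, it can help to confirm the cardinality assumption on real data. The statements below are a minimal sketch that assumes the `files` table above is populated; the column aliases are illustrative and not part of the original schema.

```sql
-- Count distinct values per column. A large gap between the two
-- distinct counts confirms the high/low cardinality assumption.
SELECT COUNT(*)                 AS total_rows,
       COUNT(DISTINCT filetime) AS distinct_filetimes,
       COUNT(DISTINCT ext)      AS distinct_exts
FROM files;

-- The Cardinality column in this output is MySQL's own estimate of the
-- number of distinct values in each index.
SHOW INDEX FROM files;
```

The `SHOW INDEX` figures are estimates based on table statistics, so they may differ somewhat from the exact counts returned by the first query.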
Answer
Surprisingly, the index with ext as the first column, despite its lower cardinality, is more efficient for this query.
Explanation
MySQL's optimizer analyzes index alternatives and chooses the one with the lowest cost. Using the optimizer trace, we can observe the reasoning behind this choice.
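The trace can be captured roughly as follows. This is a sketch assuming MySQL 5.6 or later, where the optimizer trace is available; the `SELECT COUNT(*)` form and the date literals are placeholders chosen for illustration, since the original query's select list and values are not given.

```sql
-- Enable tracing for the current session only.
SET optimizer_trace = 'enabled=on';

-- Run the query under test (placeholder literals, adjust to your data).
SELECT COUNT(*)
FROM files
WHERE ext = 'gif'
  AND filetime BETWEEN '2015-01-01' AND '2015-01-31';

-- Read the trace for the statement just executed. The per-index
-- estimates appear under "range_analysis" ->
-- "analyzing_range_alternatives" -> "range_scan_alternatives".
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;

-- Turn tracing off again.
SET optimizer_trace = 'enabled=off';
```

Each entry in that section of the trace names an index and reports the ranges it would scan, the estimated rows and cost, and whether it was chosen.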
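For fe (filetime first), MySQL estimates that it would need to scan 16684 rows. Because the leading column is constrained by a range, the index can bound the scan by filetime only; the ext = 'gif' condition merely filters entries inside that range rather than narrowing it.
For ef (ext first), the equality on ext pins the first key part, so the filetime range is applied within a single ext value and both index columns bound the scan. This lets MySQL drill down quickly to the matching rows, at an estimated cost of only 646.61. MySQL chooses this index because it can use more key parts, making the search more efficient.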
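A quicker, if less detailed, comparison is to force each index in turn and compare the EXPLAIN output. This is a sketch under the same assumptions as before: the `SELECT COUNT(*)` form and the date literals are placeholders, not values from the original query.

```sql
-- Option 1: force the filetime-first index.
EXPLAIN
SELECT COUNT(*)
FROM files FORCE INDEX (fe)
WHERE ext = 'gif'
  AND filetime BETWEEN '2015-01-01' AND '2015-01-31';

-- Option 2: force the ext-first index.
EXPLAIN
SELECT COUNT(*)
FROM files FORCE INDEX (ef)
WHERE ext = 'gif'
  AND filetime BETWEEN '2015-01-01' AND '2015-01-31';
```

Comparing the `rows` column of the two plans shows how many index entries MySQL expects to examine with each index; the exact figures depend on the data and the MySQL version.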
Conclusions
When a query combines an equality test with a range test, put the equality column first in the composite index and the range column last, regardless of cardinality. For the query above, that means ef (ext, filetime) rather than fe (filetime, ext).