High Cardinality Column Placement in Composite Indexes with Range Queries
When querying a table with a composite index involving a range condition, the placement of columns within the index can significantly impact performance.
Consider a table files with primary key (did, filename) and two composite indexes: fe, defined as INDEX(filetime, ext), and ef, defined as INDEX(ext, filetime). Both indexes contain the filetime column, which has higher cardinality than ext.
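For concreteness, the table might be declared roughly as follows. This is only a sketch: the column types and the storage engine are assumptions made for illustration. The source specifies only the primary key, the two index definitions, and the columns did, filename, ext, filetime, and fsize.

```sql
-- Hypothetical definition: all column types below are assumed, not from the source.
CREATE TABLE files (
    did      INT UNSIGNED    NOT NULL,
    filename VARCHAR(255)    NOT NULL,
    ext      VARCHAR(10)     NOT NULL,   -- low cardinality: 'gif', 'jpg', ...
    filetime DATETIME        NOT NULL,   -- high cardinality
    fsize    BIGINT UNSIGNED NOT NULL,
    PRIMARY KEY (did, filename),
    INDEX fe (filetime, ext),            -- range column first
    INDEX ef (ext, filetime)             -- equality column first
) ENGINE=InnoDB;
```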
The query:
WHERE ext = '...' AND filetime BETWEEN ... AND ...
tests ext for equality and filetime against a range. The question arises: which index is optimal for such a query?
Analysis
To determine the optimal index, we can use FORCE INDEX and examine the execution plans:
-- Force the range column (filetime) first
SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(fe)
    WHERE ext = 'gif'
      AND filetime >= '2015-01-01'
      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;

-- Force the low-cardinality column (ext) first
SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(ef)
    WHERE ext = 'gif'
      AND filetime >= '2015-01-01'
      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;
The execution plans show that ef, INDEX(ext, filetime), has a significantly lower estimated row count, indicating a narrower and more efficient index scan.
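One way to obtain these row estimates is to prefix each forced query with EXPLAIN. The sketch below assumes the index names fe and ef. No output is shown because the estimates depend entirely on the data in the table.

```sql
-- Plan when the range column leads the index
EXPLAIN
SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(fe)
    WHERE ext = 'gif'
      AND filetime >= '2015-01-01'
      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;

-- Plan when the equality column leads the index
EXPLAIN
SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(ef)
    WHERE ext = 'gif'
      AND filetime >= '2015-01-01'
      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;
```

In each plan, the rows column gives the optimizer's estimate of how many index entries will be examined, and key_len generally indicates how many bytes of the index key are used to bound the scan. Comparing the two values across the plans is a quick way to see whether one or both columns are taking part.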
Optimizer Trace
To further analyze the optimizer's behavior, we can use the optimizer trace:
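-- Enable tracing for this session, run the query, then read the trace
SET optimizer_trace = 'enabled=on';
SELECT COUNT(*), AVG(fsize) FROM files WHERE ext = 'gif' AND filetime >= '2015-01-01' AND filetime < '2015-01-01' + INTERVAL 1 MONTH;
SELECT * FROM information_schema.OPTIMIZER_TRACE;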
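The trace reveals that the optimizer chooses INDEX(ext, filetime) because both columns can bound the index scan: the equality on ext fixes the leading column, and the range on filetime then selects one contiguous run of entries. In contrast, INDEX(filetime, ext) can use only its first column (filetime) to bound the scan. Once the optimizer reaches a range column, later index columns cannot narrow the range any further, so every entry in the date range is examined regardless of its ext value.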
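The difference can also be checked against actual work done, rather than estimates, by reading the session's Handler_read counters after each forced query. This is a sketch of a separate technique, not something the trace itself produces. It assumes the index names fe and ef, and FLUSH STATUS resets the session counters and may require extra privileges.

```sql
-- Reset the session status counters, then run the query with the range column first
FLUSH STATUS;
SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(fe)
    WHERE ext = 'gif'
      AND filetime >= '2015-01-01'
      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;
SHOW SESSION STATUS LIKE 'Handler_read%';

-- Repeat with the equality column first and compare the counters
FLUSH STATUS;
SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(ef)
    WHERE ext = 'gif'
      AND filetime >= '2015-01-01'
      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;
SHOW SESSION STATUS LIKE 'Handler_read%';
```

A noticeably smaller Handler_read_next value for ef would be consistent with the narrower scan described above.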
Conclusions
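Based on the analysis, for a query that combines an equality test with a range test, the column tested with = should come first in the composite index and the range column last, regardless of cardinality. Here INDEX(ext, filetime) uses both columns, whereas INDEX(filetime, ext) stops at its first column because that column is the range.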