PostgreSQL SELECT DISTINCT
Performance Bottleneck and Optimization Strategies
A SELECT DISTINCT
query on a PostgreSQL table containing nearly two million records exhibits unexpectedly slow performance (500-600ms). The query planner inexplicably defaults to a sequential scan instead of leveraging an available index, and even index forcing doesn't significantly improve execution time.
Emulating Index Skip Scan in PostgreSQL
Because PostgreSQL lacks native index skip scan functionality, a workaround using a recursive common table expression (CTE) can mimic its behavior. This CTE iteratively retrieves distinct product IDs in ascending order, utilizing an index on product_id
for efficiency:
<code class="language-sql">WITH RECURSIVE cte AS ( ( SELECT product_id FROM tickers ORDER BY 1 LIMIT 1 ) UNION ALL SELECT l.* FROM cte c CROSS JOIN LATERAL ( SELECT product_id FROM tickers t WHERE t.product_id > c.product_id ORDER BY 1 LIMIT 1 ) l ) SELECT * FROM cte;</code>
This approach offers a substantial performance gain compared to a full table scan.
Alternative Approaches: DISTINCT
and DISTINCT ON
For tables with a more even distribution of rows per unique product ID, the standard DISTINCT
or DISTINCT ON
keywords might prove faster than the emulated index skip scan. Their performance depends heavily on data distribution.
Future Improvements: Native Index Skip Scan
PostgreSQL development includes ongoing work to integrate native index skip scan capabilities. This future enhancement promises further performance optimizations for SELECT DISTINCT
queries.
The above is the detailed content of Why is my PostgreSQL SELECT DISTINCT query so slow, and how can I improve its performance?. For more information, please follow other related articles on the PHP Chinese website!