
PostgreSQL SELECT DISTINCT Performance Bottleneck and Optimization Strategies
A SELECT DISTINCT query on a PostgreSQL table containing nearly two million records exhibits unexpectedly slow performance (500-600ms). The query planner inexplicably defaults to a sequential scan instead of leveraging an available index, and even index forcing doesn't significantly improve execution time.
Emulating Index Skip Scan in PostgreSQL
Because PostgreSQL lacks native index skip scan functionality, a workaround using a recursive common table expression (CTE) can mimic its behavior. This CTE iteratively retrieves distinct product IDs in ascending order, utilizing an index on product_id for efficiency:
<code class="language-sql">WITH RECURSIVE cte AS (
(
SELECT product_id
FROM tickers
ORDER BY 1
LIMIT 1
)
UNION ALL
SELECT l.*
FROM cte c
CROSS JOIN LATERAL (
SELECT product_id
FROM tickers t
WHERE t.product_id > c.product_id
ORDER BY 1
LIMIT 1
) l
)
SELECT * FROM cte;</code>This approach offers a substantial performance gain compared to a full table scan.
Alternative Approaches: DISTINCT and DISTINCT ON
For tables with a more even distribution of rows per unique product ID, the standard DISTINCT or DISTINCT ON keywords might prove faster than the emulated index skip scan. Their performance depends heavily on data distribution.
Future Improvements: Native Index Skip Scan
PostgreSQL development includes ongoing work to integrate native index skip scan capabilities. This future enhancement promises further performance optimizations for SELECT DISTINCT queries.
The above is the detailed content of Why is my PostgreSQL SELECT DISTINCT query so slow, and how can I improve its performance?. For more information, please follow other related articles on the PHP Chinese website!