How to optimize a SQL query with over 2 million rows
P粉127901279
2023-09-01 18:09:31
<p>I have a Sql database with over 2 million rows and it is growing rapidly. There aren't many columns, just <code>code, price, date, and stationID</code>. </p>
<p>The purpose is to get the latest price by code and stationID.
The query works great but takes over 10 seconds. </p>
<p>Is there any way to optimize the query? </p>
<pre class="brush:php;toolbar:false;">$statement = $this->pdo->prepare(
'WITH cte AS
(
SELECT stationID AS ind, code, CAST(price AS DOUBLE ) AS price, date
,ROW_NUMBER() OVER(
PARTITION BY code, stationID
ORDER BY date DESC
) AS latest
FROM price
)
SELECT *
FROM cte
WHERE latest = 1
'
);
$statement->execute();
$results = $statement->fetchAll(PDO::FETCH_GROUP | PDO::FETCH_ASSOC);</pre>
<p>Edit:
The first column has an index called "id". I don't know if this helps. </p>
<p>The database (InnoDB) looks like this: </p>
<pre class="brush:php;toolbar:false;">id primary - int
stationID - int
code - int
price-decimal(10,5)
date - datetime</pre>
<p>Edit 2:</p>
<p>The results need to be grouped by stationID, and multiple rows need to be displayed for each stationID. One line for each code with the latest date.像这样:</p>
<pre class="brush:php;toolbar:false;">22456:
code: 1
price: 3
date: 2023-06-21
code: 2
price: 2
date: 2023-06-21
code: 3
price: 5
date: 2023-06-21
22457:
code: 1
price: 10
date: 2023-06-21
code: 2
price: 1
date: 2023-06-21
code: 3
price: 33
date: 2023-06-21</pre>
<p>The json output should be 像这样:</p>
<pre class="brush:php;toolbar:false;">{"1000001":[{"code":1,"price":1.661,"date":"2023-06-06 12:46:32","latest":1},{"code":2,"price":1.867,"date":"2023-06-06 12:46:32","latest":1},{"code":3,"price":1.05,"date":"2023-06-06 12:46:32","latest":1},{"code":5,"price":1.818,"date":"2023-06-06 12:46:32","latest":1},{"code":6,"price":1.879,"date":"2023-06-06 12:46:32","latest":1}],"1000002":[{"code":1,"price":1.65,"date":"2023-06-03 08:53:26","latest":1},{"code":2,"price":1.868,"date":"2023-06-03 08:53:26","latest":1},{"code":6,"price":1.889,"date":"2023-06-03 08:53:27","latest":1}],…</pre></p>
I guess you need the following indexes for the query to execute well (you only need to do it once as part of the database design).
The first two columns can be arranged in any order.
As long as there can't be two rows of the same
code, stationID
pair with the same datetime, using window functions is a bit like using a sledgehammer to crack a nut.It requires the following indexes:
This query should take less than 1 millisecond because the derived table can be built from the index and then it reads only the required rows from the table.
Alternatively, you can significantly reduce the effort if you know that each
code, stationID
pair will receive an updated price within a specific time period (1 hour, 1 day, 1 week) The window function needs to add a where clause: