This is the form:
| id | area | Variety | price |
|---|---|---|---|
| 1 | Alexander Valley | Cabernet Sauvignon | 35 |
| 2 | Alexander Valley | Cabernet Sauvignon | 45 |
| 3 | Alexander Valley | Merlot | 19 |
| 4 | California | Sauvignon Blanc | 8 |
| 5 | California | Pinot Noir | 17 |
I want to find out the cheapest and most expensive varieties in each region, so the output should be:
| area | expensive | Cheap |
|---|---|---|
| Alexander Valley | Cabernet Sauvignon | Merlot |
| California | Pinot Noir | Sauvignon Blanc |
I was able to get the correct results using both first_value()
SELECT DISTINCT region, FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS expensive, FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price) AS cheapest FROM wine_listI think it is equivalent to the following query
SELECT DISTINCT region, FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS expensive, LAST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS cheapest FROM wine_listBut now my output is:
| expensive | Cheap | |
|---|---|---|
| Cabernet Sauvignon | Cabernet Sauvignon | |
| Cabernet Sauvignon | Merlot | |
| Pinot Noir | Pinot Noir | |
| Pinot Noir | Sauvignon Blanc |
The default window for
FIRST_VALUEandLAST_VALUEisROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. ie. This is the first response. The last value is "so far".However, you want it to apply to the entire dataset, so you must explicitly describe the window range:
SELECT DISTINCT region, FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS expensive, LAST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS cheapest FROM wine_list;