I'm trying to write a MySQL query that finds rows whose cost differs significantly from the usual cost for each product and reports that cost as a percentage of the product's average price: below 100% the price is lower than the average, and above 100% the price is higher. Prices less than 1 standard deviation from the mean are ignored.
Sample data:
_rowid | _Timestamp | Code | fk_product_id | fk_po_id | cost |
---|---|---|---|---|---|
5952 | 2021-01-10 10:19:01 | 00805 | 1367 | 543 | 0.850 |
9403 | 2022-05-23 14:54:34 | 00805 | 1367 | 2942 | 0.850 |
41595 | 2022-11-23 11:20:26 | 00805 | 1367 | 3391 | 1.350 |
39635 | 2022-01-18 12:49:32 | Water1 | 344 | 3153 | 0.140 |
40134 | 2022-04-06 22:39:34 | Water1 | 344 | 2747 | 0.190 |
41676 | 2022-12-09 16:28:28 | Water1 | 344 | 3398 | 0.140 |
39634 | 2022-01-18 12:49:31 | gr309203 344400 | 1024 | 3154 | 0.770 |
35634 | 2021-03-03 15:23:23 | gr309203 344400 | 1024 | 3203 | 0.790 |
41264 | 2022-11-16 11:41:44 | gr309203 344400 | 1024 | 3357 | 0.970 |
SELECT code, fk_product_id, cost, cost/ (SELECT avg(cost) FROM po_line aa WHERE aa.code = code) AS percent FROM po_line WHERE (SELECT STDDEV(cost) FROM po_line ss WHERE ss.code = code)>1;
This will not return any rows, but three rows (one for each product) should appear in the report.
The expected result is:
Code | fk_product_id | cost | percentage |
---|---|---|---|
00805 | 1367 | 1.350 | 133 |
Water1 | 344 | 0.190 | 121 |
gr309203 344400 | 1024 | 0.970 | 115 |
This query shows how to use window functions to calculate the number of standard deviations and the percentage of cost relative to the average cost for each given code.
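A minimal sketch of a query along these lines, assuming the table is named `po_line` as in the question and that the population standard deviation (`STDDEV_POP`) is the intended measure. The column aliases `avg_cost`, `sd_cost`, `num_sd` and `percentage` are illustrative names only:

```sql
-- Per-row statistics relative to all rows sharing the same code.
SELECT
    code,
    fk_product_id,
    cost,
    AVG(cost) OVER w        AS avg_cost,    -- mean cost for this code
    STDDEV_POP(cost) OVER w AS sd_cost,     -- population standard deviation for this code
    -- signed distance from the mean, in standard deviations;
    -- NULLIF avoids dividing by zero when all costs for a code are equal
    (cost - AVG(cost) OVER w) / NULLIF(STDDEV_POP(cost) OVER w, 0) AS num_sd,
    -- cost as a percentage of the mean (100 = exactly average)
    ROUND(100 * cost / AVG(cost) OVER w) AS percentage
FROM po_line
WINDOW w AS (PARTITION BY code);
```

Partitioning by `code` follows the question's own query; partitioning by `fk_product_id` instead would group by product id.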
result:
(Please note that window functions require MySQL 8.0 or later.)
This query only shows you how the calculation is done. To get the results you want:
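One way to do the filtering, sketched under the same assumptions (table `po_line`, population standard deviation, illustrative alias names), is to compute the per-code statistics in a common table expression and keep only the rows that lie more than one standard deviation from the mean:

```sql
WITH stats AS (
    SELECT
        code,
        fk_product_id,
        cost,
        AVG(cost)        OVER (PARTITION BY code) AS avg_cost,
        STDDEV_POP(cost) OVER (PARTITION BY code) AS sd_cost
    FROM po_line
)
SELECT
    code,
    fk_product_id,
    cost,
    ROUND(100 * cost / avg_cost) AS percentage  -- 100 = exactly average
FROM stats
-- keep only rows more than one standard deviation away from the mean
WHERE ABS(cost - avg_cost) > sd_cost;
```

On the sample data this should keep one row per code (costs 1.350, 0.190 and 0.970). The attempt in the question most likely returns nothing for two reasons: the unqualified `code` inside each subquery resolves to the subquery's own table, so `aa.code = code` is always true and the statistics are computed over the whole table, and the `WHERE` clause compares the standard deviation itself with 1 rather than comparing each row's distance from the mean with the standard deviation.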