tea-tasting: a Python package for the statistical analysis of A/B tests

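Introduction

I developed tea-tasting, a Python package for the statistical analysis of A/B tests, featuring:

Student's t-test, Bootstrap, variance reduction with CUPED, power analysis, and other statistical methods and approaches out of the box.
Support for a wide range of data backends, such as BigQuery, ClickHouse, PostgreSQL/GreenPlum, Snowflake, Spark, Pandas, and 20+ other backends supported by Ibis.
Extensible API: define custom metrics and use statistical tests of your choice.
Convenient API for reducing manual work, and a framework for minimizing errors.
Detailed documentation.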

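In this post, I explore each of these advantages of using tea-tasting in the analysis of experiments.

If you want to try it, check out the documentation.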

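Statistical methods

tea-tasting includes statistical methods and techniques that cover most of what you might need in the analysis of experiments.

Analyze metric averages and proportions with Student's t-test and Z-test. Or use Bootstrap to analyze any other statistic of your choice. There is also a predefined method for analyzing quantiles with Bootstrap. tea-tasting can also detect a sample ratio mismatch between the variants of an A/B test.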
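A sample ratio mismatch means the observed split between variants differs from the planned split by more than chance would explain. Below is a minimal sketch, assuming two variants and a planned 50/50 split, that flags a mismatch with a chi-squared goodness-of-fit test. The function name `srm_pvalue` is hypothetical, and tea-tasting's own SRM check may use a different test.

```python
import math


def srm_pvalue(n_control: int, n_treatment: int, expected_share: float = 0.5) -> float:
    """Chi-squared goodness-of-fit p-value for a two-variant sample ratio mismatch.

    Hypothetical helper for illustration. With two variants the statistic has
    1 degree of freedom, so P(chi2 > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c / 2)).
    """
    total = n_control + n_treatment
    expected_control = total * expected_share
    expected_treatment = total * (1 - expected_share)
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    return math.erfc(math.sqrt(chi2 / 2))
```

For example, 5,000 vs 5,300 users gives a p-value well below 0.01. That would point to a problem with randomization or logging rather than a treatment effect.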

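tea-tasting applies the delta method to analyze ratios of means. An example is the average number of orders per average number of sessions, when sessions are not the randomization unit.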
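The first-order (delta method) approximation of the variance of mean(y) / mean(x) is (var_y / mx^2 - 2 * my * cov_xy / mx^3 + my^2 * var_x / mx^4) / n. Here mx and my are the means and n is the number of randomization units. Below is a minimal plain-Python sketch of that formula. It assumes one row per randomization unit (for example, orders and sessions per user). The helper name is hypothetical and not part of tea-tasting's API. The covariance term is what a naive per-session analysis leaves out.

```python
import math
import statistics


def ratio_of_means_with_se(numer: list[float], denom: list[float]) -> tuple[float, float]:
    """Return mean(numer) / mean(denom) and its delta-method standard error.

    Hypothetical helper. Each list holds one value per randomization unit
    (e.g. orders and sessions per user), so numerator and denominator can be
    correlated within a unit.
    """
    n = len(numer)
    mean_y = statistics.fmean(numer)
    mean_x = statistics.fmean(denom)
    var_y = statistics.variance(numer)
    var_x = statistics.variance(denom)
    # Sample covariance between numerator and denominator across units.
    cov_xy = sum((y - mean_y) * (x - mean_x) for y, x in zip(numer, denom)) / (n - 1)
    var_ratio = (
        var_y / mean_x**2
        - 2 * mean_y * cov_xy / mean_x**3
        + mean_y**2 * var_x / mean_x**4
    ) / n
    # Clamp tiny negative values caused by floating-point rounding.
    return mean_y / mean_x, math.sqrt(max(var_ratio, 0.0))
```

When the numerator is exactly proportional to the denominator, the three terms cancel and the standard error is zero, as expected for a constant ratio.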

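Use pre-experiment data, metric forecasts, or other covariates to reduce variance and increase the sensitivity of an experiment. This approach is also known as CUPED or CUPAC.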
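The core of CUPED fits in a few lines. In this sketch, which is an assumption for illustration, theta is estimated as cov(metric, covariate) / var(covariate) on data pooled from all variants. Each value is then shifted by theta times the covariate's deviation from its mean. The helper name is hypothetical.

```python
import statistics


def cuped_adjust(metric: list[float], covariate: list[float]) -> list[float]:
    """Return CUPED-adjusted metric values (hypothetical helper).

    theta = cov(metric, covariate) / var(covariate), estimated on pooled data.
    The adjustment subtracts theta * (covariate - mean(covariate)).
    """
    n = len(metric)
    mean_y = statistics.fmean(metric)
    mean_x = statistics.fmean(covariate)
    cov_xy = sum((y - mean_y) * (x - mean_x) for y, x in zip(metric, covariate)) / (n - 1)
    theta = cov_xy / statistics.variance(covariate)
    return [y - theta * (x - mean_x) for y, x in zip(metric, covariate)]
```

The adjusted metric keeps the same mean. Its sample variance drops to var(metric) - cov^2 / var(covariate), so the gain grows with the correlation between the metric and the covariate.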

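Calculating confidence intervals for the percentage change in Student's t-test and Z-test can be tricky. Simply taking the confidence interval for the absolute change and dividing it by the control average produces a biased result. tea-tasting applies the delta method to calculate the correct interval.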
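One common way to get such an interval is to apply the delta method to the logarithm of the ratio of means, then transform the bounds back. This sketch rests on several assumptions: a normal approximation, independent groups, and per-unit sample variances as inputs. Whether tea-tasting uses exactly this transformation, or the t-distribution instead of the normal, is not stated here. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def rel_effect_ci(
    mean_c: float, var_c: float, n_c: int,
    mean_t: float, var_t: float, n_t: int,
    alpha: float = 0.05,
) -> tuple[float, float, float]:
    """Relative change mean_t / mean_c - 1 with a delta-method CI (hypothetical helper)."""
    log_ratio = math.log(mean_t / mean_c)
    # Delta method: Var(log(mean)) ~= Var(mean) / mean**2, and the groups are independent.
    se = math.sqrt(var_t / (n_t * mean_t**2) + var_c / (n_c * mean_c**2))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = math.exp(log_ratio - z * se) - 1
    upper = math.exp(log_ratio + z * se) - 1
    return math.exp(log_ratio) - 1, lower, upper
```

Because the bounds are transformed back through exp, the interval is asymmetric around the point estimate. An interval obtained by dividing the absolute-change bounds by the control mean cannot have this property.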

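Analyze statistical power for Student's t-test and Z-test. There are three possible options:

Calculate the effect size, given statistical power and the total number of observations.
Calculate the total number of observations, given statistical power and the effect size.
Calculate statistical power, given the effect size and the total number of observations.

Learn more in the detailed user guide.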
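For a two-sided Z-test on means, all three options come from rearranging one relation between the effect size, the standard error, and the normal quantiles z(1 - alpha/2) and z(power). The sketch below makes three assumptions: an equal split between two variants, a known common standard deviation `std`, and a normal approximation. A Student's t-based calculation differs slightly for small samples. The function names are hypothetical, not tea-tasting's API, and `effect` is the absolute difference in means.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def diff_se(std: float, n_total: float) -> float:
    """Standard error of the difference in means for an equal two-way split."""
    n_per_group = n_total / 2
    return std * math.sqrt(2 / n_per_group)


def power(effect: float, n_total: float, std: float, alpha: float = 0.05) -> float:
    """Power of a two-sided Z-test, given the effect size and total observations."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / diff_se(std, n_total)
    # Probability of falling into either rejection region under the alternative.
    return STD_NORMAL.cdf(shift - z_alpha) + STD_NORMAL.cdf(-shift - z_alpha)


def effect_size(target_power: float, n_total: float, std: float, alpha: float = 0.05) -> float:
    """Minimum detectable effect, given power and total observations."""
    # Ignores the negligible probability of rejecting in the opposite direction.
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(target_power)
    return (z_alpha + z_power) * diff_se(std, n_total)


def sample_size(effect: float, target_power: float, std: float, alpha: float = 0.05) -> int:
    """Total observations (both variants) needed, given power and effect size."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(target_power)
    n_per_group = 2 * (std * (z_alpha + z_power) / effect) ** 2
    return 2 * math.ceil(n_per_group)
```

Under these assumptions, detecting a 0.1 difference in means with std 1.0 at 80% power and alpha 0.05 needs 3,140 observations in total.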

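The roadmap includes:

Multiple hypothesis testing:
- Family-wise error rate: Holm–Bonferroni method.
- False discovery rate: Benjamini–Hochberg procedure.
A/A tests and simulations for analyzing the power of any statistical test.
More statistical tests:
- Asymptotic and exact tests for frequency data.
- Mann–Whitney U test.
Sequential testing: always-valid p-values with mSPRT.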
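The multiple-testing corrections on the roadmap are not in tea-tasting yet. To illustrate what they compute, here is a plain-Python sketch of the standard adjusted p-value formulas for both procedures. The function names are hypothetical and not part of tea-tasting.

```python
def holm_bonferroni(pvalues: list[float]) -> list[float]:
    """Holm step-down adjusted p-values (controls the family-wise error rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k + 1), keep adjusted values monotone.
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini–Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)  # descending
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        # Scale by m / rank, then take the running minimum from the largest p-value down.
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A hypothesis is rejected when its adjusted p-value is below the chosen level. BH is less conservative than Holm because it bounds the expected share of false discoveries rather than the chance of any false discovery.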

You can define a custom metric with a statistical test of your choice.

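Data backends

There are many different databases and engines for storing and processing experimental data. In most cases, it's not efficient to pull the detailed experimental data into a Python environment. Many statistical tests, such as Student's t-test or Z-test, require only aggregated data for analysis.

For example, if the raw experimental data are stored in ClickHouse, it's faster and more efficient to calculate counts, averages, variances, and covariances directly in ClickHouse than to fetch granular data and perform the aggregations in a Python environment.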
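To illustrate the idea with the standard library only, the sketch below uses an in-memory SQLite table as a stand-in for ClickHouse. This is an assumption for demonstration; tea-tasting itself works through Ibis. The database returns only counts, sums, and sums of squares per variant, and the Z statistic is computed from those aggregates. The table, columns, and data are hypothetical.

```python
import math
import sqlite3

# In-memory SQLite stands in for a real backend such as ClickHouse.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (user_id INTEGER, variant TEXT, revenue REAL)")
con.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "A", 5.0), (2, "A", 0.0), (3, "A", 7.5), (4, "A", 2.5),
     (5, "B", 6.0), (6, "B", 1.0), (7, "B", 9.0), (8, "B", 4.0)],
)

# The database returns one small row per variant instead of every user row.
query = """
    SELECT variant, COUNT(*), SUM(revenue), SUM(revenue * revenue)
    FROM users
    GROUP BY variant
    ORDER BY variant
"""
stats = {}
for variant, n, total, total_sq in con.execute(query):
    mean = total / n
    var = (total_sq - n * mean * mean) / (n - 1)  # sample variance from the sums
    stats[variant] = (n, mean, var)

(n_a, mean_a, var_a), (n_b, mean_b, var_b) = stats["A"], stats["B"]
# Unpooled standard error and Z statistic, computed from the aggregates only.
z = (mean_b - mean_a) / math.sqrt(var_a / n_a + var_b / n_b)
print(f"z = {z:.3f}")
```

Deriving the variance from the sum of squares can lose precision when values are large relative to their spread. A production query would use the backend's own variance and covariance functions where they are available.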

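Querying all the required statistics manually can be a daunting and error-prone task. For example, analyzing ratio metrics and reducing variance with CUPED require not only the number of rows and the variances, but also covariances. But don't worry: tea-tasting does all of this for you.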

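tea-tasting accepts data as either a Pandas DataFrame or an Ibis Table. Ibis is a Python package that serves as a DataFrame API for various data backends. It supports 20+ backends, including BigQuery, ClickHouse, PostgreSQL/GreenPlum, Snowflake, and Spark. You can write an SQL query, wrap it as an Ibis Table, and pass it to tea-tasting.

Keep in mind that tea-tasting assumes that: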

Data is grouped by randomization units, such as individual users.
There is a column indicating variant of the A/B test (typically labeled as A, B, etc.).
All necessary columns for metric calculations (like the number of orders, revenue, etc.) are included in the table.
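Here is a sketch of how session-level records might be rolled up into that shape. It uses hypothetical data, and its column names mirror the orders, sessions, and revenue metrics used in this post. The result is one row per user, carrying the variant label and every column a metric needs.

```python
from collections import defaultdict

# Hypothetical session-level log: (user_id, variant, orders_in_session, revenue_in_session).
sessions = [
    (1, "A", 0, 0.0),
    (1, "A", 1, 9.5),
    (2, "A", 0, 0.0),
    (3, "B", 2, 20.0),
    (3, "B", 0, 0.0),
    (4, "B", 0, 0.0),
]

# One row per randomization unit (user) with the variant and all metric columns.
users = defaultdict(lambda: {"variant": None, "sessions": 0, "orders": 0, "revenue": 0.0})
for user_id, variant, orders, revenue in sessions:
    row = users[user_id]
    row["variant"] = variant
    row["sessions"] += 1
    row["orders"] += orders
    row["revenue"] += revenue
```

Users 2 and 4 have no orders but still get a row, which keeps per-user averages unbiased.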

Some statistical methods, like Bootstrap, require granular data for the analysis. In this case, tea-tasting fetches the detailed data as well.

Learn more in the guide on data backends.

Convenient API

You can perform all the tasks listed above using just NumPy, SciPy, and Ibis. In fact, tea-tasting uses these packages under the hood. What tea-tasting offers on top is a convenient higher-level API.

It's easier to show than to describe. Here is the basic example:

```python
import tea_tasting as tt

data = tt.make_users_data(seed=42)

experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions"),
    orders_per_session=tt.RatioOfMeans("orders", "sessions"),
    orders_per_user=tt.Mean("orders"),
    revenue_per_user=tt.Mean("revenue"),
)

result = experiment.analyze(data)
print(result)
#>             metric control treatment rel_effect_size rel_effect_size_ci pvalue
#>  sessions_per_user    2.00      1.98          -0.66%      [-3.7%, 2.5%]  0.674
#> orders_per_session   0.266     0.289            8.8%      [-0.89%, 19%] 0.0762
#>    orders_per_user   0.530     0.573            8.0%       [-2.0%, 19%]  0.118
#>   revenue_per_user    5.24      5.73            9.3%       [-2.4%, 22%]  0.123
```


The two-stage approach, with separate parametrization and inference, is common in statistical modeling. This separation helps in making the code more modular and easier to understand.

tea-tasting performs calculations that can be tricky and error-prone:

Analysis of ratio metrics with the delta method.
Variance reduction with CUPED/CUPAC (also in combination with the delta method for ratio metrics).
Calculation of confidence intervals for both absolute and percentage change.
Analysis of statistical power.

It also provides a framework for representing experimental data to avoid errors. Grouping the data by randomization units and including all units in the dataset is important for correct analysis.

In addition, tea-tasting provides some convenience methods and functions, such as pretty formatting of the result and a context manager for metric parameters.
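tea-tasting's actual configuration API is not shown here. The sketch below only illustrates the general pattern behind a context manager for metric parameters: defaults stored in a `contextvars.ContextVar`, temporarily overridden inside a `with` block, and read when a metric is created. Every name in it (`metric_defaults`, `get_default`, `ToyMean`) is hypothetical.

```python
import contextlib
import contextvars
from typing import Optional

# Hypothetical package-wide defaults; not tea-tasting's real configuration.
_DEFAULTS = contextvars.ContextVar(
    "metric_defaults", default={"alpha": 0.05, "alternative": "two-sided"}
)


def get_default(name: str):
    """Read the current default for a metric parameter."""
    return _DEFAULTS.get()[name]


@contextlib.contextmanager
def metric_defaults(**overrides):
    """Temporarily override metric parameter defaults inside a with-block."""
    token = _DEFAULTS.set({**_DEFAULTS.get(), **overrides})
    try:
        yield
    finally:
        _DEFAULTS.reset(token)  # restore the previous defaults, even on error


class ToyMean:
    """Toy metric that reads defaults when it is created."""

    def __init__(self, column: str, alpha: Optional[float] = None):
        self.column = column
        self.alpha = get_default("alpha") if alpha is None else alpha


with metric_defaults(alpha=0.1):
    inside = ToyMean("orders")  # picks up alpha=0.1
outside = ToyMean("orders")  # back to the global default
```

Because the override is reset in `finally`, an exception inside the block cannot leak the temporary defaults. An explicit argument always wins over the defaults.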

Documentation

Last but not least: documentation. I believe that good documentation is crucial for tool adoption. That's why I wrote several user guides and an API reference.

I recommend starting with the example of basic usage in the user guide. Then you can explore specific topics, such as variance reduction or power analysis, in the same guide.

See the guide on data backends to learn how to use a data backend of your choice with tea-tasting.

See the guide on custom metrics if you want to perform a statistical test that is not included in tea-tasting.

Use the API reference to explore all parameters and detailed information about the functions, classes, and methods available in tea-tasting.

Conclusions

There are a variety of statistical methods that can be applied in the analysis of an experiment. But only a handful of them are actually used in most cases.

On the other hand, there are methods specific to the analysis of A/B tests that are not included in the general purpose statistical packages like SciPy.

tea-tasting's functionality includes the most important statistical tests, as well as methods specific to the analysis of A/B tests.

tea-tasting provides a convenient API that helps reduce the time spent on analysis and minimize the probability of error.

In addition, tea-tasting optimizes computational efficiency by calculating the statistics in the data backend of your choice, where the data are stored.

With the detailed documentation, you can quickly learn how to use tea-tasting for the analysis of your experiments.

P.S. Package name

The package name "tea-tasting" is a play on words that refers to two subjects:

Lady tasting tea is a famous experiment which was devised by Ronald Fisher. In this experiment, Fisher developed the null hypothesis significance testing framework to analyze a lady's claim that she could discern whether the tea or the milk was added first to the cup.
"tea-tasting" phonetically resembles "t-testing" or Student's t-test, a statistical test developed by William Gosset.
