Aggregating with Multiple Functions on the Same Column Using GroupBy
In Python's pandas library, GroupBy.agg() applies one or more aggregation functions to grouped data. Applying several functions to the same column takes some care, because the most obvious dictionary syntax does not do what it appears to.
Initially, it might seem intuitive to use the following syntax:
df.groupby("dummy").agg({"returns": f1, "returns": f2})
However, this approach does not work. A Python dictionary cannot hold the same key twice, so the second "returns" entry silently replaces the first and only f2 is applied. Instead, pandas offers several methods for performing such aggregations:
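The duplicate-key behavior can be checked without pandas at all. The sketch below uses two hypothetical stand-ins for f1 and f2 (the original does not define them) and shows that the dictionary ends up with a single entry.

```python
# Hypothetical stand-ins for f1 and f2; any two callables would do.
def f1(values):
    return min(values)


def f2(values):
    return max(values)


# A dict literal that repeats a key is legal Python, but the later
# value overwrites the earlier one.
spec = {"returns": f1, "returns": f2}

print(len(spec))              # 1
print(spec["returns"] is f2)  # True
```

No error is raised, which is why the problem is easy to miss: agg() simply receives a one-entry dictionary.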
Method 1: List of Functions
Map the column name to a list of functions. The result has one output column per function, under a two-level column index:
df.groupby("dummy").agg({"returns": [np.mean, np.sum]})
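A minimal sketch of the list form on a small made-up DataFrame (the data values are assumptions for illustration). It uses the string aliases "mean" and "sum" in place of np.mean and np.sum so that no NumPy import is needed, and it shows one way to flatten the resulting two-level columns.

```python
import pandas as pd

# Illustrative data; the column names follow the article's example.
df = pd.DataFrame({
    "dummy": ["a", "a", "b", "b"],
    "returns": [1.0, 3.0, 2.0, 4.0],
})

result = df.groupby("dummy").agg({"returns": ["mean", "sum"]})

# The columns form a MultiIndex of (column, function) pairs.
print(result.columns.tolist())  # [('returns', 'mean'), ('returns', 'sum')]

# Optional: join the two levels into flat labels.
result.columns = ["_".join(col) for col in result.columns]
print(result)
```

Flattening is a design choice rather than a requirement; the MultiIndex is convenient when several columns are aggregated at once.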
Method 2: Dictionary of Functions
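Functions can be passed as a nested dictionary: the outer key is the column name, and the inner dictionary maps each output name to a function. Note that this nested-renaming form was deprecated in pandas 0.20 and removed in pandas 1.0, where it raises a SpecificationError, so it only works on older versions: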
df.groupby("dummy").agg({"returns": {"Mean": np.mean, "Sum": np.sum}})
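On current pandas releases the nested-dictionary form is rejected. The sketch below assumes pandas 1.5 or later (where SpecificationError is importable from pandas.errors) and made-up data; it catches the error and then shows one possible replacement that yields the same "Mean" and "Sum" labels by renaming after a list aggregation.

```python
import pandas as pd
from pandas.errors import SpecificationError  # assumes pandas >= 1.5

# Illustrative data; the column names follow the article's example.
df = pd.DataFrame({
    "dummy": ["a", "a", "b", "b"],
    "returns": [1.0, 3.0, 2.0, 4.0],
})

try:
    df.groupby("dummy").agg({"returns": {"Mean": "mean", "Sum": "sum"}})
    nested_supported = True
except SpecificationError as exc:
    nested_supported = False
    print(f"Nested dictionary rejected: {exc}")

# One possible replacement: aggregate with a list, then rename.
result = (
    df.groupby("dummy")["returns"]
    .agg(["mean", "sum"])
    .rename(columns={"mean": "Mean", "sum": "Sum"})
)
print(result)
```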
Method 3: Recent Update (as of 2022-06-20)
In pandas 0.25 and later, named aggregation is the preferred syntax. Each keyword names an output column and takes a (column, function) tuple:
df.groupby("dummy").agg(Mean=("returns", np.mean), Sum=("returns", np.sum))
This syntax names each output column directly, produces flat (single-level) column labels, and lets different source columns and functions be combined in a single call.
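A minimal sketch of named aggregation on made-up data, assuming pandas 0.25 or later. It uses string aliases for the functions and also shows pd.NamedAgg, the more explicit spelling of the same (column, function) tuple.

```python
import pandas as pd

# Illustrative data; the column names follow the article's example.
df = pd.DataFrame({
    "dummy": ["a", "a", "b", "b"],
    "returns": [1.0, 3.0, 2.0, 4.0],
})

# Keyword = output column name, value = (source column, function).
result = df.groupby("dummy").agg(
    Mean=("returns", "mean"),
    Sum=("returns", "sum"),
)
print(result.columns.tolist())  # ['Mean', 'Sum']

# Equivalent form using pd.NamedAgg for readability.
result_named = df.groupby("dummy").agg(
    Mean=pd.NamedAgg(column="returns", aggfunc="mean"),
    Sum=pd.NamedAgg(column="returns", aggfunc="sum"),
)
print(result.equals(result_named))  # True
```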