Multiple Aggregations on the Same Column Using Pandas GroupBy.agg()
When working with Pandas, it's often necessary to perform multiple aggregations on the same column. The intuitive approach is to list the same column several times as keys of the dictionary passed to agg(). This does not work: a Python dict cannot hold duplicate keys, so each repeated key silently overwrites the previous one and only the last aggregation is applied. The question, then, is how to apply several aggregating functions to a single column concisely using GroupBy.agg().
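A minimal sketch of the pitfall, assuming a small hypothetical DataFrame with the 'dummy' and 'returns' columns used throughout. The repeated key collapses in the dict literal itself, before pandas ever sees it:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"dummy": ["a", "a", "b"], "returns": [1.0, 3.0, 5.0]})

# A dict literal with a repeated key is valid Python, but the later entry
# silently replaces the earlier one, so only one aggregation survives.
spec = {"returns": "mean", "returns": "sum"}
print(spec)  # {'returns': 'sum'}

# pandas only receives {'returns': 'sum'}, so the mean is never computed.
result = df.groupby("dummy").agg(spec)
print(result.columns.tolist())  # ['returns']
```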
Solution
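As of 2022-06-20, the recommended practice for multiple aggregations is named aggregation, available since pandas 0.25:
df.groupby('dummy').agg(Mean=('returns', 'mean'), Sum=('returns', 'sum'))
Each keyword argument names an output column, and its value is a (column, aggregation) tuple. In this example, the 'returns' column is aggregated with both the mean and the sum, and the resulting DataFrame, indexed by the 'dummy' groups, contains two columns, 'Mean' and 'Sum'. NumPy functions such as np.mean can also be passed, but recent pandas versions emit a FutureWarning for them and recommend the string names instead.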
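The following self-contained sketch runs the named-aggregation call on a small hypothetical DataFrame (the values are made up for illustration) so the shape of the result is visible:

```python
import pandas as pd

# Hypothetical sample data; "dummy" and "returns" mirror the column names above.
df = pd.DataFrame({
    "dummy": ["x", "x", "y", "y", "y"],
    "returns": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Named aggregation (pandas >= 0.25): each keyword becomes an output column,
# and its value is a (column, aggregation) tuple.
result = df.groupby("dummy").agg(
    Mean=("returns", "mean"),
    Sum=("returns", "sum"),
)
print(result)
```

Each tuple can also be written more explicitly as pd.NamedAgg(column='returns', aggfunc='mean'); both spellings produce the same flat 'Mean' and 'Sum' columns.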
Historical Note
Before named aggregation was introduced, two dictionary-based approaches were commonly used for multiple aggregations:
df.groupby('dummy').agg({'returns': [np.mean, np.sum]})
This approach passes a list of functions for the column. It still works in current pandas, but the result has two-level (MultiIndex) columns, ('returns', 'mean') and ('returns', 'sum'), which often need flattening, and the output column names cannot be chosen directly.
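A sketch of the list form on hypothetical data, including one common way to flatten the two-level header. The joined names such as 'returns_mean' are an illustrative convention, not something pandas produces by itself:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"dummy": ["x", "x", "y"], "returns": [1.0, 3.0, 5.0]})

# A list of aggregations under one column key yields MultiIndex columns:
# ('returns', 'mean') and ('returns', 'sum').
result = df.groupby("dummy").agg({"returns": ["mean", "sum"]})
print(result.columns.tolist())  # [('returns', 'mean'), ('returns', 'sum')]

# Optional: flatten the two-level header by joining each tuple into one string.
result.columns = ["_".join(col) for col in result.columns]
print(result.columns.tolist())  # ['returns_mean', 'returns_sum']
```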
df.groupby('dummy').agg({'returns': {'f1': np.mean, 'f2': np.sum}})
In this nested-dictionary form, the keys of the inner dictionary became the output column names and its values were the aggregating functions. It was deprecated in pandas 0.20 and removed in pandas 1.0, where it raises a SpecificationError ("nested renamer is not supported"). Named aggregation is its replacement.
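If the goal is only to name the outputs for a single column, a close modern counterpart of the removed nested-dictionary form is to select the column first and pass keyword arguments to SeriesGroupBy.agg(). A minimal sketch on hypothetical data, reusing the f1/f2 names from the example above:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"dummy": ["x", "x", "y"], "returns": [1.0, 3.0, 5.0]})

# Selecting the column first gives a SeriesGroupBy; its agg() accepts
# keyword arguments whose names become the output column names.
result = df.groupby("dummy")["returns"].agg(f1="mean", f2="sum")
print(result.columns.tolist())  # ['f1', 'f2']
```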