With Pandas, you can apply a range of aggregation operations that condense a dataset into per-group summaries.
Pandas provides many aggregating functions, including mean(), sum(), count(), min(), and max(). You can use these functions to calculate summary statistics for each group. For example:
    # Calculate the mean of each group defined by the 'A' and 'B' columns
    df1 = df.groupby(['A', 'B']).mean()

    # Print the results
    print(df1)
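For a self-contained illustration, the sketch below builds a small made-up DataFrame (the values are assumptions, not data from this article) and applies several of the functions listed above in one pass with agg().

```python
import pandas as pd

# Hypothetical sample data; the column names mirror the examples in this article
df = pd.DataFrame({
    'A': ['x', 'x', 'y', 'y', 'y'],
    'B': ['p', 'p', 'p', 'q', 'q'],
    'C': [1, 2, 3, 4, 5],
})

# Mean of 'C' for each ('A', 'B') pair
means = df.groupby(['A', 'B'])['C'].mean()

# Several summary statistics of 'C' for each value of 'A'
summary = df.groupby('A')['C'].agg(['mean', 'sum', 'count', 'min', 'max'])

print(means)
print(summary)
```

Passing a list of function names to agg() produces one output column per function, which is often more convenient than calling each function separately.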
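The type of the result depends on how many columns you select for aggregation, not on how many you group by: selecting a single column returns a Series, while selecting a list of columns (or aggregating the whole DataFrame) returns a DataFrame. In both cases the grouping keys become the index by default.
To keep the grouping keys as ordinary columns and get a DataFrame back, pass as_index=False to groupby(), or call reset_index() on the result.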
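The difference between the result types can be checked directly. The following is a minimal sketch on made-up data; the column 'D' is an assumption added only for illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'A': ['x', 'x', 'y'],
    'C': [1, 2, 3],
    'D': [10.0, 20.0, 30.0],
})

# One selected column: the result is a Series indexed by the group key
s = df.groupby('A')['C'].sum()

# A list of selected columns: the result is a DataFrame indexed by the group key
d = df.groupby('A')[['C', 'D']].sum()

# as_index=False keeps 'A' as an ordinary column, so the result is a DataFrame
flat = df.groupby('A', as_index=False)['C'].sum()

print(type(s).__name__, type(d).__name__, type(flat).__name__)
```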
To aggregate string columns, you can collect each group's values into a list or tuple, or concatenate them with join.
For example:
    # Convert 'B' column values to a list for each group
    df1 = df.groupby('A')['B'].agg(list).reset_index()

    # Combine 'B' column values into a separator-delimited string for each group
    df2 = df.groupby('A')['B'].agg(','.join).reset_index()
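To see what these two calls return, here is a small runnable sketch on invented data (the colour names are placeholders).

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'A': ['x', 'x', 'y'],
    'B': ['red', 'blue', 'green'],
})

# One list of 'B' values per group
as_lists = df.groupby('A')['B'].agg(list).reset_index()

# One comma-separated string of 'B' values per group
as_text = df.groupby('A')['B'].agg(', '.join).reset_index()

print(as_lists)
print(as_text)
```

Note that str.join only accepts strings, so a group containing missing or non-string values would raise a TypeError; converting the column with astype(str) or dropping missing values first is one way to handle that.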
To count non-missing values in each group, use GroupBy.count(). To count all values, including missing ones, use GroupBy.size().
For example:
    # Count non-missing values in the 'C' column for each group
    df1 = df.groupby('A')['C'].count().reset_index(name='COUNT')

    # Count all rows in each group, including rows with missing values
    df2 = df.groupby('A').size().reset_index(name='COUNT')
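The two counts only differ when a group contains missing values. A minimal sketch, assuming a made-up column 'C' with one NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in group 'x'
df = pd.DataFrame({
    'A': ['x', 'x', 'y', 'y'],
    'C': [1.0, np.nan, 3.0, 4.0],
})

# count() skips the NaN in group 'x'
non_missing = df.groupby('A')['C'].count().reset_index(name='COUNT')

# size() counts every row, regardless of missing values
all_rows = df.groupby('A').size().reset_index(name='COUNT')

print(non_missing)
print(all_rows)
```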
You can add a new column containing the aggregated values using the transform() method. Unlike a plain aggregation, which returns one row per group, transform() applies the operation to each group and broadcasts the result back to every row, so the output has the same length and index as the original object.
For example:
    # Create a new 'C1' column with the sum of 'C' grouped by 'A'
    df['C1'] = df.groupby('A')['C'].transform('sum')
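Because the transformed column lines up row by row with the original data, it can be combined with other columns directly. The sketch below uses invented data, and the 'share' column is a hypothetical addition showing one common use.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'A': ['x', 'x', 'y'],
    'C': [1, 2, 3],
})

# Group total repeated on every row of the group
df['C1'] = df.groupby('A')['C'].transform('sum')

# Each row's share of its group's total (illustrative column name)
df['share'] = df['C'] / df['C1']

print(df)
```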