Imputing Missing Values with Group Mean in Pandas DataFrames
Real-world datasets often contain missing values, which pandas represents as NaN. One way to handle them is to fill each missing value with the mean of the group that its row belongs to.
Consider the example dataframe:
name | value |
---|---|
A | 1 |
A | NaN |
B | NaN |
B | 2 |
B | 3 |
B | 1 |
C | 3 |
C | NaN |
C | 3 |
Our goal is to replace each NaN with the mean of 'value' for that row's 'name' group. To achieve this, we can group by 'name' and call transform(), which returns a result aligned to the original rows:
filled_values = df.groupby('name')['value'].transform(lambda x: x.fillna(x.mean()))
df["value"] = filled_values
After execution, the dataframe is updated:
name | value |
---|---|
A | 1 |
A | 1 |
B | 2 |
B | 2 |
B | 3 |
B | 1 |
C | 3 |
C | 3 |
C | 3 |
Each NaN value has been replaced with the mean of its group (1 for A, 2 for B, 3 for C). The means are computed from the non-missing values only, because mean() skips NaN by default.
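For readers who want to reproduce the result, here is a minimal, self-contained sketch. It rebuilds the example dataframe by hand and runs the lambda-based fill, then shows an alternative that passes the string 'mean' to transform() and fills in a single fillna() call. The variable names (filled_lambda, group_means, filled_builtin) are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

# Rebuild the example dataframe from the table above.
df = pd.DataFrame({
    "name": ["A", "A", "B", "B", "B", "B", "C", "C", "C"],
    "value": [1, np.nan, np.nan, 2, 3, 1, 3, np.nan, 3],
})

# Approach from the article: within each group, fill NaN with that group's mean.
filled_lambda = df.groupby("name")["value"].transform(lambda x: x.fillna(x.mean()))

# Alternative: compute one group mean per row, then fill the whole column at once.
group_means = df.groupby("name")["value"].transform("mean")
filled_builtin = df["value"].fillna(group_means)

# Both approaches should give the same column; write one of them back.
df["value"] = filled_builtin
print(df)
```

Both variants keep the original row order and index. If every value in a group is missing, the group mean is itself NaN, so those rows stay unfilled and need a separate fallback.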