NaN Imputation with Group Mean in Pandas
Filling missing values using the mean within each group is a common task when working with tabular data. Consider the following DataFrame with missing values:
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [1, np.nan, np.nan, 2, 3, 1, 3, np.nan, 3], 'name': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C']})
Our goal is to impute the missing values with the mean of each group based on the 'name' column.
To achieve this, we can combine the groupby() and transform() methods in pandas:
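# Per-group means, shown for reference only; the imputation below does not use this variable
grouped = df.groupby('name')['value'].mean()
# Replace each NaN with the mean of its own group
df["value"] = df.groupby("name")["value"].transform(lambda x: x.fillna(x.mean()))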
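groupby() splits the rows by the 'name' column. transform() then calls the lambda once per group, where x holds that group's 'value' entries: x.mean() computes the group mean (NaN values are skipped by default), and x.fillna() substitutes that mean for the missing entries. The result has the same index as the original DataFrame, so it can be assigned straight back to df["value"]. The grouped variable holds the per-group means and is not needed for the imputation itself.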
The resulting DataFrame:
print(df)
   value name
0    1.0    A
1    1.0    A
2    2.0    B
3    2.0    B
4    3.0    B
5    1.0    B
6    3.0    C
7    3.0    C
8    3.0    C
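To make the role of transform() more concrete, the following minimal sketch contrasts it with a plain aggregation on the same example data. The variable names group_means, row_means, and imputed are illustrative and not part of the original solution.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "value": [1, np.nan, np.nan, 2, 3, 1, 3, np.nan, 3],
    "name": ["A", "A", "B", "B", "B", "B", "C", "C", "C"],
})

# Aggregation: one entry per group (mean() skips NaN by default).
group_means = df.groupby("name")["value"].mean()

# Transformation: one entry per original row, aligned to df.index.
row_means = df.groupby("name")["value"].transform("mean")

# Fill only the missing entries; existing values are left untouched.
imputed = df["value"].fillna(row_means)

print(group_means)
print(imputed)
```

Because row_means shares the index of df, fillna() can match each missing entry with the mean of its own group. Note that a group containing only NaN values has a NaN mean, so its entries would stay missing.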
Alternative Solution:
Another approach to group-based missing value imputation is:
impute_cols = ['value']
df[impute_cols] = df[impute_cols].fillna(df.groupby('name')[impute_cols].transform('mean'))
Both methods produce the same result, but the latter extends more easily to several columns: impute_cols can list any number of columns, and each one is filled with its own group means.
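As a rough illustration of the multi-column case, the sketch below applies the second approach to a small hypothetical DataFrame. The columns 'height' and 'weight' and their values are invented for this example and do not come from the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two numeric columns with gaps, grouped by 'name'.
df2 = pd.DataFrame({
    "name": ["A", "A", "B", "B", "B"],
    "height": [1.0, np.nan, 4.0, np.nan, 6.0],
    "weight": [np.nan, 10.0, 20.0, 40.0, np.nan],
})

impute_cols = ["height", "weight"]

# Per-row group means for every listed column, aligned to df2's index.
group_means = df2.groupby("name")[impute_cols].transform("mean")

# fillna() with a DataFrame aligns on both index and column labels,
# so each column is filled from its own group means.
df2[impute_cols] = df2[impute_cols].fillna(group_means)

print(df2)
```

Adding another column to the imputation only requires appending its name to impute_cols.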