Replacing NaN Values in a pandas DataFrame with Column Averages
Filling NaN values in a pandas DataFrame with the average of the corresponding column is a common task in data analysis. With a plain numpy array you would compute the column means and assign them yourself; a pandas DataFrame can do the same thing with two built-in methods.
Approach:
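To replace NaN values in a DataFrame with column averages, pass the column means to the DataFrame.fillna method. DataFrame.mean skips NaN values by default, and fillna returns a new DataFrame rather than modifying df in place: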
<code class="python">df.fillna(df.mean())</code>
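To make the mechanics concrete, here is a minimal sketch on a tiny frame. The name `demo` and its values are made up for illustration and are not part of the example in this article. It shows that `mean()` returns a Series indexed by column name, and that `fillna` uses that index to pick the fill value for each column.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame, used only for illustration.
demo = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, 4.0, 8.0]})

# mean() skips NaN by default (skipna=True) and returns a Series
# whose index holds the column names.
means = demo.mean()

# fillna matches the Series index against the column labels, so each
# column is filled with its own mean. The result is a new DataFrame.
filled = demo.fillna(means)

print(means)
print(filled)
```

Because `fillna` returns a new object here, `demo` still contains its original NaN values afterwards; assign the result back if you want to keep it.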
Example:
Consider a DataFrame with NaN values:
<code class="python">import numpy as np
import pandas as pd

# Sample data with NaN values in every column
df = pd.DataFrame({
    'A': [-0.166919, -0.297953, -0.120211, np.nan, np.nan,
          -0.788073, -0.916080, -0.887858, 1.948430, 0.019698],
    'B': [0.979728, -0.912674, -0.540679, -2.027325, np.nan,
          np.nan, -0.612343, 1.033826, 1.025011, -0.795876],
    'C': [-0.632955, -1.365463, -0.680481, 1.533582, 0.461821,
          np.nan, np.nan, np.nan, -2.982224, -0.046431]
})</code>
Calculating the mean of each column:
<code class="python">column_averages = df.mean()</code>
And finally, replacing the NaN values:
<code class="python">df_filled = df.fillna(column_averages)</code>
Result:
<code class="python">print(df_filled)

          A         B         C
0 -0.166919  0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 -0.151121 -2.027325  1.533582
4 -0.151121 -0.231291  0.461821
5 -0.788073 -0.231291 -0.530307
6 -0.916080 -0.612343 -0.530307
7 -0.887858  1.033826 -0.530307
8  1.948430  1.025011 -2.982224
9  0.019698 -0.795876 -0.046431</code>
As the output shows, each NaN has been replaced with the mean of its own column: -0.151121 in A, -0.231291 in B, and -0.530307 in C.
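The example above contains only numeric columns. If a DataFrame also holds text columns, one possible variation is to restrict the mean to numeric columns with `numeric_only=True`. The sketch below assumes a hypothetical frame named `mixed`, which does not come from this article; non-numeric columns are simply left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing a text column and a numeric column.
mixed = pd.DataFrame({
    "label": ["a", "b", None],
    "value": [10.0, np.nan, 20.0],
})

# Compute means for numeric columns only; "label" is excluded.
numeric_means = mixed.mean(numeric_only=True)

# Only columns present in numeric_means are filled.
mixed_filled = mixed.fillna(numeric_means)

print(mixed_filled)
```

The missing entry in `label` stays missing, since there is no mean to fill it with.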