How to Sum DataFrame Rows for Specific Columns in Pandas
While working with DataFrames, there may be instances where we need to add a new column that represents the sum of values from multiple existing columns. In this question, the user encounters an issue while attempting to create a new column 'e' that sums the values from columns 'a', 'b', and 'd' in a DataFrame.
The user's initial approach, df[['a', 'b', 'd']].map(sum), was unsuccessful. To add values across the columns of each row, we can call sum() with the axis parameter set to 1, which produces one total per row instead of one total per column. The following line sums every numeric column in the DataFrame:
df['e'] = df.sum(axis=1, numeric_only=True)
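In Pandas versions prior to 2.0, we could specify axis=1 without numeric_only=True, because non-numeric columns that could not be summed were silently dropped. From version 2.0 onward, numeric_only defaults to False, so we need to pass numeric_only=True explicitly to have non-numeric columns ignored.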
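As a minimal sketch of the behavior described above, the snippet below builds a small DataFrame whose column 'c' holds strings and sums the remaining numeric columns row by row. The sample values and the name row_totals are assumptions made up for illustration, not data from the original question.

```python
import pandas as pd

# Hypothetical sample data; column 'c' deliberately holds strings.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [2, 3, 4],
    "c": ["dd", "ee", "ff"],
    "d": [5, 9, 1],
})

# axis=1 adds across the columns of each row;
# numeric_only=True skips the string column 'c'.
row_totals = df.sum(axis=1, numeric_only=True)
print(row_totals.tolist())  # [8, 14, 8]
```

Passing numeric_only=True explicitly keeps the result the same regardless of which Pandas version is installed.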
If the goal is to sum specific columns, we can create a list of the desired columns and use sum() with axis=1 to calculate the row sums for that subset of columns.
col_list = ['a', 'b', 'd']
df['e'] = df[col_list].sum(axis=1)
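To make the column-subset approach concrete, here is a small self-contained sketch. The sample values are invented for illustration, and column 'c' is numeric on purpose, so we can see that it is left out of the total because it is not in the list.

```python
import pandas as pd

# Hypothetical sample data; 'c' is numeric but should NOT be part of the sum.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [2, 3, 4],
    "c": [10, 20, 30],
    "d": [5, 9, 1],
})

# Select only the columns we want, then sum across each row.
col_list = ["a", "b", "d"]
df["e"] = df[col_list].sum(axis=1)
print(df["e"].tolist())  # [8, 14, 8]
```

Listing the columns explicitly also means that re-running the assignment does not accidentally fold the new 'e' column into its own total.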
By following these steps, we can successfully add a new column 'e' that contains the row sums for any combination of numeric columns in a DataFrame.