Creating a New Column Based on Values from Multiple Columns in Pandas
Problem:
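The goal is to apply a custom function that assigns a race category to each row of a DataFrame, based on the values in several ethnicity indicator columns. The categories are checked in the following priority order, and the first match wins:
1. Hispanic, if eri_hispanic is 1
2. Two Or More, if the other five flags (eri_afr_amer, eri_asian, eri_hawaiian, eri_nat_amer, eri_white) sum to more than 1
3. A/I AK Native, if eri_nat_amer is 1
4. Asian, if eri_asian is 1
5. Black/AA, if eri_afr_amer is 1
6. Haw/Pac Isl., if eri_hawaiian is 1
7. White, if eri_white is 1
8. Other, if none of the above apply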
Custom Function:
To determine the race category for each row, we create a custom function:
def label_race(row):
    if row['eri_hispanic'] == 1:
        return 'Hispanic'
    if row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian'] + row['eri_nat_amer'] + row['eri_white'] > 1:
        return 'Two Or More'
    if row['eri_nat_amer'] == 1:
        return 'A/I AK Native'
    if row['eri_asian'] == 1:
        return 'Asian'
    if row['eri_afr_amer'] == 1:
        return 'Black/AA'
    if row['eri_hawaiian'] == 1:
        return 'Haw/Pac Isl.'
    if row['eri_white'] == 1:
        return 'White'
    return 'Other'
Applying the Function to the Dataframe:
We use the apply function in Pandas to apply the custom function to each row of the dataframe:
df['race_label'] = df.apply(label_race, axis=1)
The axis=1 argument specifies that the function should be applied row-wise.
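To see the pieces working together, here is a minimal end-to-end sketch. The five-row sample DataFrame is made up for illustration and is not from the original data. It assumes each ethnicity column holds 0 or 1.

```python
import pandas as pd


def label_race(row):
    # Checks run in priority order; the first match wins.
    if row['eri_hispanic'] == 1:
        return 'Hispanic'
    if (row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian']
            + row['eri_nat_amer'] + row['eri_white']) > 1:
        return 'Two Or More'
    if row['eri_nat_amer'] == 1:
        return 'A/I AK Native'
    if row['eri_asian'] == 1:
        return 'Asian'
    if row['eri_afr_amer'] == 1:
        return 'Black/AA'
    if row['eri_hawaiian'] == 1:
        return 'Haw/Pac Isl.'
    if row['eri_white'] == 1:
        return 'White'
    return 'Other'


# Hypothetical sample data (assumed 0/1 flags), one row per person.
df = pd.DataFrame({
    'eri_hispanic': [1, 0, 0, 0, 0],
    'eri_afr_amer': [0, 1, 0, 1, 0],
    'eri_asian':    [0, 0, 1, 1, 0],
    'eri_hawaiian': [0, 0, 0, 0, 0],
    'eri_nat_amer': [0, 0, 0, 0, 0],
    'eri_white':    [1, 0, 0, 0, 0],
})

# axis=1 passes each row to label_race as a Series indexed by column name.
df['race_label'] = df.apply(label_race, axis=1)
print(df['race_label'].tolist())
```

In this sample, the first row is labeled Hispanic even though eri_white is also 1, because the Hispanic check runs first. The fourth row has two flags set and becomes Two Or More, and the all-zero last row falls through to Other.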
Result:
The new column race_label will contain the calculated race category for each row in the dataframe.
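On large DataFrames, a row-wise apply can be slow because it calls the Python function once per row. As an alternative that is not part of the original approach, the same priority logic can be sketched with numpy.select, which evaluates the conditions column-wise and takes the first one that is true for each row. The sample data and the race_cols helper list below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (assumed 0/1 flags).
df = pd.DataFrame({
    'eri_hispanic': [1, 0, 0, 0, 0],
    'eri_afr_amer': [0, 1, 0, 1, 0],
    'eri_asian':    [0, 0, 1, 1, 0],
    'eri_hawaiian': [0, 0, 0, 0, 0],
    'eri_nat_amer': [0, 0, 0, 0, 0],
    'eri_white':    [1, 0, 0, 0, 0],
})

# Illustrative helper: the five non-Hispanic flags counted for "Two Or More".
race_cols = ['eri_afr_amer', 'eri_asian', 'eri_hawaiian',
             'eri_nat_amer', 'eri_white']

# Conditions are listed in priority order; np.select uses the first True one.
conditions = [
    df['eri_hispanic'] == 1,
    df[race_cols].sum(axis=1) > 1,
    df['eri_nat_amer'] == 1,
    df['eri_asian'] == 1,
    df['eri_afr_amer'] == 1,
    df['eri_hawaiian'] == 1,
    df['eri_white'] == 1,
]
choices = ['Hispanic', 'Two Or More', 'A/I AK Native', 'Asian',
           'Black/AA', 'Haw/Pac Isl.', 'White']

# Rows matching no condition get the default label.
df['race_label'] = np.select(conditions, choices, default='Other')
print(df['race_label'].tolist())
```

One behavioral difference to keep in mind: DataFrame.sum skips missing values by default, whereas adding the columns with + inside a row function propagates NaN. If the flag columns can contain missing values, the two versions may label such rows differently.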