Percentage of Total with Groupby in Pandas
Calculating each office's share of its state's sales in Pandas takes two aggregations. Grouping by 'state' and 'office_id' gives the sum of sales for each office, but it does not tell you what percentage of the state total that office contributes.
To get there, first group by 'state' and 'office_id' and sum the sales. The result is a DataFrame indexed by state and office_id, whose sales column holds the total for each office-state combination:
state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})
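To make the shape of this result concrete, here is a minimal sketch on a small made-up DataFrame. The sample values are assumptions for illustration only; the column names follow the ones used in this article.

```python
import pandas as pd

# Hypothetical sample data: two states, two offices each, several sales rows.
df = pd.DataFrame({
    "state": ["CA", "CA", "CA", "WA", "WA", "WA"],
    "office_id": [1, 1, 2, 3, 4, 4],
    "sales": [100, 200, 100, 50, 100, 50],
})

# Sum sales per (state, office_id). The result is a DataFrame whose
# index is a two-level MultiIndex: level 0 = state, level 1 = office_id.
state_office = df.groupby(["state", "office_id"]).agg({"sales": "sum"})
print(state_office)
```

Each row of the result is one office, and the two rows recorded for office 1 in CA have been collapsed into a single total.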
To calculate the percentage, divide each office's sales by the total sales for its state. The per-office DataFrame does not contain the state totals, so compute them with a second aggregation grouped by 'state' alone:
state_total = df.groupby('state').agg({'sales': 'sum'})
Using these state totals, you can build a new DataFrame from 'state_office' in which the sales column holds each office's percentage of its state's sales:
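# group_keys=False keeps the original (state, office_id) index; .loc[x.name, 'sales'] returns the state's total as a scalar
state_pcts = state_office.groupby(level=0, group_keys=False).apply(lambda x: 100 * x / state_total.loc[x.name, 'sales'])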
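The three steps can be sketched end to end as follows. The sample data is made up for illustration. Passing group_keys=False and selecting the state total as a scalar with .loc[x.name, "sales"] are choices made in this sketch so the result keeps the original two-level index on recent pandas versions.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "state": ["CA", "CA", "CA", "WA", "WA", "WA"],
    "office_id": [1, 1, 2, 3, 4, 4],
    "sales": [100, 200, 100, 50, 100, 50],
})

# Step 1: total sales per (state, office_id).
state_office = df.groupby(["state", "office_id"]).agg({"sales": "sum"})

# Step 2: total sales per state.
state_total = df.groupby("state").agg({"sales": "sum"})

# Step 3: for each state group, divide by that state's total.
# x.name is the group key (the state), so .loc[x.name, "sales"] is a scalar.
state_pcts = state_office.groupby(level=0, group_keys=False).apply(
    lambda x: 100 * x / state_total.loc[x.name, "sales"]
)
print(state_pcts)
```

With this sample data, office 1 accounts for 75% of CA's sales and office 2 for the remaining 25%.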
Note that 'level=0' tells 'groupby' to group on the first (outermost) level of the multi-level index produced by the original groupby on 'state' and 'office_id'. Here that level is 'state'.
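As a small illustration of the index levels, the sketch below groups the per-office totals once by position and once by level name and compares the results. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "state": ["CA", "CA", "CA", "WA", "WA", "WA"],
    "office_id": [1, 1, 2, 3, 4, 4],
    "sales": [100, 200, 100, 50, 100, 50],
})
state_office = df.groupby(["state", "office_id"]).agg({"sales": "sum"})

# The index levels are named after the original grouping columns.
print(state_office.index.names)

# Grouping by position (level=0) or by name (level="state")
# selects the same index level.
by_position = state_office.groupby(level=0).sum()
by_name = state_office.groupby(level="state").sum()
same = by_position.equals(by_name)
print(same)
```

Using the level name can be easier to read than the position, at the cost of depending on the index names staying the same.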
Because each office's sales are divided by the total for its own state, the percentages within each state add up to 100.
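One way to sanity-check the result is to sum the percentages per state. The sketch below does that on made-up data and, as an alternative to the two-step approach described above, computes the state totals with transform("sum"), which returns a result aligned to the original index. This variant is not taken from the steps above; it is shown only as an assumption-labelled alternative.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "state": ["CA", "CA", "CA", "WA", "WA", "WA"],
    "office_id": [1, 1, 2, 3, 4, 4],
    "sales": [100, 200, 100, 50, 100, 50],
})
state_office = df.groupby(["state", "office_id"]).agg({"sales": "sum"})

# transform("sum") repeats each state's total on every office row,
# so the division is a plain element-wise operation.
per_row_state_total = state_office.groupby(level="state").transform("sum")
state_pcts = 100 * state_office / per_row_state_total

# Sanity check: percentages within each state should sum to 100.
check = state_pcts.groupby(level="state")["sales"].sum()
all_sum_to_100 = bool(((check - 100).abs() < 1e-9).all())
print(all_sum_to_100)
```

This version avoids building a separate state_total DataFrame and the lambda, which some readers may find easier to follow.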