To clean a DataFrame named source that has several string columns, group it by the first two columns, Country and City, and select the most common value of the third column, Short name, for each combination.
The original attempt fails with a KeyError, and grouping by the City column alone raises an AssertionError, so a more robust approach is needed.
Since pandas 0.16, pd.Series.mode is available and can be passed directly to agg:
source.groupby(['Country', 'City'])['Short name'].agg(pd.Series.mode)
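A minimal sketch of that call on a small, hypothetical source DataFrame (only the column names come from the text; the rows are made up for illustration). Every group here has a single most common value, so each result cell is a plain scalar.

```python
import pandas as pd

# Hypothetical sample data: only the column names come from the text.
source = pd.DataFrame({
    "Country": ["USA", "USA", "Russia", "USA"],
    "City": ["New-York", "New-York", "Sankt-Petersburg", "New-York"],
    "Short name": ["NY", "New", "Spb", "NY"],
})

# Group by the two key columns and take the mode of the third column.
# No group has a tie in this data, so each cell holds a single value.
result = source.groupby(["Country", "City"])["Short name"].agg(pd.Series.mode)
print(result)
```

The result is a Series indexed by (Country, City) pairs, so a single group can be looked up with result.loc[("USA", "New-York")].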
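If a group has more than one mode, Series.mode returns all of them, so that group's cell holds an array of the tied values instead of a single scalar. To get exactly one value per group, take the first mode with a lambda (modes are returned in sorted order, so this picks the smallest tied value):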
source.groupby(['Country', 'City'])['Short name'].agg(lambda x: pd.Series.mode(x)[0])
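The tie case can be sketched with hypothetical data in which "NY" and "New" each appear twice for (USA, New-York). The lambda from the line above yields one value per group; collecting all tied modes into plain lists is an illustrative addition, done here by iterating over the groups rather than through agg.

```python
import pandas as pd

# Hypothetical data with a tie: "NY" and "New" each appear twice for (USA, New-York).
source = pd.DataFrame({
    "Country": ["USA", "USA", "Russia", "USA", "USA"],
    "City": ["New-York", "New-York", "Sankt-Petersburg", "New-York", "New-York"],
    "Short name": ["NY", "New", "Spb", "NY", "New"],
})

grouped = source.groupby(["Country", "City"])["Short name"]

# One value per group: Series.mode returns modes in sorted order,
# so [0] selects the smallest tied value ("NY" sorts before "New").
single = grouped.agg(lambda x: pd.Series.mode(x)[0])

# Every tied mode per group, as a Python list keyed by the (Country, City) tuple.
all_modes = {key: group.mode().tolist() for key, group in grouped}

print(single)
print(all_modes)
```

Which tied value counts as "the" answer is a judgment call; taking the first sorted mode is deterministic, but it is not a statistical tie-breaking rule.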
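scipy.stats.mode can also be used, but when several values tie it returns only one of them (the smallest), and recent SciPy releases no longer accept non-numeric input such as these string columns.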