Retrieve the First Row of Each Group in a Pandas DataFrame
Question:
How can you efficiently extract the first row of each group from a Pandas DataFrame, where the grouping is defined by multiple columns?
Answer:
To retrieve the first row of each group in a Pandas DataFrame, where the groups are defined by one or more key columns:
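Group the Data: Group the DataFrame by its key column(s) with the groupby() method. Pass a single column name, or a list of names when the groups are defined by several columns (e.g. df.groupby(['col1', 'col2'])). In the example below the key is 'id':
df_grouped = df.groupby('id')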
Apply an Aggregation Function: Call first() on the grouped object. For each group it returns the first non-null value of every remaining column:
df_first_rows = df_grouped.first()
Reset the Index (Optional): groupby() moves the key column(s) into the index. If you need 'id' back as a regular column, call reset_index():
df_first_rows = df_first_rows.reset_index()
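The three steps above can be combined into one short script for groups keyed on two columns. This is a minimal sketch; the sales DataFrame and its 'store', 'product' and 'qty' columns are hypothetical and only illustrate passing a list of keys to groupby():

```python
import pandas as pd

# Hypothetical data: each group is identified by the pair ('store', 'product').
sales = pd.DataFrame({
    "store":   ["A", "A", "A", "B", "B", "B"],
    "product": ["x", "x", "y", "x", "y", "y"],
    "qty":     [5, 3, 7, 2, 4, 1],
})

# Step 1: group by the list of key columns.
grouped = sales.groupby(["store", "product"])

# Step 2: take the first non-null value of each remaining column per group.
first_rows = grouped.first()

# Step 3 (optional): move 'store' and 'product' out of the MultiIndex into columns.
first_rows = first_rows.reset_index()

print(first_rows)
```

With multiple keys, groupby() builds a MultiIndex, so reset_index() restores all key columns at once.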
Example:
Consider the following DataFrame:
import pandas as pd
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7], 'value': ["first", "second", "second", "first", "second", "first", "third", "fourth", "fifth", "second", "fifth", "first", "first", "second", "third", "fourth", "fifth"]})
Applying the above steps:
df_grouped = df.groupby('id')
df_first_rows = df_grouped.first()
df_first_rows = df_first_rows.reset_index()
print(df_first_rows)
Output:
   id   value
0   1   first
1   2   first
2   3   first
3   4  second
4   5   first
5   6   first
6   7  fourth
This returns one row per 'id', holding the first 'value' recorded for that id.
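Because first() picks the first non-null value column by column, a row it returns does not always exist in the original data. The sketch below uses a small hypothetical DataFrame (df_missing, with one missing value) to contrast it with groupby().head(1), which returns each group's first input row unchanged:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row of group 1 has a missing 'value'.
df_missing = pd.DataFrame({
    "id":    [1, 1, 2],
    "value": [np.nan, "second", "first"],
    "score": [10, 20, 30],
})

# first() skips nulls per column, so the result for id 1 combines
# 'value' from input row 1 with 'score' from input row 0.
first_values = df_missing.groupby("id").first()

# head(1) keeps the first input row of each group as-is,
# including its original index and the 'id' column.
first_rows = df_missing.groupby("id").head(1)

print(first_values)
print(first_rows)
```

In short, first() suits "earliest available value per column", while head(1) suits "the literal first row of each group".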