How to retrieve the first row of each group in a Pandas DataFrame based on multiple columns?

DDD
Release: 2024-11-17 09:59:03

Retrieve the First Row of Each Group in a Pandas DataFrame

Question:

How can you efficiently extract the first row of each group from a Pandas DataFrame, where the grouping is defined by multiple columns?

Answer:

To retrieve the first row of each group in a Pandas DataFrame based on multiple columns:

  1. Group the Data: Group the DataFrame by the desired columns using the groupby() method:

    df_grouped = df.groupby(['id', 'value'])
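  2. Apply an Aggregation Function: Call first() on the grouped object. For each group it returns the first non-null value in every remaining column, so when the data contains missing values the result can combine values from different rows. To keep the literal first row of each group instead, use head(1):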

    df_first_rows = df_grouped.first()
  3. Reset the Index (Optional): first() moves the grouping columns into the index of the result. To turn 'id' and 'value' back into regular columns, use the reset_index() method:

    df_first_rows = df_first_rows.reset_index()
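
The three steps can be sketched on a small frame that has data columns besides the grouping keys. This is a minimal illustration, not the article's own example: the 'score' and 'note' columns and the variable names first_non_null and first_rows are assumptions made up for the sketch. It also shows how first() and head(1) differ when a group starts with a missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'score' and 'note' are illustrative columns.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2],
    "value": ["a", "a", "b", "a", "a"],
    "score": [np.nan, 10.0, 20.0, 30.0, 40.0],
    "note":  ["x", "y", "z", "w", "v"],
})

# Steps 1-3: group on both key columns, aggregate with first(),
# then restore the keys as ordinary columns.
first_non_null = df.groupby(["id", "value"]).first().reset_index()

# head(1) keeps the literal first row of each group, NaN included.
first_rows = df.groupby(["id", "value"]).head(1).reset_index(drop=True)

print(first_non_null)
print(first_rows)
```

In the (1, "a") group, first() skips the missing score in the first row and takes it from the second row, while still taking 'note' from the first row. head(1) returns the first row unchanged.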

Example:

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
                   'value': ["first", "second", "second", "first",
                             "second", "first", "third", "fourth",
                             "fifth", "second", "fifth", "first",
                             "first", "second", "third", "fourth", "fifth"]})

Applying the above steps:

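# The sample frame has only the 'id' and 'value' columns, so grouping on
# both would leave no data columns. Group on 'id' to get the first 'value' per id.
df_grouped = df.groupby('id')
df_first_rows = df_grouped.first()
df_first_rows = df_first_rows.reset_index()

print(df_first_rows)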

Output:

   id   value
0   1   first
1   2   first
2   3   first
3   4  second
4   5   first
5   6   first
6   7  fourth

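The output contains one row per id, holding the first value recorded for that id. Because the sample DataFrame has no columns besides 'id' and 'value', this result corresponds to grouping on 'id' alone. Grouping on both columns would instead return one row for every distinct (id, value) pair.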
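
One way to cross-check the output above is to compare groupby(...).head(1) with drop_duplicates(keep='first') on the same sample data. The following is a sketch under that assumption. The variable names by_id, dedup and pairs are illustrative, and drop_duplicates is an alternative technique, not the method the article describes.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
                   'value': ["first", "second", "second", "first",
                             "second", "first", "third", "fourth",
                             "fifth", "second", "fifth", "first",
                             "first", "second", "third", "fourth", "fifth"]})

# First row per id, keeping the original row order.
by_id = df.groupby('id').head(1).reset_index(drop=True)

# Alternative: keep only the first occurrence of each id.
dedup = df.drop_duplicates(subset='id', keep='first').reset_index(drop=True)

# Treating both columns as the key keeps one row per distinct (id, value) pair.
pairs = df.drop_duplicates(subset=['id', 'value'], keep='first')

print(by_id)
print(by_id.equals(dedup))
print(len(pairs), "distinct (id, value) pairs out of", len(df), "rows")
```

Both approaches return rows in their original order, whereas first() sorts the result by the group keys unless sort=False is passed to groupby().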

source:php.cn