How to Compare DataFrames for Differences in Rows?-Python Tutorial-php.cn

How to Compare DataFrames for Differences in Rows?

Mary-Kate Olsen

Released: 2024-10-19 21:13:29




Comparing DataFrames for Differences in Rows

When two dataframes are identically labeled (same index and same columns), an element-wise comparison such as df1 != df2 is sufficient. If the dataframes contain different sets of rows, however, that comparison raises a ValueError, so a different approach is needed to identify the differences.
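A minimal sketch of the identically labeled case, using small hypothetical frames a, b, and c (they are not from the original article). It shows the element-wise mask, how to pull out the changed rows with any(axis=1), and the ValueError pandas raises when the row labels do not line up:

```python
import pandas as pd

# Hypothetical frames with the same index (0, 1) and the same columns
a = pd.DataFrame({"Fruit": ["Apple", "Banana"], "Num": [1.0, 2.0]})
b = pd.DataFrame({"Fruit": ["Apple", "Cherry"], "Num": [1.0, 2.0]})

# Element-wise comparison: True wherever two cells differ
diff_mask = a != b
print(diff_mask)

# Rows of a that differ from b in at least one column
changed_rows = a[diff_mask.any(axis=1)]
print(changed_rows)

# A frame with an extra row is labeled differently, so != refuses to compare
c = pd.concat([b, pd.DataFrame({"Fruit": ["Date"], "Num": [3.0]})], ignore_index=True)
raised = False
try:
    a != c
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```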

Concat, Group, and Filter

One way to find rows that differ is to concatenate the two dataframes, group the result by all of its columns, and keep only the groups that contain a single row. The following code illustrates this:

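```python
import pandas as pd

# Stack both dataframes and give every row a unique 0..n-1 label
df = pd.concat([df1, df2])
df = df.reset_index(drop=True)
# Group rows by their full contents; each group holds the labels of identical rows
df_gpby = df.groupby(list(df.columns))
# Keep groups with exactly one member: rows with no exact match in the other frame
idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]
result = df.reindex(idx)
```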


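The two dataframes are stacked into one (df), and reset_index(drop=True) gives every row a unique label, since the original indexes usually overlap. Grouping by all columns (df_gpby) puts identical rows into the same group. The groups attribute is a dict that maps each distinct row value to an Index of the labels of the rows holding that value, so groups.values() yields one Index per distinct row, not tuples of unique rows. A group of length 1 (len(x) == 1) means the row occurs exactly once across both dataframes, i.e. it has no exact match in the other one. Reindexing df with those labels (idx) produces a dataframe containing only the differing rows. Note that a row repeated within a single dataframe also forms a group larger than 1, so it is not reported even if the other dataframe lacks it.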
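To make the steps above easy to try, here is a self-contained sketch that wraps them in a helper. The function name row_differences and the toy data are hypothetical, chosen only for illustration: df2 has one extra row and one changed value compared with df1.

```python
import pandas as pd


def row_differences(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rows occurring exactly once across df1 and df2."""
    # Stack both frames and relabel rows 0..n-1 so every label is unique
    df = pd.concat([df1, df2]).reset_index(drop=True)
    # Map each distinct row value to the labels of the rows that hold it
    groups = df.groupby(list(df.columns)).groups
    # A single-label group is a row with no exact match in the other frame
    idx = [labels[0] for labels in groups.values() if len(labels) == 1]
    return df.reindex(idx)


# Hypothetical toy data: df2 adds ("Date", 4) and changes Cherry's Num to 30
df1 = pd.DataFrame({"Fruit": ["Apple", "Banana", "Cherry"], "Num": [1, 2, 3]})
df2 = pd.DataFrame({"Fruit": ["Apple", "Banana", "Cherry", "Date"], "Num": [1, 2, 30, 4]})

result = row_differences(df1, df2)
print(result)
```

Cherry appears twice in the result, once with 3 (from df1) and once with 30 (from df2), because a changed value leaves the whole row unmatched on both sides. By default groupby also drops groups whose key contains NaN, so rows with missing values are not reported by this approach.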

Example Output

For a pair of example dataframes (the input frames are not reproduced here), the result is:

```
>>> result
         Date   Fruit   Num   Color
9  2013-11-25  Orange   8.6  Orange
8  2013-11-25   Apple  22.1     Red
```


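In this example, both rows exist in df2 but not in df1. In general, however, the result is the symmetric difference: it contains rows from either dataframe that have no exact match in the other, and it does not indicate which dataframe each row came from.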
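When the direction matters, one alternative (a different technique from the concat-and-group method, swapped in here) is an outer merge with indicator=True, which adds a _merge column whose values are left_only, right_only, or both. A minimal sketch on hypothetical toy data, assuming both frames share the same column names so that merge joins on all of them:

```python
import pandas as pd

# Hypothetical toy data: df2 adds ("Date", 4) and changes Cherry's Num to 30
df1 = pd.DataFrame({"Fruit": ["Apple", "Banana", "Cherry"], "Num": [1, 2, 3]})
df2 = pd.DataFrame({"Fruit": ["Apple", "Banana", "Cherry", "Date"], "Num": [1, 2, 30, 4]})

# Outer merge on all shared columns; _merge records where each row was found
merged = df1.merge(df2, how="outer", indicator=True)

# Split the unmatched rows by the side they came from
only_df1 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
only_df2 = merged[merged["_merge"] == "right_only"].drop(columns="_merge")
print(only_df1)
print(only_df2)
```

With this data, only_df1 holds the Cherry/3 row and only_df2 holds Cherry/30 and Date/4.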
