How to Easily Identify and Display Differences Between DataFrames

DDD
Release: 2024-10-22 20:50:05

Compare DataFrames and Display Differences Side-by-Side

When tracking down data discrepancies, you often need to compare two DataFrames and highlight what changed between them. Consider the following two versions of a student roster:

"StudentRoster Jan-1":
id    Name   score                    isEnrolled           Comment
111   Jack   2.17                     True                 He was late to class
112   Nick   1.11                     False                Graduated
113   Zoe    4.12                     True

"StudentRoster Jan-2":
id    Name   score                    isEnrolled           Comment
111   Jack   2.17                     True                 He was late to class
112   Nick   1.21                     False                Graduated
113   Zoe    4.12                     False                On vacation
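To follow along, the two rosters can be built as DataFrames. This is a minimal sketch: the names df1 and df2 match the snippets in this article, but using id as the index and storing Zoe's empty Jan-1 comment as NaN are assumptions made here.

```python
import numpy as np
import pandas as pd

# Shared row labels; using "id" as the index is an assumption of this sketch.
ids = pd.Index([111, 112, 113], name="id")

# "StudentRoster Jan-1"; the blank comment for Zoe is assumed to be NaN.
df1 = pd.DataFrame(
    {
        "Name": ["Jack", "Nick", "Zoe"],
        "score": [2.17, 1.11, 4.12],
        "isEnrolled": [True, False, True],
        "Comment": ["He was late to class", "Graduated", np.nan],
    },
    index=ids,
)

# "StudentRoster Jan-2"
df2 = pd.DataFrame(
    {
        "Name": ["Jack", "Nick", "Zoe"],
        "score": [2.17, 1.21, 4.12],
        "isEnrolled": [True, False, False],
        "Comment": ["He was late to class", "Graduated", "On vacation"],
    },
    index=ids,
)
```

Both frames need identical row and column labels, because an element-wise comparison with != raises an error for differently labeled DataFrames.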

Assuming the two rosters are loaded as DataFrames df1 and df2 with identical row and column labels (indexed by id), first determine which rows contain at least one changed value:

# True for each row in which at least one column differs
ne = (df1 != df2).any(axis=1)
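One caveat with this row filter is that != reports missing values as unequal, so a cell that is NaN in both DataFrames is flagged as a change. The sketch below, on a trimmed-down pair of frames invented for illustration, contrasts the plain comparison with a NaN-aware variant; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

ids = pd.Index([111, 112, 113], name="id")
# Illustrative data: row 113 has a missing comment in BOTH frames.
df1 = pd.DataFrame(
    {"score": [2.17, 1.11, 4.12], "Comment": ["late", "Graduated", np.nan]},
    index=ids,
)
df2 = pd.DataFrame(
    {"score": [2.17, 1.21, 4.12], "Comment": ["late", "Graduated", np.nan]},
    index=ids,
)

# Plain comparison: NaN != NaN evaluates to True, so row 113 is flagged.
naive = (df1 != df2).any(axis=1)

# NaN-aware comparison: ignore cells that are missing in both frames.
both_missing = df1.isna() & df2.isna()
nan_aware = ((df1 != df2) & ~both_missing).any(axis=1)

# Labels of the rows that really changed.
changed_ids = list(nan_aware[nan_aware].index)
```

Here naive marks rows 112 and 113, while nan_aware marks only row 112.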

Next, identify the specific entries that have changed:

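# Stack the boolean mask into a Series indexed by (row label, column name)
ne_stacked = (df1 != df2).stack()
# Keep only the True entries, i.e. the cells that differ
changed = ne_stacked[ne_stacked]
changed.index.names = ['id', 'col']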

Proceed to extract the original and updated values for the changed entries:

import numpy as np

# Row and column positions of every differing cell, in row-major order
difference_locations = np.where(df1 != df2)
changed_from = df1.values[difference_locations]
changed_to = df2.values[difference_locations]
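The extraction step relies on how np.where behaves when given only a condition: it returns one array of row positions and one of column positions, and indexing with that pair picks out exactly those cells. A small sketch with made-up arrays (not the roster data) shows the mechanics:

```python
import numpy as np

# Illustrative values only; "old" and "new" are hypothetical names.
old = np.array([[1, 2], [3, 4]])
new = np.array([[1, 20], [30, 4]])

# With a single argument, np.where returns (row positions, column positions)
# of the True cells, scanned row by row.
rows, cols = np.where(old != new)

# Indexing with the position arrays selects the matching cells from each array.
changed_from = old[rows, cols]
changed_to = new[rows, cols]
```

Because the cells are scanned row by row, the extracted values come out in the same order as the entries of the stacked mask, which is what lets them share one index later.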

Finally, present the differences in a user-friendly tabular format:

import pandas as pd

pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index)
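The steps above can be collected into one function. This is a sketch rather than a definitive implementation: the name diff_table is hypothetical, and it assumes both DataFrames share the same index and columns. The roster data is rebuilt here (with Zoe's empty comment assumed to be NaN) so the example runs on its own.

```python
import numpy as np
import pandas as pd


def diff_table(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Return one row per changed cell with its 'from' and 'to' values.

    Hypothetical helper; assumes df1 and df2 are identically labeled.
    """
    mask = df1 != df2                      # True where a cell differs
    stacked = mask.stack()                 # Series indexed by (row, column)
    changed = stacked[stacked]             # keep only the differing cells
    changed.index.names = ["id", "col"]
    rows, cols = np.where(mask.to_numpy())  # positions, row-major order
    return pd.DataFrame(
        {"from": df1.to_numpy()[rows, cols], "to": df2.to_numpy()[rows, cols]},
        index=changed.index,
    )


ids = pd.Index([111, 112, 113], name="id")
jan1 = pd.DataFrame(
    {
        "Name": ["Jack", "Nick", "Zoe"],
        "score": [2.17, 1.11, 4.12],
        "isEnrolled": [True, False, True],
        "Comment": ["He was late to class", "Graduated", np.nan],
    },
    index=ids,
)
jan2 = pd.DataFrame(
    {
        "Name": ["Jack", "Nick", "Zoe"],
        "score": [2.17, 1.21, 4.12],
        "isEnrolled": [True, False, False],
        "Comment": ["He was late to class", "Graduated", "On vacation"],
    },
    index=ids,
)

result = diff_table(jan1, jan2)
print(result)
```

For the rosters in this article, the table has three rows: the score change for id 112, and the isEnrolled and Comment changes for id 113.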

The resulting table has one row per changed cell, indexed by id and column name, with the original value under from and the updated value under to. This makes it quick to see both what changed and where.
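As a point of comparison, pandas 1.1 and later also ship a built-in DataFrame.compare method that produces a similar side-by-side view for identically labeled frames. The sketch below is an alternative to the manual steps, not the method described in this article; the roster data is rebuilt with Zoe's empty comment assumed to be NaN.

```python
import numpy as np
import pandas as pd

ids = pd.Index([111, 112, 113], name="id")
df1 = pd.DataFrame(
    {
        "Name": ["Jack", "Nick", "Zoe"],
        "score": [2.17, 1.11, 4.12],
        "isEnrolled": [True, False, True],
        "Comment": ["He was late to class", "Graduated", np.nan],
    },
    index=ids,
)
df2 = pd.DataFrame(
    {
        "Name": ["Jack", "Nick", "Zoe"],
        "score": [2.17, 1.21, 4.12],
        "isEnrolled": [True, False, False],
        "Comment": ["He was late to class", "Graduated", "On vacation"],
    },
    index=ids,
)

# Keeps only rows and columns with at least one difference; each remaining
# column is split into "self" (df1) and "other" (df2) sub-columns.
side_by_side = df1.compare(df2)
print(side_by_side)
```

Unlike the != comparison, compare treats a value that is missing in both frames as unchanged, and it leaves NaN in the cells of a reported row that did not change.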
