Compare DataFrames and Display Differences Side-by-Side
When tracking down data discrepancies, you often need to compare two dataframes and highlight what changed between them. Consider the following example, a student roster captured on two consecutive days:
"StudentRoster Jan-1": id Name score isEnrolled Comment 111 Jack 2.17 True He was late to class 112 Nick 1.11 False Graduated 113 Zoe 4.12 True "StudentRoster Jan-2": id Name score isEnrolled Comment 111 Jack 2.17 True He was late to class 112 Nick 1.21 False Graduated 113 Zoe 4.12 False On vacation
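To build a side-by-side view of the changes, first determine which rows have changed. The snippets below assume that df1 and df2 hold the Jan-1 and Jan-2 rosters, with id as the index and identical row and column labels: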
ne = (df1 != df2).any(axis=1)
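As an illustration of what this row mask can be used for, the sketch below filters each frame down to the rows that changed. It uses a reduced version of the rosters (only the score and isEnrolled columns) to stay short; the names changed_before and changed_after are hypothetical and do not appear in the original steps.

```python
import pandas as pd

ids = pd.Index([111, 112, 113], name="id")
df1 = pd.DataFrame(
    {"score": [2.17, 1.11, 4.12], "isEnrolled": [True, False, True]},
    index=ids,
)
df2 = pd.DataFrame(
    {"score": [2.17, 1.21, 4.12], "isEnrolled": [True, False, False]},
    index=ids,
)

# Boolean Series indexed by id: True where at least one column differs.
ne = (df1 != df2).any(axis=1)

# Boolean indexing keeps only the changed rows from each frame.
changed_before = df1[ne]
changed_after = df2[ne]
print(changed_after)
```

Here rows 112 and 113 are selected, while row 111 is dropped because none of its values changed.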
Next, identify the specific entries that have changed:
ne_stacked = (df1 != df2).stack()
changed = ne_stacked[ne_stacked]
changed.index.names = ['id', 'col']
Proceed to extract the original and updated values for the changed entries:
import numpy as np

difference_locations = np.where(df1 != df2)
changed_from = df1.values[difference_locations]
changed_to = df2.values[difference_locations]
Finally, present the differences in a user-friendly tabular format:
pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index)
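The steps above can be collected into one helper. This is a sketch under stated assumptions rather than a definitive implementation: the function name diff_frames is hypothetical, both frames are assumed to share identical index and column labels, and the extra isna() check is an addition not found in the original steps. It is there because NaN != NaN evaluates to True, so a cell that is missing in both frames would otherwise be reported as changed.

```python
import numpy as np
import pandas as pd


def diff_frames(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Return one row per changed cell with its 'from' and 'to' values.

    Assumes both frames have identical index and column labels.
    """
    # Cells that differ, excluding cells that are missing in both frames.
    mask = (old != new) & ~(old.isna() & new.isna())
    # Row and column positions of the changed cells, in row-major order.
    rows, cols = np.where(mask.to_numpy())
    index = pd.MultiIndex.from_arrays(
        [old.index[rows], old.columns[cols]], names=["id", "col"]
    )
    return pd.DataFrame(
        {"from": old.values[rows, cols], "to": new.values[rows, cols]},
        index=index,
    )


ids = pd.Index([111, 112, 113], name="id")
df1 = pd.DataFrame(
    {
        "Name": ["Jack", "Nick", "Zoe"],
        "score": [2.17, 1.11, 4.12],
        "isEnrolled": [True, False, True],
        "Comment": ["He was late to class", "Graduated", None],
    },
    index=ids,
)
df2 = pd.DataFrame(
    {
        "Name": ["Jack", "Nick", "Zoe"],
        "score": [2.17, 1.21, 4.12],
        "isEnrolled": [True, False, False],
        "Comment": ["He was late to class", "Graduated", "On vacation"],
    },
    index=ids,
)

result = diff_frames(df1, df2)
print(result)
```

For the two rosters this yields three rows: the score of 112 (1.11 to 1.21), the isEnrolled flag of 113 (True to False), and the Comment of 113 (empty to "On vacation").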
This approach produces a compact summary of the differences between two dataframes: each row of the result is one changed cell, labeled by id and column name, with its old and new values side by side.
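As a different technique from the manual steps above, pandas 1.1 and later also ship a built-in DataFrame.compare method that returns a similar side-by-side view. The sketch below is an alternative, not the method this article walks through; it again assumes identically labeled frames, and the variable name side_by_side is hypothetical.

```python
import pandas as pd

ids = pd.Index([111, 112, 113], name="id")
df1 = pd.DataFrame(
    {
        "score": [2.17, 1.11, 4.12],
        "isEnrolled": [True, False, True],
        "Comment": ["He was late to class", "Graduated", None],
    },
    index=ids,
)
df2 = pd.DataFrame(
    {
        "score": [2.17, 1.21, 4.12],
        "isEnrolled": [True, False, False],
        "Comment": ["He was late to class", "Graduated", "On vacation"],
    },
    index=ids,
)

# Keeps only rows and columns that contain a difference. Each column is
# split into 'self' (df1) and 'other' (df2); unchanged cells show as NaN.
side_by_side = df1.compare(df2)
print(side_by_side)
```

The result is wide (one pair of columns per changed field) rather than one row per changed cell, so which form is preferable depends on how the differences will be read.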