Problem:
Combining two pandas data frames with the DataFrame.join() method raises an error: "columns overlap."
Data Frames:
Attempted Code:
<code class="python">restaurant_review_frame.join(other=restaurant_ids_dataframe, on='business_id', how='left')</code>
Error:
<code class="text">Exception: columns overlap: Index([business_id, stars, type], dtype=object)</code>
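The error can be reproduced on small stand-in frames. The sketch below uses hypothetical data with integer business_id values, since the original frames are not shown. join() matches the caller's business_id column against the index of other, and because no lsuffix or rsuffix is given, pandas refuses to produce duplicate column names. In current pandas releases the exception is a ValueError.

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the question; both share
# the column names business_id, stars, and type.
restaurant_ids_dataframe = pd.DataFrame({
    "business_id": [0, 1, 2],
    "stars": [4.0, 3.5, 5.0],
    "type": ["business", "business", "business"],
})
restaurant_review_frame = pd.DataFrame({
    "business_id": [0, 0, 2],
    "stars": [5, 3, 4],
    "type": ["review", "review", "review"],
})

error_message = None
try:
    # Same call as in the question: no lsuffix/rsuffix is supplied.
    restaurant_review_frame.join(
        other=restaurant_ids_dataframe, on="business_id", how="left"
    )
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```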
Solution:
To resolve the error and combine the data frames, use the merge() method instead of join():
<code class="python">import pandas as pd

result = pd.merge(restaurant_ids_dataframe, restaurant_review_frame, on='business_id', how='outer')</code>
Here how='outer' requests an outer join, which keeps all rows from both data frames and fills in missing values where a business_id appears in only one of them. If how is omitted, merge() performs an inner join and keeps only the matching rows. The on argument specifies the column used to match rows.
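As an illustration of how the how argument changes the result, here is a minimal sketch on hypothetical stand-in frames (the values are invented, not the original data). It compares an outer merge with a merge that omits how, which performs an inner join.

```python
import pandas as pd

# Hypothetical stand-ins for the two frames in the question.
restaurant_ids_dataframe = pd.DataFrame({
    "business_id": [1, 2, 3],
    "stars": [4.0, 3.5, 5.0],
    "type": ["business", "business", "business"],
})
restaurant_review_frame = pd.DataFrame({
    "business_id": [1, 1, 4],
    "stars": [5, 3, 2],
    "type": ["review", "review", "review"],
})

# Outer join: every business_id from either frame appears in the result.
outer = pd.merge(restaurant_ids_dataframe, restaurant_review_frame,
                 on="business_id", how="outer")

# No how argument: inner join, only business_id values present in both frames.
inner = pd.merge(restaurant_ids_dataframe, restaurant_review_frame,
                 on="business_id")

print(len(outer), len(inner))
```

With this sample data, business_id 1 matches two reviews, so it contributes two rows to both results; ids 2, 3, and 4 appear only in the outer result.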
Suffixes for Overlapping Columns:
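Since both data frames have columns named stars and type, the merged data frame keeps both copies and distinguishes them with the default suffixes: stars_x and type_x come from the left frame, stars_y and type_y from the right frame. To customize these suffixes, use the suffixes argument: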
<code class="python">result = pd.merge(..., suffixes=('_restaurant_id', '_restaurant_review'))</code>
This names the overlapping columns stars_restaurant_id and stars_restaurant_review (and likewise type_restaurant_id and type_restaurant_review) in the merged data frame.
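A short sketch of the suffixes argument in a complete call, again on hypothetical stand-in frames rather than the original data:

```python
import pandas as pd

# Hypothetical stand-ins for the two frames in the question.
restaurant_ids_dataframe = pd.DataFrame({
    "business_id": [1, 2],
    "stars": [4.0, 3.5],
    "type": ["business", "business"],
})
restaurant_review_frame = pd.DataFrame({
    "business_id": [1, 2],
    "stars": [5, 3],
    "type": ["review", "review"],
})

result = pd.merge(
    restaurant_ids_dataframe,
    restaurant_review_frame,
    on="business_id",
    how="outer",
    # First suffix applies to the left frame, second to the right frame.
    suffixes=("_restaurant_id", "_restaurant_review"),
)

print(list(result.columns))
```

The key column business_id is not suffixed because it is the merge key; only the non-key columns that appear in both frames receive a suffix.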