When working with two or more dataframes in Pandas, you may need their cartesian product: a new dataframe containing every combination of rows from the input dataframes. For inputs with m and n rows, the result has m × n rows.
Since Pandas 1.2.0, the merge function accepts the how='cross' argument, which returns the cartesian product directly without requiring a join key. This approach is concise and avoids building a temporary key column:
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col3': [5, 6]})

df_cartesian = df1.merge(df2, how='cross')
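To make the result concrete, the sketch below runs the same cross merge and checks that the row count equals the product of the input row counts. It assumes Pandas 1.2.0 or later is installed; the sanity check is an illustrative addition, not part of the original example.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col3': [5, 6]})

# how='cross' requires pandas 1.2.0 or later
df_cartesian = df1.merge(df2, how='cross')

# Every row of df1 is paired with every row of df2: 2 * 2 = 4 rows
assert len(df_cartesian) == len(df1) * len(df2)

print(df_cartesian)
#    col1  col2  col3
# 0     1     3     5
# 1     1     3     6
# 2     2     4     5
# 3     2     4     6
```

Note that no on, left_on, or right_on argument is passed; a cross merge does not use join keys.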
For versions of Pandas earlier than 1.2.0, a workaround is required. Add a key column that holds the same constant value in every row of both dataframes, then merge on that key. Because every key in one dataframe matches every key in the other, the join produces all row combinations; selecting the original columns afterwards discards the helper key:
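import pandas as pd

# Both dataframes carry the same constant value in 'key'
df1 = pd.DataFrame({'key': [1, 1], 'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'key': [1, 1], 'col3': [5, 6]})

# Merge on the shared key, then keep only the original columns
df_cartesian = pd.merge(df1, df2, on='key')[['col1', 'col2', 'col3']]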
This approach is more involved but works effectively in older versions of Pandas.
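When the dataframes already exist without a key column, the same idea can be wrapped in a small helper. The following is a minimal sketch, not a Pandas API: the function name cross_join_legacy and the temporary column name '_cross_key' are hypothetical choices made for illustration.

```python
import pandas as pd

def cross_join_legacy(left, right, key='_cross_key'):
    """Cartesian product via a constant temporary key (hypothetical helper)."""
    # Guard against overwriting a real column with the temporary key
    if key in left.columns or key in right.columns:
        raise ValueError("temporary key %r clashes with an existing column" % key)
    # assign() returns new dataframes, so the inputs are left unmodified
    merged = left.assign(**{key: 1}).merge(right.assign(**{key: 1}), on=key)
    # Remove the helper column from the result
    return merged.drop(key, axis=1)

df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col3': [5, 6]})

result = cross_join_legacy(df1, df2)
print(result)
```

Using assign rather than writing the key into the inputs keeps df1 and df2 unchanged, so the caller does not have to clean up afterwards.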