When working with pandas DataFrames, you sometimes need to "unnest" (or "explode") columns that contain lists, turning each list element into its own row. This can be computationally expensive, especially for large datasets.
Starting with pandas 1.3, DataFrame.explode accepts a list of columns, so you can unnest several columns in a single call. Within each row, the lists in all of the specified columns must have the same length; otherwise pandas raises a ValueError. To use it:
```python
df.explode(['B', 'C', 'D', 'E']).reset_index(drop=True)
```
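A self-contained sketch of the call above follows. It uses a hypothetical frame with two list columns (`B` and `C`) rather than the four from the original question, and it also shows the ValueError raised when the list lengths differ within a row. It assumes pandas 1.3 or later.

```python
import pandas as pd

# Hypothetical sample frame: 'A' holds scalars, 'B' and 'C' hold lists
# whose lengths match within each row
df = pd.DataFrame({
    "A": [1, 2],
    "B": [[1, 2], [3]],
    "C": [["x", "y"], ["z"]],
})

# Explode both list columns together in one call (pandas >= 1.3)
out = df.explode(["B", "C"]).reset_index(drop=True)
print(out)

# Hypothetical frame whose single row has lists of different lengths
bad = pd.DataFrame({"A": [1], "B": [[1, 2]], "C": [["x"]]})
raised = False
try:
    bad.explode(["B", "C"])
except ValueError as err:
    # pandas refuses to pair up lists of unequal length
    raised = True
    print("ValueError:", err)
```

Each scalar in `A` is repeated once per list element, so the three output rows pair `1` with `x`, `2` with `y`, and `3` with `z`.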
For older pandas versions (0.25 through 1.2), where explode accepts only a single column, move the non-list column(s) into the index and explode every remaining column:
```python
import pandas as pd
df.set_index(['A']).apply(pd.Series.explode).reset_index()
```
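The older-version call can be tried on a small hypothetical frame. This sketch assumes that every column other than `A` is a list column with matching lengths in each row. A scalar column left out of the index, or unequal list lengths in a row, generally leaves the exploded columns with different indexes that pandas cannot align. On pandas 1.3 or later, the last line also checks that the result agrees with the single `explode` call.

```python
import pandas as pd

# Hypothetical sample frame: 'A' is the scalar key; every other column
# holds lists whose lengths match within each row
df = pd.DataFrame({
    "A": [1, 2],
    "B": [[1, 2], [3]],
    "C": [["x", "y"], ["z"]],
})

# Move the scalar column into the index, explode each remaining column
# separately (all exploded columns share the same repeated index), then
# turn 'A' back into a regular column
old_style = df.set_index(["A"]).apply(pd.Series.explode).reset_index()
print(old_style)

# Comparison with the multi-column call (requires pandas >= 1.3)
new_style = df.explode(["B", "C"]).reset_index(drop=True)
print(old_style.to_dict("list") == new_style.to_dict("list"))
```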
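Both methods are efficient. In the reported timings, the set_index approach took about half as long as DataFrame.explode, and both ran roughly 45 to 95 times faster than a stacking approach. These figures come from a single benchmark, so relative speeds may differ with data size and pandas version: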
| Method | Time (seconds) |
|---|---:|
| DataFrame.explode | 0.00259 |
| set_index + explode | 0.00127 |
| Stacking approach | 0.120 |
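To rerun a comparison like the one in the table on your own data, a small `timeit` harness is enough. The frame below (10,000 rows, three items per list) is a hypothetical stand-in, since the original benchmark data is not given. The stacking approach is omitted because its code is not shown here. Absolute times will vary with hardware, data shape, and pandas version, so treat the output only as a relative comparison.

```python
import timeit

import pandas as pd

# Hypothetical benchmark frame: 10,000 rows, each list column holding 3 items
n_rows = 10_000
df = pd.DataFrame({
    "A": range(n_rows),
    "B": [[1, 2, 3]] * n_rows,
    "C": [["x", "y", "z"]] * n_rows,
})

def multi_column_explode():
    # pandas >= 1.3: explode several columns in one call
    return df.explode(["B", "C"]).reset_index(drop=True)

def set_index_explode():
    # Older-version approach: explode each non-index column separately
    return df.set_index(["A"]).apply(pd.Series.explode).reset_index()

for name, func in [("DataFrame.explode", multi_column_explode),
                   ("set_index + explode", set_index_explode)]:
    # Best of 3 repeats of 10 calls each, reported as seconds per call
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{name}: {best:.5f} s per call")
```

Taking the minimum of several repeats reduces noise from other processes; it estimates the best-case cost of each call rather than the average.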
While this question was initially marked as a duplicate, it specifically emphasizes the need for an efficient method that can handle large datasets. The answers to the duplicate question failed to adequately address this requirement.