Explode (Split) Pandas DataFrame String Entries into Separate Rows
A common task in pandas is to split a column of comma-separated strings so that each value gets its own row. Two approaches are covered here: the built-in explode() methods and a custom vectorized function.
Using Series.explode() or DataFrame.explode()
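For pandas 0.25.0 and above, Series.explode() and DataFrame.explode() turn each element of a list-like entry into its own row. They do not split strings themselves, so a comma-separated column must first be converted to lists with str.split(','):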
For single columns:
df.explode('column_name')
For multiple columns:
df.explode(['column1', 'column2']) # Pandas 1.3.0+
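As a minimal sketch of both calls, the example below splits comma-separated strings into lists and then explodes them. The DataFrame and its column names (id, tags, scores) are made up for illustration, and the multi-column call assumes pandas 1.3.0 or later.

```python
import pandas as pd

# Hypothetical sample data: 'tags' and 'scores' hold comma-separated strings
df = pd.DataFrame({
    "id": [1, 2],
    "tags": ["a,b", "c"],
    "scores": ["1,2", "3"],
})

# Single column: split the strings into lists, then explode.
# The original index is kept, so exploded rows share an index label.
single = df.assign(tags=df["tags"].str.split(",")).explode("tags")

# Multiple columns (pandas 1.3.0+): the lists in each row must have
# the same length across all exploded columns.
multi = df.assign(
    tags=df["tags"].str.split(","),
    scores=df["scores"].str.split(","),
).explode(["tags", "scores"], ignore_index=True)

print(single)
print(multi)
```

Passing ignore_index=True gives the result a fresh 0..n-1 index; without it, the repeated index labels can be used to trace each row back to its source row.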
Generic Vectorized Function
A more general vectorized approach, useful where explode() is unavailable, handles columns of comma-separated strings as well as list columns:
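import numpy as np
import pandas as pd

def explode(df, lst_cols, fill_value='', preserve_index=False):
    # Accept a single column name as well as a list of names
    if isinstance(lst_cols, str):
        lst_cols = [lst_cols]
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    # Convert CSV string entries to lists; leave existing lists untouched
    for col in lst_cols:
        df[col] = df[col].apply(lambda v: v.split(',') if isinstance(v, str) else v)
    # All columns that are not being exploded
    idx_cols = df.columns.difference(lst_cols)
    # List length per row (every column in lst_cols must have matching lengths)
    lens = df[lst_cols[0]].str.len()
    # Repeat the non-list columns and flatten the list columns
    result = (pd.DataFrame({col: np.repeat(df[col].values, lens) for col in idx_cols},
                           index=np.repeat(df.index.values, lens))
              .assign(**{col: np.concatenate(df.loc[lens > 0, col].values)
                         for col in lst_cols}))
    # Re-attach rows whose lists were empty (DataFrame.append was removed in pandas 2.0)
    if (lens == 0).any():
        result = pd.concat([result, df.loc[lens == 0, idx_cols]], sort=False).fillna(fill_value)
    # Restore the original row order, then reset the index unless asked to keep it
    result = result.sort_index()
    if not preserve_index:
        result = result.reset_index(drop=True)
    return result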
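The core of this vectorized approach is a pairing of np.repeat and np.concatenate: the scalar columns are repeated once per list element, and the lists themselves are flattened end to end. The sketch below isolates that step on a small, made-up DataFrame (the id and tags names are assumptions); it leaves out the empty-list and index handling that the full function performs.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'tags' already holds Python lists
df = pd.DataFrame({
    "id": [10, 20, 30],
    "tags": [["a", "b"], ["c"], ["d", "e", "f"]],
})

# Number of elements in each row's list: 2, 1, 3
lens = df["tags"].str.len()

# Repeat each id once per element of its list
repeated_ids = np.repeat(df["id"].to_numpy(), lens.to_numpy())

# Flatten all lists into one long array, preserving row order
flat_tags = np.concatenate(df["tags"].to_numpy())

out = pd.DataFrame({"id": repeated_ids, "tags": flat_tags})
print(out)
```

Because both arrays are built in the same row order, element i of repeated_ids always belongs to element i of flat_tags, which is why no Python-level loop over rows is needed.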
Applications
CSV Column:
explode(df, ['var1']) # the function splits df['var1'] on ',' itself, so no prior df['var1'].str.split(',') is needed
Multiple List Columns:
explode(df, ['num', 'text'], fill_value='')