How to Explode (Split) Pandas DataFrame String Entries into Separate Rows?

Susan Sarandon
Release: 2024-12-21 05:26:14

In pandas, a common task is to split the comma-separated values in a string column and create one row per value. The sections below cover the built-in explode() methods and a generic vectorized function.

Using Series.explode() or DataFrame.explode()

In pandas 0.25.0 and above, Series.explode() and DataFrame.explode() turn each element of a list-like cell into its own row. They operate on lists rather than raw strings, so a comma-separated column has to be split with str.split(',') first:

For single columns:

df.assign(column_name=df['column_name'].str.split(',')).explode('column_name')
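
As a minimal sketch of the single-column case, the example below splits a comma-separated column into lists and then explodes it. The DataFrame and its column names (id, tags) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: 'tags' holds comma-separated strings
df = pd.DataFrame({"id": [1, 2], "tags": ["a,b", "c"]})

# explode() expects list-like cells, so split the strings first
exploded = (
    df.assign(tags=df["tags"].str.split(","))
      .explode("tags")
      .reset_index(drop=True)  # explode() repeats the original index labels
)
print(exploded)
```

Because assign() returns a new DataFrame, the original df keeps its string column.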

For multiple columns:

df.explode(['column1', 'column2'])  # pandas 1.3.0+; the lists in each row must have equal lengths
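
A sketch of the multi-column case follows, assuming pandas 1.3.0 or later. The sample data and column names are hypothetical. Both columns are split into lists first, and the lists in each row must have matching lengths, otherwise explode() raises a ValueError.

```python
import pandas as pd

# Hypothetical sample data: two comma-separated columns with aligned entries
df = pd.DataFrame({
    "id": [1, 2],
    "num": ["1,2", "3"],
    "text": ["x,y", "z"],
})

list_cols = ["num", "text"]

# Split every comma-separated column into lists without modifying df
as_lists = df.assign(**{col: df[col].str.split(",") for col in list_cols})

# Explode both columns together; ignore_index=True gives a fresh 0..n-1 index
exploded = as_lists.explode(list_cols, ignore_index=True)
print(exploded)
```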

Generic Vectorized Function

A more general vectorized function, which can explode several columns at once and keeps rows whose lists are empty, is shown below:

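import numpy as np
import pandas as pd

def explode(df, lst_cols, fill_value='', preserve_index=False):
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()

    # Convert CSV string columns to list columns; cells that already hold lists are kept as-is
    for col in lst_cols:
        df[col] = df[col].apply(lambda v: v.split(',') if isinstance(v, str) else v)

    # All columns that are not being exploded
    idx_cols = df.columns.difference(lst_cols)

    # List length per row, taken from the first list column
    # (all list columns must have matching lengths within each row)
    lens = df[lst_cols[0]].str.len()

    # Repeat the scalar columns and flatten the list columns
    result = (pd.DataFrame({
        col: np.repeat(df[col].values, lens)
        for col in idx_cols
    }, index=np.repeat(df.index.values, lens))
        .assign(**{col: np.concatenate(df.loc[lens > 0, col].values)
                   for col in lst_cols}))

    # Re-attach rows whose lists were empty and fill their list columns
    # (DataFrame.append was removed in pandas 2.0, so use pd.concat)
    if (lens == 0).any():
        result = pd.concat([result, df.loc[lens == 0, idx_cols]], sort=False).fillna(fill_value)

    # Restore the original row order (stable sort keeps the element order within each row)
    result = result.sort_index(kind='mergesort')
    if not preserve_index:
        result = result.reset_index(drop=True)

    return result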
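
The core of this vectorized approach can be seen in isolation: repeat each scalar value once per list element with np.repeat, and flatten the lists with np.concatenate. The snippet below is a minimal sketch on made-up data (columns id and vals are hypothetical) and leaves out the empty-list handling.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'vals' already holds Python lists
df = pd.DataFrame({"id": [10, 20], "vals": [["a", "b", "c"], ["d"]]})

# Number of elements in each row's list
lens = df["vals"].str.len().to_numpy()

# Repeat each id once per list element
repeated_ids = np.repeat(df["id"].to_numpy(), lens)

# Flatten all lists into one array, in row order
flat_vals = np.concatenate(df["vals"].to_numpy())

result = pd.DataFrame({"id": repeated_ids, "vals": flat_vals})
print(result)
```

Both arrays end up with the same length (the sum of the list lengths), which is why they line up row by row in the resulting DataFrame.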

Applications

CSV Column:

explode(df, ['var1'])

Multiple List Columns:

explode(df, ['num', 'text'], fill_value='')
