
How to Split Comma-Separated Values in a Pandas DataFrame into Separate Rows?

Mary-Kate Olsen
Release: 2024-12-24 18:06:15

Split (Explode) Pandas Dataframe String Entry to Separate Rows

Problem:
Given a pandas DataFrame in which one column holds comma-separated values, the goal is to split each such field into individual rows, repeating the values of the other columns for every new row.

Solution:

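The recommended solution is the built-in explode() method, available as Series.explode() and DataFrame.explode(). It was introduced in pandas 0.25.0 and extended in pandas 1.3.0 to support exploding several columns at once. explode() operates on list-like cells, so a column of comma-separated strings must first be converted to lists, for example with Series.str.split(',').

To explode a single column of a DataFrame, pass its name to DataFrame.explode():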

df.explode('column_name')

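To explode multiple columns at once (pandas 1.3.0 or later), pass a list of column names; within each row, the list-likes in those columns must have the same length: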

df.explode(['column1', 'column2'])
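
As a minimal sketch of the multi-column form, the snippet below explodes two list columns in lockstep. The column names (`id`, `tags`, `scores`) and the data are made up for illustration, and it assumes pandas 1.3.0 or later.

```python
import pandas as pd

# Hypothetical data: within each row, 'tags' and 'scores' have the same length.
df = pd.DataFrame({
    "id": [1, 2],
    "tags": [["x", "y"], ["z"]],
    "scores": [[10, 20], [30]],
})

# Both columns are exploded together, element by element.
# pandas raises a ValueError if the per-row lengths do not match.
exploded = df.explode(["tags", "scores"])
print(exploded)
```

The non-exploded column `id` is repeated for every element, and the original index labels are repeated as well.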

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [[0, 1, 2], 'foo', [], [3, 4]],
    'B': 1,
    'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']]
})

df.explode('A')

Output:

     A  B          C
0    0  1  [a, b, c]
0    1  1  [a, b, c]
0    2  1  [a, b, c]
1  foo  1        NaN
2  NaN  1         []
3    3  1     [d, e]
3    4  1     [d, e]
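
As the output above shows, explode() repeats the original index labels and turns an empty list into a single NaN row. The following sketch, using made-up data, contrasts that default with the `ignore_index=True` option (available since pandas 1.1.0), which renumbers the result instead.

```python
import pandas as pd

# Hypothetical data with one empty list in column 'A'.
df = pd.DataFrame({"A": [[0, 1], [], [2]], "B": ["p", "q", "r"]})

# Default: index labels are repeated; the empty list becomes a NaN row.
repeated = df.explode("A")

# ignore_index=True renumbers the rows 0..n-1.
renumbered = df.explode("A", ignore_index=True)

print(repeated)
print(renumbered)
```

Keeping the repeated index is handy when the exploded rows need to be joined back to the original frame; a fresh index is usually more convenient for further row-wise processing.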

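For pandas versions that predate explode(), or when empty lists should be replaced by a custom fill value, the following helper function offers a more generic approach. It handles any number of ordinary columns and list columns, provided that all list columns have the same length within each row: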

def explode(df, lst_cols, fill_value='', preserve_index=False):
    # Ensure `lst_cols` is list-like (wrap a single column name in a list)
    if not isinstance(lst_cols, (list, tuple, np.ndarray, pd.Series)):
        lst_cols = [lst_cols]

    # Calculate lengths of lists
    lens = df[lst_cols[0]].str.len()

    # Preserve original index values
    idx = np.repeat(df.index.values, lens)

    # Create an "exploded" DataFrame
    res = (pd.DataFrame({
                col:np.repeat(df[col].values, lens)
                for col in df.columns.difference(lst_cols)
            },
            index=idx)
             .assign(**{col:np.concatenate(df.loc[lens>0, col].values)
                            for col in lst_cols}))

    # Append rows with empty lists (DataFrame.append was removed in pandas 2.0)
    if (lens == 0).any():
        res = (pd.concat([res, df.loc[lens == 0, df.columns.difference(lst_cols)]], sort=False)
                 .fillna(fill_value))

    # Revert to original index order and reset if requested
    res = res.sort_index()
    if not preserve_index:        
        res = res.reset_index(drop=True)

    return res

Example of exploding a CSV-like column:

df = pd.DataFrame({
    'var1': 'a,b,c d,e,f,x,y'.split(),
    'var2': [1, 2]
})

explode(df.assign(var1=df.var1.str.split(',')), 'var1')

Output:

  var1  var2
0    a     1
1    b     1
2    c     1
3    d     2
4    e     2
5    f     2
6    x     2
7    y     2
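
On pandas 1.1.0 or later, the same result can probably be reached without the helper by chaining Series.str.split() with the built-in DataFrame.explode(). The sketch below rebuilds the example frame and is meant as an illustration rather than the only way to do it.

```python
import pandas as pd

df = pd.DataFrame({
    "var1": ["a,b,c", "d,e,f,x,y"],
    "var2": [1, 2],
})

result = (
    df.assign(var1=df["var1"].str.split(","))  # comma-separated strings -> lists
      .explode("var1", ignore_index=True)      # one row per list element
)
print(result)
```

Here `ignore_index=True` plays the role of `preserve_index=False` in the helper: the rows are renumbered from 0 instead of keeping the repeated original labels.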
