In Pandas, subsetting a DataFrame based on a specific value is straightforward, as demonstrated by the following example:
```python
import pandas as pd

# DataFrame initialization
df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})

# Subset based on a single value
x = df[df['A'] == 3]
```
However, the challenge arises when selecting rows that match a list of values. Consider the following use case:
```python
# List of values to filter on
list_of_values = [3, 6]

# Subset attempt (does not work: raises ValueError)
y = df[df['A'] in list_of_values]
```
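This code is valid Python syntax, but it raises an error at runtime. Python's `in` operator compares `df['A']` against each list element with `==`. Each comparison produces a boolean Series, and Python then tries to reduce that Series to a single True/False value, which fails with `ValueError: The truth value of a Series is ambiguous`. What Pandas needs here is an element-wise membership test.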
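A minimal sketch that reproduces the failure using the same sample data; the variable name `error_message` is only illustrative. The exact wording after "ambiguous" varies between pandas versions, so only the start of the message is worth relying on.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
list_of_values = [3, 6]

try:
    # `in` compares the whole Series with each list element, then calls bool()
    # on the resulting boolean Series, which pandas refuses to do
    y = df[df['A'] in list_of_values]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # starts with "The truth value of a Series is ambiguous"
```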
The correct way to subset a DataFrame based on a list of values is to use the isin() method. Here's the corrected code:
```python
y = df[df['A'].isin(list_of_values)]
```
Output:
```
   A  B
1  6  2
2  3  3
```
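The isin() method takes a list-like collection of values (a list, tuple, set, NumPy array, or Series) and returns a boolean Series that is True wherever the column's value matches any of those values. Indexing the DataFrame with that boolean Series then keeps only the matching rows.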
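To make the two steps visible, here is a small sketch that builds the boolean mask separately before using it to index the DataFrame. The names `mask` and `mask_from_set` are illustrative, not part of any API.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
list_of_values = [3, 6]

# isin() on a Series returns a boolean Series aligned with df's index
mask = df['A'].isin(list_of_values)
print(mask.tolist())  # [False, True, True, False]

# A set (or tuple, array, Series) works the same way as a list
mask_from_set = df['A'].isin({3, 6})
print(mask.equals(mask_from_set))  # True

# Boolean indexing keeps only the rows where the mask is True
y = df[mask]
print(y.index.tolist())  # [1, 2]
```

Keeping the mask in its own variable is optional, but it makes the filter easy to inspect or reuse.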
To select rows where the column values do not match the supplied list, use the ~ operator, which inverts the boolean Series returned by isin() element by element. For example:
```python
# Inverse subset
z = df[~df['A'].isin(list_of_values)]
```
Output:
```
   A  B
0  5  1
3  4  5
```
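A short sketch of why `~` is used rather than Python's `not`: `~` flips each value in the mask, while `not` would try to turn the whole Series into one boolean and raise the same ValueError seen earlier. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
list_of_values = [3, 6]

mask = df['A'].isin(list_of_values)

# ~ flips each boolean in the mask; `not mask` would raise ValueError instead
inverse = ~mask
print(inverse.tolist())  # [True, False, False, True]

y = df[mask]     # rows whose A value is in the list
z = df[inverse]  # rows whose A value is not in the list

# Every row lands in exactly one of the two subsets
print(sorted(y.index.tolist() + z.index.tolist()))  # [0, 1, 2, 3]
```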