Complex selection criteria with multiple conditions often arise in data manipulation tasks. Pandas provides methods and idioms to efficiently handle these scenarios.
Example:
Consider the following DataFrame:
import pandas as pd
from random import randint

df = pd.DataFrame({'A': [randint(1, 9) for x in range(10)],
                   'B': [randint(1, 9)*10 for x in range(10)],
                   'C': [randint(1, 9)*100 for x in range(10)]})
Problem:
Select values from 'A' where the corresponding values for 'B' are greater than 50 and 'C' does not equal 900.
Solution:
We can apply column operations to obtain boolean Series objects for each condition:
df["B"] > 50
(df["B"] > 50) & (df["C"] != 900)
or equivalently:
(df["B"] > 50) & ~(df["C"] == 900)
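As a quick check that the two forms really produce the same mask, here is a minimal sketch. The fixed values in df are an assumption made for reproducibility (the DataFrame above is random), and the names mask_b, mask_ne and mask_not are purely illustrative.

```python
import pandas as pd

# Fixed values chosen for illustration; the article's DataFrame uses random data.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [10, 60, 70, 40, 90],
    "C": [100, 900, 300, 900, 500],
})

# Each comparison is element-wise and returns a boolean Series.
mask_b = df["B"] > 50

# Parentheses are required: & binds more tightly than > and != in Python.
mask_ne = mask_b & (df["C"] != 900)

# The same condition, written as a negated equality test.
mask_not = mask_b & ~(df["C"] == 900)

print(mask_ne.tolist())          # [False, False, True, False, True]
print(mask_ne.equals(mask_not))  # True
```

Note that the element-wise operators are & (and), | (or) and ~ (not). Python's own and, or and not keywords raise an error when applied to a Series.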
These boolean Series can then be used to index into the DataFrame:
df["A"][(df["B"] > 50) & (df["C"] != 900)]
or, using .loc:
df.loc[(df["B"] > 50) & (df["C"] != 900), "A"]
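For read access, both forms return the same result. The sketch below compares them on a small DataFrame with fixed values (an assumption, so the output is reproducible). The variable names mask, chained and via_loc are illustrative only.

```python
import pandas as pd

# Fixed values chosen for illustration; the article's DataFrame uses random data.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [10, 60, 70, 40, 90],
    "C": [100, 900, 300, 900, 500],
})

mask = (df["B"] > 50) & (df["C"] != 900)

chained = df["A"][mask]      # select column 'A', then filter its rows
via_loc = df.loc[mask, "A"]  # filter rows and select the column in one step

print(via_loc.tolist())         # [3, 5]
print(chained.equals(via_loc))  # True
```

The .loc form does the row filter and the column selection in a single indexing operation, which is why it is also the form to use for assignment.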
Note:
For write access, use .loc rather than chained indexing. A chained expression such as df["A"][mask] = value may operate on a temporary copy instead of a view of the original data, so the assignment can fail to update the DataFrame.
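A minimal sketch of write access through .loc follows. The fixed values and the replacement value 0 are assumptions made for illustration.

```python
import pandas as pd

# Fixed values chosen for illustration; the article's DataFrame uses random data.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [10, 60, 70, 40, 90],
    "C": [100, 900, 300, 900, 500],
})

mask = (df["B"] > 50) & (df["C"] != 900)

# A single .loc call assigns directly into df.
df.loc[mask, "A"] = 0

print(df["A"].tolist())  # [1, 2, 0, 4, 0]

# By contrast, the chained form df["A"][mask] = 0 may emit a warning or
# leave df unchanged, depending on the pandas version and its
# copy-on-write settings.
```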