Pandas Cheat Sheet

Patricia Arquette
Published: 2024-10-04 06:13:02

Comprehensive Guide to Pandas: The Ultimate Cheat Sheet

Pandas is an open-source data manipulation and analysis library for Python, built on top of NumPy. Its two core data structures, Series and DataFrame, cover most data analysis tasks. It is widely used for loading, cleaning, and preparing structured data, which is a crucial step in data science workflows. Whether the data is a time series, a mix of types, or stored as CSV, Excel, JSON, or in a SQL database, Pandas offers tools that make it much easier to work with.


1. Importing Pandas

Before using any Pandas functionality, you need to import the library. It is commonly imported as pd to keep the syntax concise.


import pandas as pd


2. Pandas Data Structures

Series

A Series is a one-dimensional labeled array, capable of holding any data type (integer, string, float, etc.). It can be created from a list, NumPy array, or a dictionary.


# Create a Pandas Series from a list
s = pd.Series([1, 2, 3, 4])
print(s)


Expected Output:


0    1
1    2
2    3
3    4
dtype: int64
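
The description above also mentions NumPy arrays and dictionaries as sources. A minimal sketch of both follows; the labels and values are illustrative assumptions, not taken from the original article.

```python
import numpy as np
import pandas as pd

# From a dictionary: the keys become the index labels
s_dict = pd.Series({"a": 10, "b": 20, "c": 30})

# From a NumPy array, with explicit (illustrative) index labels
s_arr = pd.Series(np.array([1.5, 2.5, 3.5]), index=["x", "y", "z"])

print(s_dict["b"])         # label-based lookup
print(s_arr.dtype)         # float64
print(list(s_dict.index))  # ['a', 'b', 'c']
```

When no index is given, as in the list example, pandas assigns the default integer labels 0, 1, 2, and so on.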


DataFrame

A DataFrame is a two-dimensional labeled data structure, similar to a table in a database or an Excel spreadsheet. It consists of rows and columns. Each column can have a different data type.


# Create a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [24, 27, 22], 'City': ['New York', 'London', 'Berlin']}
df = pd.DataFrame(data)
print(df)


Expected Output:


      Name  Age      City
0    Alice   24  New York
1      Bob   27    London
2  Charlie   22    Berlin
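
To see that each column carries its own data type, one option is to inspect shape and dtypes. This sketch rebuilds the same DataFrame so that it runs on its own; the exact dtype names shown for the text columns depend on the pandas version.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [24, 27, 22], 'City': ['New York', 'London', 'Berlin']}
df = pd.DataFrame(data)

print(df.shape)   # (3, 3): three rows, three columns
print(df.dtypes)  # one dtype per column; 'Age' is an integer type, the others hold text
```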


3. Creating DataFrames and Series

From a Dictionary


data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)


From a List of Lists


data = [[1, 2, 3], [4, 5, 6]]  # each inner list is one row
df = pd.DataFrame(data, columns=["A", "B", "C"])
print(df)


Expected Output:


   A  B  C
0  1  2  3
1  4  5  6


4. Inspecting DataFrames

Pandas provides several methods to inspect and get information about your data.

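  • df.head(n) – Returns the first n rows (5 if n is omitted).
  • df.tail(n) – Returns the last n rows (5 if n is omitted).
  • df.info() – Prints a summary of the DataFrame: index range, column names, non-null counts, dtypes, and memory usage.
  • df.describe() – Generates descriptive statistics (count, mean, std, min, quartiles, max) for the numeric columns.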

# Inspecting the DataFrame
print(df.head())
print(df.tail())
df.info()  # info() prints its summary directly and returns None
print(df.describe())


Expected Output:


   A  B  C
0  1  2  3
1  4  5  6

   A  B  C
0  1  2  3
1  4  5  6

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       2 non-null      int64
 1   B       2 non-null      int64
 2   C       2 non-null      int64
dtypes: int64(3)
memory usage: 128.0 bytes

             A        B        C
count  2.00000  2.00000  2.00000
mean   2.50000  3.50000  4.50000
std    2.12132  2.12132  2.12132
min    1.00000  2.00000  3.00000
25%    1.75000  2.75000  3.75000
50%    2.50000  3.50000  4.50000
75%    3.25000  4.25000  5.25000
max    4.00000  5.00000  6.00000
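
With only two rows, head() and tail() return the same data. The following sketch uses a hypothetical ten-row DataFrame (not from the original article) to show the effect of the n argument.

```python
import pandas as pd

# Hypothetical ten-row frame, just large enough to make head and tail differ
numbers = pd.DataFrame({"n": range(10), "square": [i * i for i in range(10)]})

first_three = numbers.head(3)  # rows 0, 1, 2
last_two = numbers.tail(2)     # rows 8, 9

print(first_three)
print(last_two)
print(numbers.shape)           # (10, 2)
```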


5. Indexing, Slicing, and Subsetting Data

Accessing Columns

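You can access columns either using dot notation or by indexing with square brackets. Dot notation works only when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method; bracket notation always works.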


# Dot notation
print(df.A)

# Bracket notation
print(df["B"])
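
As a small illustration of where the two notations differ, the sketch below uses hypothetical column names chosen to collide with a DataFrame method and to contain a space.

```python
import pandas as pd

# Hypothetical columns: 'count' clashes with DataFrame.count, 'unit price' has a space
sales = pd.DataFrame({"count": [3, 5], "unit price": [2.0, 4.0]})

print(sales["count"])         # the column
print(callable(sales.count))  # True: sales.count is the DataFrame.count method, not the column
print(sales["unit price"])    # names with spaces can only be reached with brackets
```

For this reason bracket notation is usually the safer default in reusable code.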


Accessing Rows by Index

You can use .iloc[] for integer-location based indexing and .loc[] for label-based indexing.


# Using iloc (integer position-based)
print(df.iloc[0])  # Access first row

# Using loc (label-based)
print(df.loc[0])  # Access first row using label
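
With the default index, position 0 and label 0 happen to be the same row, so the two calls above look identical. A sketch with a hypothetical non-default index makes the difference visible.

```python
import pandas as pd

# Hypothetical frame whose labels are deliberately not 0, 1, 2
scores = pd.DataFrame({"score": [90, 80, 70]}, index=[10, 20, 30])

by_position = scores.iloc[0]  # first row by position
by_label = scores.loc[10]     # row whose label is 10

print(by_position["score"])   # 90
print(by_label["score"])      # 90

# scores.loc[0] would raise a KeyError here because no row is labeled 0
```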


Slicing Data

You can slice a DataFrame to select a subset of rows and columns in one step. Note that .loc[] slices include both endpoints, whereas .iloc[] slices exclude the end position.


# Select specific rows and columns (the label slice 0:1 includes row 1)
subset = df.loc[0:1, ["A", "C"]]
print(subset)


Expected Output:


   A  C
0  1  3
1  4  6
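
The sketch below contrasts label slicing, position slicing, and boolean filtering. It assumes a three-row variant of the A/B/C table, which is not part of the original article.

```python
import pandas as pd

table = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["A", "B", "C"])

label_slice = table.loc[0:1]      # labels 0 and 1: the end label is included
position_slice = table.iloc[0:1]  # position 0 only: the end position is excluded
filtered = table[table["A"] > 1]  # boolean mask keeps the rows where A > 1

print(len(label_slice), len(position_slice), len(filtered))  # 2 1 2
```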


6. Modifying DataFrames

Adding Columns

You can add a column by assigning to a new column label. A list or array must supply one value per row; a single scalar is broadcast to every row.


df['D'] = [7, 8]  # Adding a new column


Modifying Column Values

You can modify the values of a column by accessing it and assigning new values.


df['A'] = df['A'] * 2  # Modify the 'A' column


Dropping Columns or Rows

You can drop rows or columns using the drop() method. It returns a new DataFrame by default, so assign the result back to keep the change.


df = df.drop(columns=['D'])  # Dropping a column
df = df.drop(index=1)  # Dropping a row by index
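
A short sketch of the same operations on a separate, hypothetical frame shows that drop() leaves the original untouched unless the result is reassigned.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 4], "B": [2, 5]})
frame["D"] = frame["A"] + frame["B"]   # new column derived from existing ones

without_d = frame.drop(columns=["D"])  # returns a new DataFrame

print(list(frame.columns))      # ['A', 'B', 'D'] (original unchanged)
print(list(without_d.columns))  # ['A', 'B']
```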


7. Handling Missing Data

Real-world data often contains missing values, which Pandas typically represents as NaN. Pandas provides several methods to detect and handle them.

  • df.isnull() – Detects missing values (returns a DataFrame of booleans).
  • df.notnull() – Detects non-missing values (returns a DataFrame of booleans).
  • df.fillna(value) – Fills missing values with a specified value.
  • df.dropna() – Removes rows that contain at least one missing value (the default behavior).

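# Choose one strategy: after fillna(0) there is nothing left for dropna() to remove
df = df.fillna(0)    # Option 1: fill missing data with 0
# df = df.dropna()   # Option 2: drop rows with any missing values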
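
The DataFrame used so far has no missing values, so the calls above have no visible effect. The sketch below uses a hypothetical frame with NaN entries to show what each method does.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 5.0, 6.0]})

missing_per_column = raw.isnull().sum()  # number of NaN values in each column
filled = raw.fillna(0)                   # NaN replaced by 0
complete_rows = raw.dropna()             # only rows without any NaN remain

print(missing_per_column)
print(filled)
print(complete_rows)
```

Both fillna() and dropna() return new objects here, so raw keeps its missing values.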


8. Data Aggregation and Grouping

GroupBy

The groupby() function is used for splitting the data into groups, applying a function, and then combining the results.


# Grouping by a column and calculating the sum
# (df here is the Name/Age/City DataFrame from section 2)
grouped = df.groupby('City')['Age'].sum()


Aggregation Functions

You can apply various aggregation functions like sum(), mean(), min(), max(), etc.


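# Aggregating data using mean (select the numeric 'Age' column first;
# mean() on a text column such as 'Name' raises a TypeError in pandas 2.x)
df.groupby('City')['Age'].mean()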
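
Several aggregations can be computed in one pass with agg(). The sketch below uses hypothetical data in which each city appears twice, so that every group holds more than one row.

```python
import pandas as pd

# Hypothetical data with repeated cities so that each group has two rows
people = pd.DataFrame({
    "City": ["London", "Berlin", "London", "Berlin"],
    "Age": [24, 30, 36, 40],
})

# One row per city, one column per aggregation
summary = people.groupby("City")["Age"].agg(["mean", "max", "count"])
print(summary)
```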


9. Sorting and Ranking

Sorting Data

You can sort a DataFrame by one or more columns using the sort_values() function.


# Sorting by a column in ascending order
df_sorted = df.sort_values(by='Age')

# Sorting by multiple columns
df_sorted = df.sort_values(by=['Age', 'Name'], ascending=[True, False])


Ranking

You can rank the values in a DataFrame using rank().


df['Rank'] = df['Age'].rank()
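
How rank() treats ties depends on its method argument. The sketch below uses hypothetical names and ages containing one tie to compare the default with method="dense".

```python
import pandas as pd

ages = pd.DataFrame({"Name": ["Ann", "Ben", "Cal"], "Age": [30, 25, 30]})

ages["Rank"] = ages["Age"].rank()                     # default: ties share the average rank
ages["DenseRank"] = ages["Age"].rank(method="dense")  # ties share a rank, no gaps afterwards
oldest_first = ages.sort_values(by="Age", ascending=False)

print(ages)
print(oldest_first)
```

Here the two 30-year-olds occupy ranks 2 and 3, so the default method gives both 2.5, while the dense method gives both 2.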


10. Merging, Joining, and Concatenating DataFrames

Merging DataFrames

You can merge two DataFrames based on a common column or index. By default, merge() performs an inner join, keeping only the keys that appear in both DataFrames.


df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'C': ['C0', 'C1', 'C2']})
merged_df = pd.merge(df1, df2, on='A')
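
In the example above every key matches, so the join type makes no difference. The sketch below changes one key (a hypothetical 'A3') to contrast an inner join with a left join.

```python
import pandas as pd

left = pd.DataFrame({"A": ["A0", "A1", "A2"], "B": ["B0", "B1", "B2"]})
right = pd.DataFrame({"A": ["A0", "A1", "A3"], "C": ["C0", "C1", "C3"]})

inner = pd.merge(left, right, on="A")                  # default how="inner": keys in both
left_join = pd.merge(left, right, on="A", how="left")  # every row of left is kept

print(inner)      # rows for A0 and A1
print(left_join)  # rows for A0, A1, A2; C is NaN for A2
```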


Concatenating DataFrames

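You can concatenate DataFrames along rows (axis=0) or columns (axis=1) using concat(). Row-wise concatenation keeps each DataFrame's original index labels unless you pass ignore_index=True.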


df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['A', 'B'])
concat_df = pd.concat([df1, df2], axis=0)
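
The following sketch compares the index produced by the call above with the ignore_index=True variant and with column-wise concatenation. The variable names are illustrative.

```python
import pandas as pd

top = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
bottom = pd.DataFrame([[5, 6], [7, 8]], columns=["A", "B"])

stacked = pd.concat([top, bottom], axis=0)                # index is 0, 1, 0, 1
renumbered = pd.concat([top, bottom], ignore_index=True)  # index is 0, 1, 2, 3
side_by_side = pd.concat([top, bottom], axis=1)           # columns A, B, A, B

print(list(stacked.index))
print(list(renumbered.index))
print(side_by_side.shape)  # (2, 4)
```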


Conclusion

Pandas is a versatile tool for data manipulation, from importing and cleaning data to performing complex operations. This cheat sheet provides a quick overview of some of the most common Pandas features, helping you make your data analysis workflow more efficient.

Source: dev.to