Commonly used functions in the pandas library: introduction and usage

WBOY
Release: 2024-01-24 10:19:17

Introduction:

pandas is an open source, flexible, and efficient library for data analysis and manipulation. It is widely used in data science, machine learning, finance, statistics, and other fields. This article introduces commonly used pandas functions and shows how to use them, with the aim of helping readers understand and use pandas more effectively.

1. Introduction to data structures

  1. Series (sequence)

Series is one of the most basic data structures in pandas. It is a one-dimensional labeled array that can hold any data type (integers, floating-point numbers, strings, and so on). A Series can be created as follows:

    import pandas as pd

    data = [1, 2, 3, 4, 5]
    s = pd.Series(data)
    print(s)

Output result:

    0    1
    1    2
    2    3
    3    4
    4    5
    dtype: int64
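A Series also carries an index of labels alongside its values. The minimal sketch below assumes a hypothetical set of labels ('a' to 'c') and made-up values; it shows label-based and position-based access, and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical labels and values chosen for illustration.
scores = pd.Series([90, 85, 77], index=['a', 'b', 'c'])

by_label = scores['b']         # label-based lookup
by_position = scores.iloc[0]   # position-based lookup

print(by_label)
print(by_position)
print(list(scores.index))
```

When no index is passed, as in the example above it, pandas falls back to the default integer labels 0, 1, 2, and so on.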

  2. DataFrame (data frame)

DataFrame is the most commonly used data structure in pandas. It is a two-dimensional tabular data structure that can be regarded as a collection of Series sharing a common index, one per column. A DataFrame can be created as follows:

    import pandas as pd

    data = {'name': ['Alice', 'Bob', 'Charlie'],
            'age': [25, 30, 35],
            'city': ['New York', 'London', 'Tokyo']}
    df = pd.DataFrame(data)
    print(df)

Output result:

          name  age      city
    0    Alice   25  New York
    1      Bob   30    London
    2  Charlie   35     Tokyo
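One way to see the relationship between a DataFrame and its Series is to pull a column out. The sketch below reuses the same sample data; the names `ages` and `subset` are illustrative only.

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'city': ['New York', 'London', 'Tokyo']}
df = pd.DataFrame(data)

ages = df['age']                # selecting one column yields a Series
subset = df[['name', 'city']]   # selecting a list of columns yields a DataFrame

print(type(ages).__name__)
print(type(subset).__name__)
print(list(df.columns))
```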

2. Introduction to common functions and detailed usage

  1. head() and tail()

The head() function is used to view the first few rows of the DataFrame, and the first 5 rows are viewed by default; the tail() function is used to view the last few rows of the DataFrame, and the last 5 rows are viewed by default. The sample code is as follows:

    import pandas as pd

    df = pd.read_csv('data.csv')
    print(df.head())
    print(df.tail())
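Both methods also accept a row count. The following sketch uses a small in-memory DataFrame as a stand-in for data.csv (the column names and values are hypothetical) so that it runs without any file on disk.

```python
import pandas as pd

# In-memory stand-in for data.csv; values are made up for illustration.
df = pd.DataFrame({'id': list(range(1, 11)),
                   'value': list(range(10, 110, 10))})

first_three = df.head(3)   # rows with id 1, 2, 3
last_two = df.tail(2)      # rows with id 9, 10

print(first_three)
print(last_two)
```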

  2. shape attribute

The shape attribute returns the shape of the DataFrame as a (rows, columns) tuple. The sample code is as follows:

    import pandas as pd

    df = pd.read_csv('data.csv')
    print(df.shape)
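Because shape is a plain tuple, it can be unpacked directly. A minimal sketch, assuming a small made-up DataFrame in place of data.csv:

```python
import pandas as pd

# Made-up data standing in for data.csv.
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [25, 30, 35]})

n_rows, n_cols = df.shape   # shape is a (rows, columns) tuple
print(n_rows, n_cols)
```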

  3. info() function

The info() function prints a summary of the DataFrame, including column names, the number of non-null values in each column, and data types. The sample code is as follows:

    import pandas as pd

    df = pd.read_csv('data.csv')
    df.info()  # info() prints the summary itself and returns None
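If the summary is needed as a string, for example for logging, info() can write into a buffer through its `buf` parameter. The sketch below assumes a small made-up DataFrame with one missing value; the variable names are illustrative only.

```python
import io
import pandas as pd

# Made-up data with one missing name.
df = pd.DataFrame({'name': ['Alice', 'Bob', None],
                   'age': [25, 30, 35]})

buffer = io.StringIO()
result = df.info(buf=buffer)   # writes the report into buffer; returns None
report = buffer.getvalue()

print(report)
```

The report lists each column with its non-null count, so the missing name shows up as 2 non-null entries out of 3.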

  4. describe() function

The describe() function computes summary statistics for the numeric columns in a DataFrame, such as count, mean, standard deviation, minimum, and maximum. The sample code is as follows:

    import pandas as pd

    df = pd.read_csv('data.csv')
    print(df.describe())
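The result of describe() is itself a DataFrame, so individual statistics can be looked up by row and column label. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up data; only 'age' is numeric.
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [25, 30, 35]})

stats = df.describe()   # by default only the numeric 'age' column is summarized
print(stats)

mean_age = stats.loc['mean', 'age']
max_age = stats.loc['max', 'age']
print(mean_age, max_age)
```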

  5. sort_values() function

The sort_values() function sorts the DataFrame by the values of one or more specified columns. The sample code is as follows:

    import pandas as pd

    df = pd.read_csv('data.csv')
    # Sort by the values of the age column in descending order
    df_sorted = df.sort_values(by='age', ascending=False)
    print(df_sorted)
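Sorting by several columns works the same way, with one sort direction per column. The sketch below uses made-up data and also shows that sort_values() returns a new DataFrame and leaves the original order untouched.

```python
import pandas as pd

# Made-up data with ties in the age column.
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'Dana'],
                   'age': [30, 25, 30, 25],
                   'city': ['Tokyo', 'London', 'London', 'Tokyo']})

# Sort by age descending, then by name ascending within equal ages.
df_sorted = df.sort_values(by=['age', 'name'], ascending=[False, True])
print(df_sorted)
```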

  6. groupby() function

The groupby() function is used to group by specified columns and aggregate the grouped results. The sample code is as follows:

    import pandas as pd

    df = pd.read_csv('data.csv')
    grouped = df.groupby('city')
    # Compute the average age for each city
    mean_age = grouped['age'].mean()
    print(mean_age)
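Several aggregations can be computed in one pass with agg(). The following is a minimal sketch on made-up data; the choice of mean, max, and count is only an illustration.

```python
import pandas as pd

# Made-up data: two people per city.
df = pd.DataFrame({'city': ['London', 'Tokyo', 'London', 'Tokyo'],
                   'age': [30, 20, 40, 30]})

# One row per city, one column per aggregation.
summary = df.groupby('city')['age'].agg(['mean', 'max', 'count'])
print(summary)
```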

  7. merge() function

The merge() function joins two DataFrames on one or more specified columns. By default it performs an inner join, so only rows whose key appears in both DataFrames are kept. The sample code is as follows:

    import pandas as pd

    df1 = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
    df2 = pd.DataFrame({'A': [2, 3, 4], 'C': ['x', 'y', 'z']})
    # Merge on column A
    merged = pd.merge(df1, df2, on='A')
    print(merged)
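The `how` parameter controls which keys survive the merge. The sketch below reuses the two DataFrames from the example and compares three join types; the variable names `inner`, `left`, and `outer` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
df2 = pd.DataFrame({'A': [2, 3, 4], 'C': ['x', 'y', 'z']})

inner = pd.merge(df1, df2, on='A')               # default how='inner': keys in both
left = pd.merge(df1, df2, on='A', how='left')    # keep every row of df1
outer = pd.merge(df1, df2, on='A', how='outer')  # keep keys from either side

print(inner)
print(left)
print(outer)
```

Rows that have no match on the other side are filled with NaN in the columns that come from that side.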

  8. apply() function

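The apply() function applies a custom function to data. Called on a Series (a single column), it applies the function to each element; called on a DataFrame, it applies the function to each column or, with axis=1, to each row. The sample code is as follows: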

    import pandas as pd

    df = pd.read_csv('data.csv')

    # Custom function: add 10 to an age
    def add_ten(age):
        return age + 10

    # Apply add_ten to every element of the age column
    df['age'] = df['age'].apply(add_ten)
    print(df)
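When the custom function needs more than one column, apply() can be called on the DataFrame with axis=1 so that the function receives one row at a time. The sketch below uses made-up data and a hypothetical helper, `describe_person`; it also shows that simple arithmetic such as adding 10 can be written directly on the column without apply().

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({'name': ['Alice', 'Bob'],
                   'age': [25, 30]})

def describe_person(row):
    # row is a Series holding one row's values, keyed by column name
    return f"{row['name']} ({row['age']})"

df['label'] = df.apply(describe_person, axis=1)   # axis=1: call once per row
df['age_plus_ten'] = df['age'] + 10               # vectorized alternative to apply
print(df)
```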

Conclusion:

This article briefly introduces commonly used functions of the pandas library and their usage, including basic operations on Series and DataFrame, summary statistics, sorting, grouping, merging, and applying custom functions. We hope this introduction helps readers better understand the pandas library and use it more effectively in data analysis and processing.

source:php.cn