4000 mots de description détaillée, recommandant 20 méthodes de fonction Pandas utiles-Tutoriel Python-php.cn

Aujourd'hui, je vais partager quelques fonctions pandas inconnues. Vous n'en voyez peut-être pas beaucoup, mais elles sont très pratiques à utiliser et peuvent également aider nos analystes de données à faire de grands progrès. Améliorez l'efficacité du travail et j'espère que tout le monde pourra gagner quelque chose après l'avoir lu

items()方法items()方法
iterrows()方法
insert()方法
assign()方法
eval()方法
pop()方法
truncate()方法
count()方法
add_prefix()方法/add_suffix()方法
clip()方法
filter()

iterrows()

insert()

assign()

pop()

truncate()

count()

add_prefix()

add_suffix()

clip()

filter()

first()方法first()方法
isin()方法
df.plot.area()方法
df.plot.bar()方法
df.plot.box()方法
df.plot.pie()方法

`items()`方法

pandas当中的items()方法可以用来遍历数据集当中的每一列，同时返回列名以及每一列当中的内容，通过以元组的形式，示例如下

df = pd.DataFrame({&#39;species&#39;: [&#39;bear&#39;, &#39;bear&#39;, &#39;marsupial&#39;],
                  &#39;population&#39;: [1864, 22000, 80000]},
                  index=[&#39;panda&#39;, &#39;polar&#39;, &#39;koala&#39;])
df

Copier après la connexion

output

         species  population
panda       bear        1864
polar       bear       22000
koala  marsupial       80000

Copier après la connexion

然后我们使用items()方法

for label, content in df.items():
    print(f&#39;label: {label}&#39;)
    print(f&#39;content: {content}&#39;, sep=&#39;\n&#39;)
    print("=" * 50)

Copier après la connexion

output

label: species
content: panda         bear
polar         bear
koala    marsupial
Name: species, dtype: object
==================================================
label: population
content: panda     1864
polar    22000
koala    80000
Name: population, dtype: int64
==================================================

Copier après la connexion

相继的打印出了‘species’和‘population’这两列的列名和相应的内容

`iterrows()`方法

而对于iterrows()

isin()方法

🎜🎜df.plot.area() 方法🎜🎜🎜df .plot.bar()方法🎜🎜🎜df.plot.box()方法🎜🎜🎜df.plot.pie()方法🎜

`items()`方法

pandas当中的items()方法可以用来遍历数据集当中的每一列，们使用items()方法🎜

for label, content in df.iterrows():
    print(f&#39;label: {label}&#39;)
    print(f&#39;content: {content}&#39;, sep=&#39;\n&#39;)
    print("=" * 50)

Copier après la connexion

🎜output🎜

label: panda
content: species       bear
population    1864
Name: panda, dtype: object
==================================================
label: polar
content: species        bear
population    22000
Name: polar, dtype: object
==================================================
label: koala
content: species       marsupial
population        80000
Name: koala, dtype: object
==================================================

Copier après la connexion

🎜相继的打印出了'species' et'population'这两列的列名和相应的内容🎜

`iterrows()`方法

而对于iterrows()方法而言，其功能则是遍历数据集当中的每一行，返回每一行的索引以及带有列名的每一行的内容，示例如下🎜

df.insert(1, "size", [2000, 3000, 4000])

Copier après la connexion

🎜output🎜

label: panda
content: species       bear
population    1864
Name: panda, dtype: object
==================================================
label: polar
content: species        bear
population    22000
Name: polar, dtype: object
==================================================
label: koala
content: species       marsupial
population        80000
Name: koala, dtype: object
==================================================

Copier après la connexion

`insert()`方法

insert()方法主要是用于在数据集当中的特定位置处插入数据，示例如下

df.insert(1, "size", [2000, 3000, 4000])

Copier après la connexion

output

         species  size  population
panda       bear  2000        1864
polar       bear  3000       22000
koala  marsupial  4000       80000

Copier après la connexion

可见在DataFrame数据集当中，列的索引也是从0开始的

`assign()`方法

assign()方法可以用来在数据集当中添加新的列，示例如下

df.assign(size_1=lambda x: x.population * 9 / 5 + 32)

Copier après la connexion

output

         species  population    size_1
panda       bear        1864    3387.2
polar       bear       22000   39632.0
koala  marsupial       80000  144032.0

Copier après la connexion

从上面的例子中可以看出，我们通过一个lambda匿名函数，在数据集当中添加一个新的列，命名为‘size_1’，当然我们也可以通过assign()方法来创建不止一个列

df.assign(size_1 = lambda x: x.population * 9 / 5 + 32,
          size_2 = lambda x: x.population * 8 / 5 + 10)

Copier après la connexion

output

         species  population    size_1    size_2
panda       bear        1864    3387.2    2992.4
polar       bear       22000   39632.0   35210.0
koala  marsupial       80000  144032.0  128010.0

Copier après la connexion

`eval()`方法

eval()方法主要是用来执行用字符串来表示的运算过程的，例如

df.eval("size_3 = size_1 + size_2")

Copier après la connexion

output

         species  population    size_1    size_2    size_3
panda       bear        1864    3387.2    2992.4    6379.6
polar       bear       22000   39632.0   35210.0   74842.0
koala  marsupial       80000  144032.0  128010.0  272042.0

Copier après la connexion

当然我们也可以同时对执行多个运算过程

df = df.eval(&#39;&#39;&#39;
size_3 = size_1 + size_2
size_4 = size_1 - size_2
&#39;&#39;&#39;)

Copier après la connexion

output

         species  population    size_1    size_2    size_3   size_4
panda       bear        1864    3387.2    2992.4    6379.6    394.8
polar       bear       22000   39632.0   35210.0   74842.0   4422.0
koala  marsupial       80000  144032.0  128010.0  272042.0  16022.0

Copier après la connexion

`pop()`方法

pop()方法主要是用来删除掉数据集中特定的某一列数据

df.pop("size_3")

Copier après la connexion

output

panda      6379.6
polar     74842.0
koala    272042.0
Name: size_3, dtype: float64

Copier après la connexion

而原先的数据集当中就没有这个‘size_3’这一例的数据了

`truncate()`方法

truncate()方法主要是根据行索引来筛选指定行的数据的，示例如下

df = pd.DataFrame({&#39;A&#39;: [&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;, &#39;e&#39;],
                   &#39;B&#39;: [&#39;f&#39;, &#39;g&#39;, &#39;h&#39;, &#39;i&#39;, &#39;j&#39;],
                   &#39;C&#39;: [&#39;k&#39;, &#39;l&#39;, &#39;m&#39;, &#39;n&#39;, &#39;o&#39;]},
                  index=[1, 2, 3, 4, 5])

Copier après la connexion

output

   A  B  C
1  a  f  k
2  b  g  l
3  c  h  m
4  d  i  n
5  e  j  o

Copier après la connexion

我们使用truncate()方法来做一下尝试

df.truncate(before=2, after=4)

Copier après la connexion

output

   A  B  C
2  b  g  l
3  c  h  m
4  d  i  n

Copier après la connexion

我们看到参数before和after存在于truncate()方法中，目的就是把行索引2之前和行索引4之后的数据排除在外，筛选出剩余的数据

`count()`方法

count()方法主要是用来计算某一列当中非空值的个数，示例如下

df = pd.DataFrame({"Name": ["John", "Myla", "Lewis", "John", "John"],
                   "Age": [24., np.nan, 25, 33, 26],
                   "Single": [True, True, np.nan, True, False]})

Copier après la connexion

output

    Name   Age Single
0   John  24.0   True
1   Myla   NaN   True
2  Lewis  25.0    NaN
3   John  33.0   True
4   John  26.0  False

Copier après la connexion

我们使用count()方法来计算一下数据集当中非空值的个数

df.count()

Copier après la connexion

output

Name      5
Age       4
Single    4
dtype: int64

Copier après la connexion

add_prefix()方法/add_suffix()方法

add_prefix()方法和add_suffix()方法分别会给列名以及行索引添加后缀和前缀，对于Series()数据集而言，前缀与后缀是添加在行索引处，而对于DataFrame()数据集而言，前缀与后缀是添加在列索引处，示例如下

s = pd.Series([1, 2, 3, 4])

Copier après la connexion

output

0    1
1    2
2    3
3    4
dtype: int64

Copier après la connexion

我们使用add_prefix()方法与add_suffix()方法在Series()数据集上

s.add_prefix(&#39;row_&#39;)

Copier après la connexion

output

row_0    1
row_1    2
row_2    3
row_3    4
dtype: int64

Copier après la connexion

又例如

s.add_suffix(&#39;_row&#39;)

Copier après la connexion

output

0_row    1
1_row    2
2_row    3
3_row    4
dtype: int64

Copier après la connexion

而对于DataFrame()形式数据集而言，add_prefix()方法以及add_suffix()方法是将前缀与后缀添加在列索引处的

df = pd.DataFrame({&#39;A&#39;: [1, 2, 3, 4], &#39;B&#39;: [3, 4, 5, 6]})

Copier après la connexion

output

Copier après la connexion

示例如下

df.add_prefix("column_")

Copier après la connexion

output

   column_A  column_B
0         1         3
1         2         4
2         3         5
3         4         6

Copier après la connexion

又例如

df.add_suffix("_column")

Copier après la connexion

output

   A_column  B_column
0         1         3
1         2         4
2         3         5
3         4         6

Copier après la connexion

`clip()`方法

clip()方法主要是通过设置阈值来改变数据集当中的数值，当数值超过阈值的时候，就做出相应的调整

data = {&#39;col_0&#39;: [9, -3, 0, -1, 5], &#39;col_1&#39;: [-2, -7, 6, 8, -5]}
df = pd.DataFrame(data)

Copier après la connexion

output

df.clip(lower = -4, upper = 4)

Copier après la connexion

output

   col_0  col_1
0      4     -2
1     -3     -4
2      0      4
3     -1      4
4      4     -4

Copier après la connexion

我们看到参数lower和upper分别代表阈值的上限与下限，数据集当中超过上限与下限的值会被替代。

`filter()`方法

pandas当中的filter()方法是用来筛选出特定范围的数据的，示例如下

df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])),
                  index=[&#39;A&#39;, &#39;B&#39;, &#39;C&#39;, &#39;D&#39;],
                  columns=[&#39;one&#39;, &#39;two&#39;, &#39;three&#39;])

Copier après la connexion

output

   one  two  three
A    1    2      3
B    4    5      6
C    7    8      9
D   10   11     12

Copier après la connexion

我们使用filter()方法来筛选数据

df.filter(items=[&#39;one&#39;, &#39;three&#39;])

Copier après la connexion

output

   one  three
A    1      3
B    4      6
C    7      9
D   10     12

Copier après la connexion

我们还可以使用正则表达式来筛选数据

df.filter(regex=&#39;e$&#39;, axis=1)

Copier après la connexion

output

   one  three
A    1      3
B    4      6
C    7      9
D   10     12

Copier après la connexion

当然通过参数axis来调整筛选行方向或者是列方向的数据

df.filter(like=&#39;B&#39;, axis=0)

Copier après la connexion

output

   one  two  three
B    4    5      6

Copier après la connexion

`first()`方法

当数据集当中的行索引是日期的时候，可以通过该方法来筛选前面几行的数据

index_1 = pd.date_range(&#39;2021-11-11&#39;, periods=5, freq=&#39;2D&#39;)
ts = pd.DataFrame({&#39;A&#39;: [1, 2, 3, 4, 5]}, index=index_1)
ts

Copier après la connexion

output

            A
2021-11-11  1
2021-11-13  2
2021-11-15  3
2021-11-17  4
2021-11-19  5

Copier après la connexion

我们使用first()方法来进行一些操作，例如筛选出前面3天的数据

ts.first(&#39;3D&#39;)

Copier après la connexion

output

            A
2021-11-11  1
2021-11-13  2

Copier après la connexion

`isin()`方法

isin()方法主要是用来确认数据集当中的数值是否被包含在给定的列表当中

df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12])),
                  index=[&#39;A&#39;, &#39;B&#39;, &#39;C&#39;, &#39;D&#39;],
                  columns=[&#39;one&#39;, &#39;two&#39;, &#39;three&#39;])
df.isin([3, 5, 12])

Copier après la connexion

output

     one    two  three
A  False  False   True
B  False   True  False
C  False  False  False
D  False  False   True

Copier après la connexion

若是数值被包含在列表当中了，也就是3、5、12当中，返回的是True，否则就返回False

`df.plot.area()`方法

下面我们来讲一下如何在Pandas当中通过一行代码来绘制图表，将所有的列都通过面积图的方式来绘制

df = pd.DataFrame({
    &#39;sales&#39;: [30, 20, 38, 95, 106, 65],
    &#39;signups&#39;: [7, 9, 6, 12, 18, 13],
    &#39;visits&#39;: [20, 42, 28, 62, 81, 50],
}, index=pd.date_range(start=&#39;2021/01/01&#39;, end=&#39;2021/07/01&#39;, freq=&#39;M&#39;))

ax = df.plot.area(figsize = (10, 5))

Copier après la connexion

output

4000 mots de description détaillée, recommandant 20 méthodes de fonction Pandas utiles

`df.plot.bar()`方法

下面我们看一下如何通过一行代码来绘制柱状图

df = pd.DataFrame({&#39;label&#39;:[&#39;A&#39;, &#39;B&#39;, &#39;C&#39;, &#39;D&#39;], &#39;values&#39;:[10, 30, 50, 70]})
ax = df.plot.bar(x=&#39;label&#39;, y=&#39;values&#39;, rot=20)

Copier après la connexion

output

4000 mots de description détaillée, recommandant 20 méthodes de fonction Pandas utiles

当然我们也可以根据不同的类别来绘制柱状图

age = [0.1, 17.5, 40, 48, 52, 69, 88]
weight = [2, 8, 70, 1.5, 25, 12, 28]
index = [&#39;A&#39;, &#39;B&#39;, &#39;C&#39;, &#39;D&#39;, &#39;E&#39;, &#39;F&#39;, &#39;G&#39;]
df = pd.DataFrame({&#39;age&#39;: age, &#39;weight&#39;: weight}, index=index)
ax = df.plot.bar(rot=0)

Copier après la connexion

output

4000 mots de description détaillée, recommandant 20 méthodes de fonction Pandas utiles

当然我们也可以横向来绘制图表

ax = df.plot.barh(rot=0)

Copier après la connexion

output

4000 mots de description détaillée, recommandant 20 méthodes de fonction Pandas utiles

`df.plot.box()`方法

我们来看一下箱型图的具体的绘制，通过pandas一行代码来实现

data = np.random.randn(25, 3)
df = pd.DataFrame(data, columns=list(&#39;ABC&#39;))
ax = df.plot.box()

Copier après la connexion

output

4000 mots de description détaillée, recommandant 20 méthodes de fonction Pandas utiles

`df.plot.pie()`方法

接下来是饼图的绘制

df = pd.DataFrame({&#39;mass&#39;: [1.33, 4.87 , 5.97],
                   &#39;radius&#39;: [2439.7, 6051.8, 6378.1]},
                  index=[&#39;Mercury&#39;, &#39;Venus&#39;, &#39;Earth&#39;])
plot = df.plot.pie(y=&#39;mass&#39;, figsize=(8, 8))

Copier après la connexion

output

4000 mots de description détaillée, recommandant 20 méthodes de fonction Pandas utiles