
[Machine Learning] Data preprocessing: convert categorical data into numerical values

PHP中文网 (Original)
2017-07-05 18:13:06

When doing data analysis in Python, data preprocessing has to come first.

Sometimes the data contains non-numeric (categorical) values. This article covers how to handle them.

We know of three methods so far:

1, use LabelEncoder for fast conversion;

2, map categories to numerical values with a mapping dictionary. However, this method has a limited scope of application;

3, convert with the pandas get_dummies method.
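The article does not say why the mapping method has a limited scope. One limitation that is easy to demonstrate is that the dictionary must list every category in advance. The sketch below is only an illustration: the value 'S' and the variable names are assumptions, not part of the original data.

```python
import pandas as pd

# Mapping for an ordinal feature; every category must be listed by hand
size_mapping = {'XL': 3, 'L': 2, 'M': 1}

# Hypothetical input that contains a category missing from the mapping
sizes = pd.Series(['M', 'XL', 'S'])

# Series.map returns NaN for any value that is not a key of the dictionary
encoded = sizes.map(size_mapping)
print(encoded)

# Collect the values that could not be mapped so they can be handled explicitly
unmapped = sizes[encoded.isna()]
print(unmapped.tolist())
```

Checking for NaN after the mapping is a cheap way to catch categories the dictionary does not cover.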

import numpy as np
import pandas as pd
from io import StringIO

csv_data = '''A,B,C,D
1,2,3,4
5,6,,8
0,11,12,'''

df = pd.read_csv(StringIO(csv_data))
print(df)
# Count the missing values in each column
print(df.isnull().sum())
print(df.values)

# Drop the rows that contain missing values; dropna() returns a new DataFrame
print(df.dropna())
# df itself is unchanged
print('after', df)

# Imputer was removed from sklearn.preprocessing in scikit-learn 0.22.
# SimpleImputer replaces it and always imputes column by column (no axis argument).
from sklearn.impute import SimpleImputer
imr = SimpleImputer(missing_values=np.nan, strategy='mean')
imr.fit(df.values)  # fit learns the mean of each column
imputed_data = imr.transform(df.values)  # transform fills in the missing values
print(imputed_data)

df = pd.DataFrame([['green', 'M', 10.1, 'class1'],
                   ['red', 'L', 13.5, 'class2'],
                   ['blue', 'XL', 15.3, 'class1']])
df.columns = ['color', 'size', 'price', 'classlabel']
print(df)

# size is ordinal, so map it to integers that preserve the order
size_mapping = {'XL': 3, 'L': 2, 'M': 1}
df['size'] = df['size'].map(size_mapping)
print(df)

# Iterate over a Series
for idx, label in enumerate(df['classlabel']):
    print(idx, label)

# 1, use the LabelEncoder class for quick encoding. It is not suitable for
# color, because the resulting integers would look as if they had an order.
from sklearn.preprocessing import LabelEncoder
class_le = LabelEncoder()
color_le = LabelEncoder()
df['classlabel'] = class_le.fit_transform(df['classlabel'].values)
# df['color'] = color_le.fit_transform(df['color'].values)
print(df)

# 2, use a mapping dictionary to convert the class labels to integers.
# Note: classlabel was already encoded in step 1, so the mapping here is
# {0: 0, 1: 1}. Applied to the original strings it would be
# {'class1': 0, 'class2': 1}.
class_mapping = {label: idx for idx, label in enumerate(np.unique(df['classlabel']))}
df['classlabel'] = df['classlabel'].map(class_mapping)
print('2,', df)

# 3, handle the case where method 1 is not suitable (color):
# create a new dummy feature for each category
pf = pd.get_dummies(df[['color']])
df = pd.concat([df, pf], axis=1)
df.drop(['color'], axis=1, inplace=True)
print(df)
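As an alternative to get_dummies, scikit-learn's OneHotEncoder can produce the same kind of dummy columns. The following is a minimal sketch, not the author's method: the variable names and the `color_` column prefix are assumptions chosen to mirror the get_dummies output. Calling `.toarray()` on the result avoids depending on the name of the dense-output parameter, which has changed between scikit-learn versions.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Same color column as in the example above
df = pd.DataFrame({'color': ['green', 'red', 'blue']})

# fit_transform returns a sparse matrix by default; convert it to a dense array
ohe = OneHotEncoder()
encoded = ohe.fit_transform(df[['color']]).toarray()

# categories_ holds the sorted categories learned for each input column
columns = ['color_' + c for c in ohe.categories_[0]]
onehot_df = pd.DataFrame(encoded, columns=columns, index=df.index)
print(onehot_df)

# The fitted encoder reuses the same column layout for new data
new_row = ohe.transform(pd.DataFrame({'color': ['red']})).toarray()
print(new_row)
```

Unlike get_dummies, a fitted encoder keeps the categories it learned, so data seen later is encoded with the same columns in the same order.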
