python - How to convert string labels to numbers in pandas or sklearn
ringa_lee 2017-04-18 10:06:45

For example, I have a label column like this:
[A,A,A,B,B,C,C,C,C]
and I want to convert it to:
[0,0,0,1,1,2,2,2,2]

Is there a simple built-in way to do this in pandas or scikit-learn?

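Also, when you are learning a new package, how do you find the right place in the documentation for a given problem? Any tips you can share? Thanks!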


Replies (4)
左手右手慢动作

This is easy in pandas: just convert the list to a Categorical object. In statistical terms (as in R) these are called factors and levels, and each level is automatically stored as an integer code.

import pandas as pd
c = ['A','A','A','B','B','C','C','C','C']
category = pd.Categorical(c)

Then inspect the integer codes of the categories:

print(category.codes)  # older pandas versions called this attribute .labels
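A slightly fuller sketch of the Categorical approach, assuming a recent pandas with Python 3. It shows both the codes and the category-to-code lookup, plus the equivalent route through a Series' .cat accessor. The variable names are illustrative only.

```python
import pandas as pd

labels = ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C']

# Categorical stores each distinct value once (in .categories, sorted by default)
# and one integer code per row (in .codes).
cat = pd.Categorical(labels)
codes = cat.codes.tolist()
categories = cat.categories.tolist()

# The same codes are reachable from a Series through the .cat accessor.
series_codes = pd.Series(labels).astype('category').cat.codes.tolist()

print(codes)       # [0, 0, 0, 1, 1, 2, 2, 2, 2]
print(categories)  # ['A', 'B', 'C']
```

The position of a label in categories is its code, so categories[code] maps a number back to its string.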
洪涛

sklearn has a ready-made tool for this:

from sklearn import preprocessing
preprocessing.LabelEncoder().fit_transform(data)  # data: your list/array of string labels

See the official documentation for details.

It converts directly between string labels and integers, in both directions.
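A minimal sketch of the round trip with LabelEncoder, assuming scikit-learn is installed. fit_transform encodes, classes_ holds the sorted distinct labels, and inverse_transform decodes.

```python
from sklearn.preprocessing import LabelEncoder

labels = ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C']

le = LabelEncoder()
encoded = le.fit_transform(labels)    # NumPy array of integer codes
classes = le.classes_.tolist()        # distinct labels in sorted order
decoded = le.inverse_transform(encoded).tolist()  # back to strings

print(encoded.tolist())  # [0, 0, 0, 1, 1, 2, 2, 2, 2]
print(classes)           # ['A', 'B', 'C']
```

scikit-learn documents LabelEncoder as intended for target values (y). For input feature columns, its documentation points to OrdinalEncoder instead.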

阿神

I haven't used it in practice, but the map function may meet your needs. See the documentation for details:
http://pandas.pydata.org/pand...
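A sketch of what the map approach could look like, assuming the codes should follow the sorted order of the labels. The dictionary name mapping is just an illustration. Series.map accepts a dict and looks up each value in it.

```python
import pandas as pd

labels = pd.Series(['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'])

# Build a label -> integer lookup from the sorted distinct values.
mapping = {label: code for code, label in enumerate(sorted(labels.unique()))}

# Series.map replaces each value with its entry in the dict.
encoded = labels.map(mapping)

print(encoded.tolist())  # [0, 0, 0, 1, 1, 2, 2, 2, 2]
```

Any value missing from the dict becomes NaN, so build the dict from the same data you are encoding.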

大家讲道理

This is just a mapping. There's no need to use pandas or scikit-learn; that's overkill.

a = ['A','A','A','B','B','C','C','C','C']
# only works for single-character labels counted from 'A'
result = [ord(c) - ord('A') for c in a]
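The ord() trick only works when every label is a single letter starting from 'A'. A minimal plain-Python sketch for arbitrary string labels, assuming codes should follow first-appearance order. The helper name encode_labels is hypothetical.

```python
def encode_labels(labels):
    """Hypothetical helper: map each distinct label to an integer code.

    Codes follow first-appearance order, because dict.fromkeys keeps
    insertion order (guaranteed since Python 3.7).
    """
    mapping = {label: code for code, label in enumerate(dict.fromkeys(labels))}
    encoded = [mapping[label] for label in labels]
    return encoded, mapping


a = ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C']
result, mapping = encode_labels(a)
print(result)   # [0, 0, 0, 1, 1, 2, 2, 2, 2]
print(mapping)  # {'A': 0, 'B': 1, 'C': 2}
```

Replace dict.fromkeys(labels) with sorted(set(labels)) if you prefer codes in sorted order.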

If you really want to use pandas, Series.map does exactly this:

import pandas as pd
a = ['A','A','A','B','B','C','C','C','C']
result = pd.Series(a).map(lambda c: ord(c) - ord('A'))
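For labels that are not single letters, pandas also provides factorize, which returns the integer codes together with the distinct values. A minimal sketch, assuming a recent pandas. By default codes follow first-appearance order; sort=True makes them follow sorted order.

```python
import pandas as pd

a = ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C']

# codes: NumPy array of integers; uniques: the distinct values.
codes, uniques = pd.Series(a).factorize()
print(codes.tolist())    # [0, 0, 0, 1, 1, 2, 2, 2, 2]
print(uniques.tolist())  # ['A', 'B', 'C']

# With sort=True the codes follow sorted order instead of first appearance.
sorted_codes, sorted_uniques = pd.Series(['B', 'A', 'B']).factorize(sort=True)
print(sorted_codes.tolist())  # [1, 0, 1]
```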