使用gensim模块建立word2vec模型,代码:
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence
f = open('wiki.zh.txt.work_cutted', 'r')
work2vec = Word2Vec(LineSentence(f), min_count=5)
return work2vec
看到了Python进程占用了300%的进程:
因为有GIL存在,一个Python进程最多跑到100%的CPU,而看到gensim库可以跑到300%。请问是怎么做到的?
机器的CPU信息:/proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
stepping : 5
cpu MHz : 2128.076
cache size : 4096 KB
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu de tsc msr pae cx8 sep cmov pat clflush mmx fxsr sse sse2 ss ht syscall nx lm rep_good aperfmperf unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm dts
bogomips : 4256.15
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
stepping : 5
cpu MHz : 2128.076
cache size : 4096 KB
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu de tsc msr pae cx8 sep cmov pat clflush mmx fxsr sse sse2 ss ht syscall nx lm rep_good aperfmperf unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm dts
bogomips : 4256.15
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
stepping : 5
cpu MHz : 2128.076
cache size : 4096 KB
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu de tsc msr pae cx8 sep cmov pat clflush mmx fxsr sse sse2 ss ht syscall nx lm rep_good aperfmperf unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm dts
bogomips : 4256.15
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
stepping : 5
cpu MHz : 2128.076
cache size : 4096 KB
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu de tsc msr pae cx8 sep cmov pat clflush mmx fxsr sse sse2 ss ht syscall nx lm rep_good aperfmperf unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm dts
bogomips : 4256.15
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 4
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
stepping : 5
cpu MHz : 2128.076
cache size : 4096 KB
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu de tsc msr pae cx8 sep cmov pat clflush mmx fxsr sse sse2 ss ht syscall nx lm rep_good aperfmperf unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm dts
bogomips : 4256.15
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 5
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
stepping : 5
cpu MHz : 2128.076
cache size : 4096 KB
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu de tsc msr pae cx8 sep cmov pat clflush mmx fxsr sse sse2 ss ht syscall nx lm rep_good aperfmperf unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm dts
bogomips : 4256.15
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 6
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
stepping : 5
cpu MHz : 2128.076
cache size : 4096 KB
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu de tsc msr pae cx8 sep cmov pat clflush mmx fxsr sse sse2 ss ht syscall nx lm rep_good aperfmperf unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm dts
bogomips : 4256.15
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
stepping : 5
cpu MHz : 2128.076
cache size : 4096 KB
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu de tsc msr pae cx8 sep cmov pat clflush mmx fxsr sse sse2 ss ht syscall nx lm rep_good aperfmperf unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm dts
bogomips : 4256.15
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
gensim의 맨 아래 레이어는 Numpy와 pandas에 의존하며, 후자 두 레이어는 순수 Python이 아니고 C++, fortan 등을 혼합한 것입니다. 따라서 gensim 계산 속도를 높이려면 기본 구현은 새로 생성된 Posix Thread 하위 스레드여야 하며 전체 로드에서 실행되어야 합니다.