Python + wordcloud + jieba: Learn to Generate a Chinese Word Cloud in Ten Minutes

爱喝马黛茶的安东尼

Jun 04, 2019 11:01 AM

Preface

This article uses two Python libraries:

jieba: a Chinese word segmentation tool

wordcloud: a word cloud generator for Python
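Why is a segmenter needed at all? WordCloud's `generate()` splits text into words with a regular expression built on `\w` (word characters). Chinese is written without spaces, so an unsegmented sentence comes back as a single "word". The sketch below uses only the standard `re` module, and the pattern `r"\w[\w']*"` is an assumption standing in for WordCloud's default (the library lets you override it with its `regexp` parameter); it illustrates the idea rather than reproducing WordCloud's own code path.

```python
import re
from typing import List

# Assumed stand-in for WordCloud's default token pattern (not taken from its source)
DEFAULT_TOKEN_PATTERN = r"\w[\w']*"


def tokens(text: str) -> List[str]:
    """Split text the way a \\w-based regex tokenizer would."""
    return re.findall(DEFAULT_TOKEN_PATTERN, text)


raw = "我爱北京天安门"          # unsegmented Chinese: no spaces between words
segmented = "我 爱 北京 天安门"  # the same sentence after segmentation, joined by spaces

print(tokens(raw))        # ['我爱北京天安门']  one giant "word"
print(tokens(segmented))  # ['我', '爱', '北京', '天安门']  real words
```

Single-character tokens such as 我 and 爱 survive this pattern, which is why the script below also drops words shorter than two characters after segmentation.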

In the previous lesson we learned how to make an English word cloud. This article explains how to make a Chinese one; after reading it, you will know how to generate a word cloud from any Chinese text.

How the code is organized

The code is adapted from other people's blogs, but I changed it substantially to fix bugs and improve efficiency.

The first part of the code sets most of the parameters needed to run it, so you can use the code directly without many modifications.

The second part configures jieba. You can also turn Chinese word segmentation off with the isCN parameter.

The third part configures wordcloud, including displaying and saving the images.
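The jieba part of the code and the isCN switch boil down to one step: segment the text, drop stop words and one-character tokens, and join what remains with spaces so WordCloud can split it again. Below is a dependency-free sketch of that step. `load_stopwords`, `clean_text` and the `segment` argument are hypothetical names made up for illustration, and `toy_segment` (which just splits on spaces) stands in for jieba so the example runs without it; with jieba installed you would pass something like `lambda t: jieba.cut(t, cut_all=False)` instead. Stop words are held in a set, so each membership test is O(1) on average instead of a scan through a list.

```python
from typing import Callable, Iterable, Set


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Build a stop-word set from the lines of a file (one word per line).
    strip() also removes Windows '\\r' line endings; blank lines are skipped."""
    return {line.strip() for line in lines if line.strip()}


def clean_text(text: str,
               stopwords: Set[str],
               segment: Callable[[str], Iterable[str]],
               is_cn: bool = True) -> str:
    """Hypothetical helper: segment `text`, drop stop words and
    single-character tokens, and join the rest with spaces.
    With is_cn=False the text is returned unchanged (like isCN = 0)."""
    if not is_cn:
        return text
    words = (w.strip() for w in segment(text))
    return " ".join(w for w in words if len(w) > 1 and w not in stopwords)


def toy_segment(t: str) -> Iterable[str]:
    """Stand-in for jieba.cut: assumes the input is already space-separated."""
    return t.split(" ")


stop = load_stopwords(["的\n", "我们\r\n", "\n"])
print(clean_text("我们 的 路明非 在 学院 学院", stop, toy_segment))
# 路明非 学院 学院
```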

## Using the code

You can learn how to use this program in a few minutes just by reading its comments.
```python
# -*- coding: utf-8 -*-
from os import path

import jieba
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image  # scipy.misc.imread was removed from SciPy; Pillow is used instead
from wordcloud import WordCloud, ImageColorGenerator

# jieba.load_userdict("txt/userdict.txt")
# Optionally load a custom user dictionary on top of jieba's built-in one

# Directory of the current file. __file__ is undefined when this runs
# interactively (e.g. in some IDEs); in that case use:
# d = path.dirname('.')
d = path.dirname(__file__)

isCN = 1  # 1 = Chinese word segmentation on (default), 0 = off
back_coloring_path = "img/lz1.jpg"  # background (mask) image
text_path = "txt/lz.txt"  # text to analyze
font_path = r"D:\Fonts\simkai.ttf"  # font with Chinese glyphs; WordCloud needs it to render Chinese
stopwords_path = "stopwords/stopwords1893.txt"  # stop-word list, one word per line
imgname1 = "WordCloudDefautColors.png"  # output 1: shape follows the background image only
imgname2 = "WordCloudColorsByImg.png"  # output 2: colors also follow the background image
my_words_list = ["路明非"]  # extra words to add to jieba's dictionary

# Load the background image as a NumPy array to use as the mask
back_coloring = np.array(Image.open(path.join(d, back_coloring_path)))

# Word cloud settings
wc = WordCloud(font_path=font_path,  # font
               background_color="white",  # background color
               max_words=2000,  # maximum number of words shown
               mask=back_coloring,  # mask (background) image
               max_font_size=100,  # largest font size
               random_state=42,  # fixed seed for a reproducible layout
               width=1000, height=860, margin=2,  # default size; with a mask the saved image uses the mask's size; margin = spacing around words
               )


def add_word(words):
    """Add custom words to jieba's dictionary so they are kept whole."""
    for item in words:
        jieba.add_word(item)


add_word(my_words_list)

with open(path.join(d, text_path), encoding="utf-8") as f:
    text = f.read()


def jiebaclearText(text):
    """Segment text with jieba, drop stop words and 1-character tokens,
    and return the remaining words joined by spaces."""
    with open(path.join(d, stopwords_path), encoding="utf-8") as f_stop:
        stopwords = {line.strip() for line in f_stop}  # set: fast lookups
    mywordlist = []
    for word in jieba.cut(text, cut_all=False):
        word = word.strip()
        if word not in stopwords and len(word) > 1:
            mywordlist.append(word)
    return " ".join(mywordlist)


if isCN:
    text = jiebaclearText(text)

# Generate the word cloud. generate() takes the whole text (WordCloud does not
# segment Chinese well, so keep Chinese segmentation on); alternatively,
# compute word frequencies yourself and call generate_from_frequencies().
wc.generate(text)
# wc.generate_from_frequencies(txt_freq)
# txt_freq is a dict, e.g. {'词a': 100, '词b': 90, '词c': 80}

# Display the word cloud (shape from the background image, default colors)
plt.figure()
plt.imshow(wc)
plt.axis("off")
plt.show()
# Save image 1
wc.to_file(path.join(d, imgname1))

# Recolor the words with colors taken from the background image
image_colors = ImageColorGenerator(back_coloring)
plt.figure()
plt.imshow(wc.recolor(color_func=image_colors))
plt.axis("off")
# Also show the background image itself in grayscale
plt.figure()
plt.imshow(back_coloring, cmap=plt.cm.gray)
plt.axis("off")
plt.show()
# Save image 2 (recolor() changes wc in place)
wc.to_file(path.join(d, imgname2))
```
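The script mentions `generate_from_frequencies` as an alternative to `generate`. In current wordcloud releases it expects a dict mapping each word to a number, and it does not apply the `stopwords` parameter, so filtering has to happen before counting. A minimal sketch, assuming the text has already been cleaned into space-separated words (as `jiebaclearText` returns); `word_frequencies` and its `top_n` argument are hypothetical names for illustration:

```python
from collections import Counter
from typing import Dict, Optional


def word_frequencies(cleaned_text: str, top_n: Optional[int] = None) -> Dict[str, int]:
    """Count space-separated words; keep only the top_n most common if given."""
    counts = Counter(cleaned_text.split())
    return dict(counts.most_common(top_n))  # most_common(None) returns all words


freq = word_frequencies("路明非 学院 学院 学生 学院 路明非")
print(freq)  # {'学院': 3, '路明非': 2, '学生': 1}
# With the WordCloud object from the script: wc.generate_from_frequencies(freq)
```

Counting yourself is mainly useful when you want to cap or reweight words (for example with `top_n`) before drawing.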

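## Summary

To use this code for an English word cloud, set isCN to 0 and provide an English stop-word list. Note that with isCN = 0 the jieba cleaning step, and with it the stop-word file, is skipped, so the English list has to be passed to WordCloud through its stopwords parameter; if that parameter is left as None, WordCloud falls back to its built-in English STOPWORDS set.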
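If you would rather filter English text yourself before calling `wc.generate()`, the step is simpler than for Chinese because words are already separated by spaces. A stdlib-only sketch; `clean_english` is a hypothetical helper and the tiny stop-word set is made up for the example (in practice you would load yours from a file, one word per line). Passing the same set as `WordCloud(stopwords=english_stopwords, ...)` is the alternative that keeps the script unchanged.

```python
import re
from typing import Set


def clean_english(text: str, stopwords: Set[str]) -> str:
    """Lowercase, extract words (letters and apostrophes), drop stop words
    and one-letter words, and join the rest with spaces."""
    words = re.findall(r"[a-z][a-z']*", text.lower())
    return " ".join(w for w in words if len(w) > 1 and w not in stopwords)


# Hypothetical stop-word list for illustration only
english_stopwords = {"the", "of", "and", "is"}
print(clean_english("The Art of Computer Programming is a classic.", english_stopwords))
# art computer programming classic
```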

Statement: this article is reproduced from CSDN. In case of infringement, please contact admin@php.cn to have it removed.
