This article shows how to implement image text recognition (OCR) in Python, using the pytesseract library and the tesseract-ocr engine.
Let's take recognizing a poem as an example.
The input is a picture of the poem we want to recognize.
Running the code on that picture gives the result below. A few characters are not recognized correctly, but most are.
风急天高猿啸哀 渚芸胄芳少白鸟飞凤 无边落木萧萧下, 不尽长量工盲衮宕衮来 万里悲秋常1乍窨, 百年多病独登氤 艰难苦恨擎霜量 漂倒新停澍酉帆
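To put a number on "most characters were recognized", the output can be compared with the known text of the poem. The following is only a minimal sketch using the standard library's difflib. The names REFERENCE, normalize and similarity are made up for this illustration, and dropping everything except Chinese characters before comparing is an assumption of the sketch, not part of the original code.

```python
from difflib import SequenceMatcher

# Reference text of Du Fu's "登高" (Deng Gao), punctuation omitted.
REFERENCE = (
    "风急天高猿啸哀渚清沙白鸟飞回"
    "无边落木萧萧下不尽长江滚滚来"
    "万里悲秋常作客百年多病独登台"
    "艰难苦恨繁霜鬓潦倒新停浊酒杯"
)


def normalize(text):
    """Keep only CJK ideographs, so spaces, commas and stray digits are ignored."""
    return "".join(ch for ch in text if "\u4e00" <= ch <= "\u9fff")


def similarity(ocr_text, reference=REFERENCE):
    """Return a ratio between 0 and 1; 1.0 means the two texts match exactly."""
    return SequenceMatcher(None, normalize(ocr_text), normalize(reference)).ratio()
```

Calling similarity(text) on the string returned by the OCR call gives a rough score that can be compared before and after changing the image or the settings.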
We need two Python libraries: pytesseract and PIL (published on PyPI under the name Pillow).
We also need to install the recognition engine tesseract-ocr.
The two Python libraries can be installed with pip.
- 1. Command line installation (PIL is installed under the package name Pillow)
pip install Pillow
pip install pytesseract
- 2. If you use the PyCharm editor, you can install the packages directly from PyCharm.
Follow the installation steps on the Settings page of PyCharm.
This installs pytesseract. To install PIL, search for Pillow in the third step instead and click Install.
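Whichever installation method is used, it can help to confirm that both libraries are importable before going further. The following is a small sketch using only the standard library. The function name missing_modules and the REQUIRED list are hypothetical names chosen for this example.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


# Pillow is imported as "PIL"; pytesseract is imported under its own name.
REQUIRED = ["PIL", "pytesseract"]
```

Running print(missing_modules(REQUIRED)) should print an empty list once both packages are installed. Any name that appears in the list still needs to be installed.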
At this point the libraries are installed. Running the following code
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open('denggao.jpeg'), lang='chi_sim')
print(text)
raises an error. The reason is that the recognition engine tesseract-ocr is not installed yet.
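pytesseract is a wrapper that calls the tesseract executable, so the error goes away only once that executable can be found. As a rough sketch, the check below looks for it on the PATH and then in a list of extra locations. The function name find_tesseract and its extra_candidates parameter are hypothetical, and the Windows path in the usage note is the one this article uses later, not a guaranteed location.

```python
import os
import shutil


def find_tesseract(extra_candidates=()):
    """Return a path to the tesseract executable, or None if it is not found."""
    # First look on the PATH, the same way a shell would.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Then try any explicit locations supplied by the caller.
    for candidate in extra_candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, find_tesseract(['C:/Program Files (x86)/Tesseract-OCR/tesseract.exe']) returns None on a machine where the engine has not been installed in either place.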
1. Download the tesseract-ocr installation package and run the installer.
Then download the Chinese language package, unzip it, and install it.
This extra step is needed because tesseract-ocr does not support Chinese recognition by default.
2. After installing tesseract-ocr, we still need to do some configuration.
In C:\Users\huxiu\AppData\Local\Programs\Python\Python35\Lib\site-packages\pytesseract, find pytesseract.py, open it, and change tesseract_cmd so that it points to the installed executable:
# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
#tesseract_cmd = 'tesseract'
tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
You can also open pytesseract.py quickly from within PyCharm.
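An alternative to editing the installed library file is to set the same variable from your own script, since pytesseract exposes it as pytesseract.pytesseract.tesseract_cmd. The sketch below wraps that assignment in a helper. The name use_tesseract is hypothetical, and the path shown in the usage note assumes the same install location as above.

```python
def use_tesseract(executable_path):
    """Point pytesseract at a specific tesseract executable for this process."""
    # Imported inside the function so that a missing pytesseract installation
    # only causes an error when this helper is actually called.
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = executable_path
    return pytesseract.pytesseract.tesseract_cmd
```

Calling use_tesseract('C:/Program Files (x86)/Tesseract-OCR/tesseract.exe') once at the start of a script has the same effect as the edit above, and the setting survives a reinstall or upgrade of pytesseract because it lives in your own code.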
Now all the configuration is complete. Run the recognition code again to convert the picture of Du Fu's poem "Deng Gao" (Climbing High) into text.
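Putting the pieces together, a final script might look like the sketch below. It first checks that the Chinese language data is present (Tesseract stores each language as a file named after the language code with the extension .traineddata in its tessdata folder) and then runs the recognition. The helper names has_language_data and recognize are hypothetical, and the tessdata location mentioned in the usage note is an assumption based on the install path used in this article.

```python
import os


def has_language_data(tessdata_dir, lang="chi_sim"):
    """Return True if <lang>.traineddata exists in the given tessdata folder."""
    return os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))


def recognize(image_path, lang="chi_sim"):
    """Run OCR on one image file and return the recognized text."""
    # Imported here so the language check above works without these packages.
    from PIL import Image
    import pytesseract

    # The with-block closes the image file once recognition is done.
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang)
```

A typical use would be to call has_language_data('C:/Program Files (x86)/Tesseract-OCR/tessdata') first, and only then print(recognize('denggao.jpeg')). If the check returns False, the Chinese language package from step 1 has not been installed in that folder.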