How to use Python to identify fonts in pictures-Python Tutorial-php.cn

How to use Python to identify fonts in pictures

王林

Release： 2023-08-26 09:39:31


How to use Python to perform font recognition on pictures

In this article, font recognition means recognizing the text in a picture and converting it into editable text, i.e. optical character recognition (OCR). It is useful in scenarios such as automated document processing and text extraction. This article shows how to do this in Python with the pytesseract and Pillow libraries and provides corresponding code examples.

Preparation
First, we need to install some necessary Python libraries. Enter the following command on the command line to install:
```
pip install pytesseract
pip install pillow
```
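Here, pytesseract is a Python wrapper for the Tesseract-OCR engine and is used to recognize text in pictures; Pillow is a widely used Python image processing library, used here to preprocess the images. Note that pytesseract only calls the engine: Tesseract-OCR itself must be installed separately and be reachable on the system PATH.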
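pytesseract does not bundle the OCR engine; it runs the `tesseract` executable, which has to be installed on its own. The following is a minimal sketch for checking that the executable is reachable before running any OCR, assuming it is expected on the system PATH. The helper name `find_tesseract` is made up for illustration.

```python
import shutil


def find_tesseract():
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


path = find_tesseract()
if path is None:
    print("tesseract was not found on PATH; install Tesseract-OCR first.")
else:
    print(f"Using tesseract at {path}")
```

If the engine is installed somewhere that is not on PATH, pytesseract can be pointed at it by setting `pytesseract.pytesseract.tesseract_cmd` to the full path of the executable.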
Picture preprocessing
Before font recognition, we need to perform some preprocessing on the image to improve the accuracy of font recognition.

First, read the image and perform grayscale processing:

```
from PIL import Image

# Open the source image and convert it to 8-bit grayscale ('L' mode)
image = Image.open('image.jpg')
gray_image = image.convert('L')
```

Converting to grayscale discards the color information and leaves a single brightness channel. This makes the next step, thresholding, straightforward and usually helps recognition accuracy, as long as the text and the background differ clearly in brightness.
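To make the conversion concrete: Pillow documents the `'L'` conversion of an RGB pixel as the ITU-R 601-2 luma transform, L = R * 299/1000 + G * 587/1000 + B * 114/1000. The sketch below applies that formula to single pixels. The `luma` helper is hypothetical, and Pillow's internal integer rounding can differ from it by one gray level.

```python
def luma(r, g, b):
    """Approximate gray level of an RGB pixel using the ITU-R 601-2 weights."""
    return (r * 299 + g * 587 + b * 114) // 1000


print(luma(255, 255, 255))  # white
print(luma(0, 0, 0))        # black
print(luma(255, 0, 0))      # pure red
print(luma(0, 130, 0))      # a medium green
```

With these weights, pure red and the medium green both map to gray level 76, so red text on such a green background would become nearly invisible after conversion. Grayscale conversion helps only when text and background differ in brightness, not merely in hue.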

Then, we can binarize the image, that is, turn the dark text pure black and the lighter background pure white.

```
threshold = 150
# Pixels brighter than the threshold become white (255); all others become black (0)
binary_image = gray_image.point(lambda p: 255 if p > threshold else 0)
```

The value of threshold (150 here) is the brightness cutoff; it should be adjusted according to the brightness of the picture.
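Instead of tuning the cutoff by hand, it can be estimated from the image's histogram. The sketch below uses Otsu's method, a standard technique that is not part of the original workflow: it picks the cutoff that maximizes the variance between the dark and bright pixel classes. `otsu_threshold` is a hypothetical helper that expects a list of 256 pixel counts, which is what `gray_image.histogram()` returns for an `'L'` image.

```python
def otsu_threshold(histogram):
    """Return the cutoff t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the dark class, pixels > t the bright class.
    """
    total = sum(histogram)
    sum_all = sum(value * count for value, count in enumerate(histogram))
    best_t, best_var = 0, -1.0
    weight_dark = 0  # number of pixels in the dark class so far
    sum_dark = 0     # sum of gray values in the dark class so far
    for t, count in enumerate(histogram):
        weight_dark += count
        sum_dark += t * count
        if weight_dark == 0:
            continue  # no dark pixels yet
        weight_bright = total - weight_dark
        if weight_bright == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_bright = (sum_all - sum_dark) / weight_bright
        between = weight_dark * weight_bright * (mean_dark - mean_bright) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Synthetic histogram: 100 dark pixels at level 50, 100 bright pixels at level 200
hist = [0] * 256
hist[50] = 100
hist[200] = 100
print(otsu_threshold(hist))
```

With a real image, the call would be `threshold = otsu_threshold(gray_image.histogram())`, and the result can be used in the same `point` call as the fixed value. Otsu's method assumes the histogram has two reasonably distinct peaks; on unevenly lit pictures a single global cutoff may still perform poorly.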

Next, we can perform some noise reduction processing on the image to remove interfering noise.

```
from PIL import ImageFilter

# MinFilter uses a 3x3 window by default
denoised_image = binary_image.filter(ImageFilter.MinFilter)
```

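MinFilter is a minimum filter: it replaces each pixel with the darkest value in its neighborhood (3x3 by default). On a binarized image this fills small white gaps inside the strokes and thickens the dark text, but it also enlarges isolated dark specks; if those dominate, ImageFilter.MedianFilter is usually a better fit.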
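To see what a minimum filter does to a binarized image, here is a pure-Python sketch on a tiny grid. It is for illustration only: the `min_filter` helper is hypothetical, and its border handling (clipping the window at the image edge) is an assumption rather than a description of Pillow's internals.

```python
def min_filter(grid, size=3):
    """Replace each pixel with the minimum value in its size x size neighborhood."""
    r = size // 2
    height, width = len(grid), len(grid[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighborhood, clipped at the image borders
            window = [
                grid[j][i]
                for j in range(max(0, y - r), min(height, y + r + 1))
                for i in range(max(0, x - r), min(width, x + r + 1))
            ]
            row.append(min(window))
        out.append(row)
    return out


# 5x5 white image with a single black pixel in the centre
grid = [[255] * 5 for _ in range(5)]
grid[2][2] = 0

result = min_filter(grid)
for row in result:
    print(row)
```

The single black pixel grows into a 3x3 black square. This is helpful when it belongs to a thin stroke and harmful when it is noise, so the choice of filter depends on what kind of defects the picture actually contains.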

Finally, we can save the preprocessed image and display it:

```
denoised_image.save('processed_image.jpg')
denoised_image.show()
```

Those are the image preprocessing steps. We can now pass the preprocessed image to the recognition engine for text extraction.

Font recognition
Font recognition is very simple using the pytesseract library. We only need to use the processed image as input and call the corresponding function.
```
import pytesseract

text = pytesseract.image_to_string(denoised_image, lang='eng')
print(text)
```
Here, denoised_image is the image produced in the previous step, and the lang parameter specifies the language of the text to recognize; if it is omitted, Tesseract uses English.
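Besides `lang`, `image_to_string` accepts a `config` string that is passed to the Tesseract command line. The sketch below shows one possible way to set the page segmentation mode (`--psm`) and engine mode (`--oem`), which often matters for short snippets such as a single line of text. The helper names `build_config` and `recognize` are hypothetical, and the default values are assumptions chosen for illustration.

```python
def build_config(psm=6, oem=3):
    """Build a Tesseract option string from a page segmentation mode and an engine mode."""
    return f"--oem {oem} --psm {psm}"


def recognize(image, lang="eng", psm=6):
    """Run OCR on a PIL image with an explicit language and page segmentation mode."""
    # Imported here so build_config can be used without Tesseract installed
    import pytesseract

    return pytesseract.image_to_string(image, lang=lang, config=build_config(psm=psm))


print(build_config(psm=7))  # option string for an image containing a single text line
```

In Tesseract, `--psm 6` assumes a single uniform block of text and `--psm 7` treats the image as a single text line. Several languages can be combined with a plus sign, for example `lang='eng+chi_sim'`, provided the corresponding language data files are installed.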

Full code example
The following is a complete Python code example for font recognition on images:

```
from PIL import Image, ImageFilter
import pytesseract

# Image preprocessing: grayscale, binarize, then denoise
image = Image.open('image.jpg')
gray_image = image.convert('L')
threshold = 150
# Pixels brighter than the threshold become white (255); all others become black (0)
binary_image = gray_image.point(lambda p: 255 if p > threshold else 0)
denoised_image = binary_image.filter(ImageFilter.MinFilter)
denoised_image.save('processed_image.jpg')
denoised_image.show()

# Font recognition: extract the text with Tesseract
text = pytesseract.image_to_string(denoised_image, lang='eng')
print(text)
```

Summary
This article introduced how to use Python to recognize text in images and provided corresponding code examples. By preprocessing the image (grayscale conversion, binarization, and noise reduction) and then calling the pytesseract library, we can quickly extract the text from an image for subsequent processing. I hope this introduction is helpful to readers.
