Recently, while scraping a certain website, most pages came through fine, but a small number were garbled. After a few days of debugging I found the cause: illegal (undecodable) bytes in the content. Here are my notes.
1. Under normal circumstances, you can detect the encoding of a file or page with chardet:

import chardet
thischarset = chardet.detect(strs)["encoding"]

Alternatively, grab the charset=xxxx declaration directly from the page's HTML.
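To illustrate the second approach, here is a minimal sketch that pulls the charset declaration out of the raw HTML with a regular expression. The helper name `sniff_charset` is hypothetical, and real pages may instead declare the encoding in the HTTP Content-Type header, so treat this as best-effort:

```python
import re

def sniff_charset(html_bytes):
    """Best-effort extraction of charset=... from a page's markup.

    Hypothetical helper for illustration; returns None when no
    declaration is found in the first 2 KB of the document.
    """
    # Decode loosely first so the regex can run on text; the <meta> tag
    # itself is ASCII even when the page body is not.
    head = html_bytes[:2048].decode("ascii", "ignore")
    m = re.search(r'charset\s*=\s*["\']?\s*([-\w]+)', head, re.IGNORECASE)
    return m.group(1).lower() if m else None

html = b'<meta http-equiv="Content-Type" content="text/html; charset=GBK">'
print(sniff_charset(html))  # -> gbk
```

This also matches the HTML5 form `<meta charset="utf-8">`, since the pattern only looks for the charset= key itself.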
2. Even when the correct encoding is specified, special characters in the content can still come out garbled. When illegal bytes in the content are the cause, you can have decode() ignore them:

strs = strs.decode("UTF-8", "ignore").encode("UTF-8")

The second argument of decode() sets the error-handling strategy for illegal bytes; the default is "strict", which raises an exception.
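The snippet above is Python 2 style (decode on a str). A minimal Python 3 sketch of the same idea, using a UTF-8 byte string with a deliberately spliced-in illegal byte:

```python
# Build UTF-8 bytes with a stray 0xFF inserted, as can happen when
# scraping pages with mixed or corrupted encodings. Each Chinese
# character here is 3 bytes, so slicing at 3 keeps characters intact.
good = "中文内容".encode("utf-8")
raw = good[:3] + b"\xff" + good[3:]

# The default error handler ("strict") raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as e:
    print("strict mode failed:", e.reason)

# "ignore" silently drops the illegal byte; "replace" would substitute
# U+FFFD instead, which keeps a visible marker of the damage.
cleaned = raw.decode("utf-8", "ignore")
print(cleaned)  # -> 中文内容
```

Whether "ignore" or "replace" is the better choice depends on whether you want silently clean output or a visible trace of where bytes were dropped.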