
Two Ways to Fix Garbled Chinese Text in Python BeautifulSoup

WBOY

Released: 2016-06-16 08:44:24

Original


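Solution 1:

I was using Python's BeautifulSoup to fetch a web page and print its title, but the output always came out garbled. After a long search I found a fix, which I am sharing here.
First, the code: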

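# Python 2 (urllib2 and the print statement do not exist in Python 3)
from bs4 import BeautifulSoup
import urllib2

url = 'http://www.jb51.net/'
page = urllib2.urlopen(url)

# from_encoding tells BeautifulSoup how to decode the raw bytes of the page
soup = BeautifulSoup(page, from_encoding="utf8")
print soup.original_encoding
# Encode the title explicitly so that a GBK console can display it
print soup.title.encode('gb18030')

# Write the title to a file
f = open("title.txt", "w")
f.write(str(soup.title))
f.close()

# Print the target of every link that actually has an href attribute
for link in soup.find_all('a', href=True):
    print link['href']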
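During early testing I noticed that although the console output was garbled, the text written to the file was fine. Searching online turned up the explanation.
When print is given an object, it internally calls the object's __str__ to get the string to output; here that is the soup's __str__. The soup itself already holds Unicode, so the Chinese text displays correctly once the encoding used for that output is set to GBK.
As for cmd: on a Chinese-language system its encoding is GBK, so re-encoding the output as gb18030 (a superset of GBK) is enough to make it display correctly.
This is the line that does it: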

print (soup.title).encode('gb18030')
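
The console effect described above can be reproduced without fetching any page. What follows is only a minimal sketch in Python 3 using the standard library, not the code from the article: the sample string `title` and the helper `show_on_console` are hypothetical, and the GBK console is simulated by decoding bytes as GBK.

```python
# Hypothetical sample text standing in for a page title.
title = "中文标题"


def show_on_console(raw: bytes, console_encoding: str = "gbk") -> str:
    """Simulate a console that interprets incoming bytes in its own encoding."""
    return raw.decode(console_encoding, errors="replace")


# UTF-8 bytes read by a GBK console come out garbled.
garbled = show_on_console(title.encode("utf-8"))

# gb18030 bytes are understood by a GBK console for characters GBK covers.
readable = show_on_console(title.encode("gb18030"))

print(garbled == title)
print(readable == title)
```

The first comparison is false and the second is true, which matches the behaviour reported above: the same text is unreadable as UTF-8 bytes on a GBK console but readable once encoded as gb18030.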

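Solution 2:

When BeautifulSoup parses a UTF-8 encoded page, Chinese text comes out garbled if fromEncoding is left unspecified or is set to utf-8.

The fix is to set the fromEncoding parameter of the BeautifulSoup constructor to gb18030. GB18030 is a superset of GBK and GB2312, so this setting also handles pages whose bytes are actually in one of those encodings.

The code is as follows: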

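# Python 2 with BeautifulSoup 3 (the older package, imported as BeautifulSoup)
import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen('http://www.jb51.net/')
# Decode the page as gb18030 instead of relying on auto-detection
soup = BeautifulSoup(page, fromEncoding="gb18030")
print soup.originalEncoding
print soup.prettify()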
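
One reason gb18030 is a convenient value here is that it is a superset of GBK and GB2312, so bytes in either of the older encodings decode cleanly with it. The following is a minimal sketch in Python 3 using only the standard library; the sample strings and the helper `decode_gb_page` are hypothetical and are not part of BeautifulSoup.

```python
def decode_gb_page(raw: bytes) -> str:
    """Decode bytes from a GB-family page using the gb18030 superset."""
    return raw.decode("gb18030")


# Hypothetical samples: one encoded as GB2312, one as GBK.
# The traditional character 漢 is covered by GBK but not by GB2312.
gb2312_bytes = "中文网页".encode("gb2312")
gbk_bytes = "漢字".encode("gbk")

# Both round-trip correctly through the single gb18030 decoder.
print(decode_gb_page(gb2312_bytes) == "中文网页")
print(decode_gb_page(gbk_bytes) == "漢字")
```

Because one decoder covers all three encodings, there is no need to know in advance which GB-family encoding a given page uses.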


source：php.cn
