
How to convert HTML entities to Unicode strings in Python?

Mary-Kate Olsen
Release: 2024-11-05 05:21:02

Convert XML/HTML Entities into Unicode String in Python

Question: How can I convert a string containing HTML entities into a Unicode string in Python? For example, the string "&#x01ce;" should be converted to "ǎ" with a tone mark (u'\u01ce').

Answer:

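In Python 2, the standard library's HTMLParser class has an undocumented method called unescape(). It converts both named and numeric HTML entities into their Unicode equivalents. The method was deprecated in Python 3.4 and removed in Python 3.9, so use it only in Python 2 code.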

<code class="python">import HTMLParser  # Python 2 only; the module was renamed html.parser in Python 3
h = HTMLParser.HTMLParser()
h.unescape('&copy; 2010')  # named entity   -> u'\xa9 2010'
h.unescape('&#169; 2010')  # numeric entity -> u'\xa9 2010'</code>

For Python 3.4 and above, use the documented html.unescape() function from the html module instead:

<code class="python">import html
html.unescape('&copy; 2010')  # named entity   -> '© 2010'
html.unescape('&#169; 2010')  # numeric entity -> '© 2010'</code>
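
As a minimal sketch for Python 3.4 and above, the snippet below runs html.unescape() over the named, decimal, and hexadecimal forms mentioned above, including the "&#x01ce;" example from the question. The variable names (samples, decoded, once, twice) are illustrative only. It also shows that unescape() makes a single pass, so a doubly escaped string such as "&amp;copy;" needs two calls to decode fully.

```python
import html

# Named, decimal, and hexadecimal character references.
samples = ["&copy; 2010", "&#169; 2010", "&#x01ce;"]
decoded = [html.unescape(s) for s in samples]

# unescape() decodes one level only: "&amp;copy;" becomes "&copy;",
# and a second call is needed to reach the copyright sign itself.
once = html.unescape("&amp;copy; 2010")
twice = html.unescape(once)

# ascii() keeps the output readable on terminals without Unicode support.
print(ascii(decoded))
print(ascii(once), ascii(twice))
```

If the input comes from a scraper, it may be worth checking whether it was escaped more than once before deciding how many times to call unescape().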


source:php.cn