How Can I Read and Write Unicode (UTF-8) Files Correctly in Python?

Susan Sarandon
Release: 2024-11-05 02:35:02

Unicode (UTF-8) File I/O in Python

In Python, reading and writing Unicode text means decoding bytes into text on input and encoding text into bytes on output. Confusing the two directions is a common source of errors, as the following example shows:

Decoding Confusion:

Consider the following code in Python 2.4:

<code class="python">ss = u'Capit\xe1n'
ss8 = ss.encode('utf8')
print(ss, ss8)</code>

Because print is a statement in Python 2, the parentheses form a tuple, so this code prints the repr of both values:

(u'Capit\xe1n', 'Capit\xc3\xa1n')

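The a-acute character (á) is represented differently in the two objects: as the single code point \xe1 in the Unicode string (u'Capit\xe1n') and as the two bytes \xc3\xa1 in its UTF-8 encoding (ss8 = 'Capit\xc3\xa1n'). The repr of a byte string shows every non-ASCII byte as a \x escape, hence the \xc3\xa1 sequence in the output.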
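The difference between the code point and its UTF-8 bytes can be checked directly. The following is a minimal sketch written for Python 3 (not the Python 2.4 of the example above); the names n_chars, n_bytes, and round_trip are introduced here for illustration only.

```python
# Written for Python 3; ss and ss8 follow the article's example.
ss = u'Capit\xe1n'           # text: 7 code points, one of them U+00E1
ss8 = ss.encode('utf-8')     # bytes: U+00E1 becomes the two bytes C3 A1

n_chars = len(ss)            # number of code points in the text
n_bytes = len(ss8)           # number of bytes in the UTF-8 encoding
round_trip = ss8.decode('utf-8')  # decoding restores the original text

print(repr(ss8))
print(n_chars, n_bytes, round_trip == ss)
```

The encoded form is one byte longer than the text because only the a-acute needs two bytes in UTF-8; the ASCII letters take one byte each.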

Opening the file 'f1' in write mode and writing ss8 followed by a newline stores the bytes 'Capit\xc3\xa1n\n', which is valid UTF-8. Conversely, writing ss directly to another file 'f2' opened with the plain built-in open fails in Python 2: the Unicode string is implicitly encoded with the ASCII codec, which cannot represent the a-acute character, so a UnicodeEncodeError is raised.

Decoding Solution:

To resolve this confusion, specify the encoding explicitly when opening the file. In Python 2.6 and later, the io.open function can be used:

<code class="python">import io
f = io.open("test", mode="r", encoding="utf-8")</code>

With the encoding specified, the file object decodes UTF-8 into Unicode text on read (and encodes text to UTF-8 on write when the file is opened in a write mode), eliminating the need for manual encoding and decoding. In Python 3.x, io.open is an alias for the built-in open function, which also supports the encoding argument.
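A write-then-read round trip with io.open might look like the sketch below. It assumes Python 3 (or 2.6+), and the scratch path and the variable names text, decoded, and raw are hypothetical, chosen only for this illustration.

```python
import io
import os
import tempfile

text = u'Capit\xe1n'

# Hypothetical scratch file, created in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'f1.txt')

# Text mode with an explicit encoding: text goes in, UTF-8 bytes land on disk.
with io.open(path, mode='w', encoding='utf-8') as f:
    f.write(text)

# Reading with the same encoding returns the original text.
with io.open(path, mode='r', encoding='utf-8') as f:
    decoded = f.read()

# Binary mode exposes the raw bytes that were actually stored.
with io.open(path, mode='rb') as f:
    raw = f.read()

print(decoded == text)
print(repr(raw))
```

Using the with statement closes each file even if an error occurs, and naming the encoding on both sides avoids depending on the platform default.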

Alternatively, the codecs module can be used:

<code class="python">import codecs
f = codecs.open("test", "r", "utf-8")</code>

Note that mixing the read() and readline() methods may cause issues when using codecs.open.
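One way to stay clear of that caveat is to use a single reading style for each file. The sketch below iterates over lines only; it assumes Python 3, and the scratch path and sample text are hypothetical.

```python
import codecs
import os
import tempfile

# Hypothetical scratch file, created in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'f2.txt')

# codecs.open encodes the text to UTF-8 as it is written.
with codecs.open(path, 'w', 'utf-8') as f:
    f.write(u'Capit\xe1n\nse\xf1or\n')

# Read by iterating over lines only, without mixing in read() calls.
with codecs.open(path, 'r', 'utf-8') as f:
    lines = [line.rstrip(u'\n') for line in f]

print(len(lines))
```

For new code, io.open (or the built-in open in Python 3) is usually the simpler choice; codecs.open is shown here only because the text above mentions it.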
