Reading CSV files that contain accented characters can be challenging in Python 2, because its csv module does not support Unicode input and works on byte strings only. This article explores a solution to the problem, addressing the "UnicodeDecodeError" encountered when attempting to read such files.
To handle accented characters, we need a CSV reader that returns Unicode strings. The following generator wraps the standard CSV reader and decodes each cell:
<code class="python">import csv

def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # Parse the raw rows with the standard reader
    csv_reader = csv.reader(unicode_csv_data, dialect=dialect, **kwargs)
    for row in csv_reader:
        # Decode UTF-8 data into Unicode strings
        yield [unicode(cell, 'utf-8') for cell in row]</code>
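The original attempt went wrong by encoding data that was already a byte string rather than a Unicode string. In Python 2, calling encode on a byte string first decodes it implicitly as ASCII, which is what raises the UnicodeDecodeError. The code below avoids the mistake by passing the UTF-8 bytes straight to the reader and decoding each cell afterwards: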
<code class="python">import csv

def unicode_csv_reader(utf8_data, dialect=csv.excel, **kwargs):
    csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
    for row in csv_reader:
        yield [unicode(cell, 'utf-8') for cell in row]</code>
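The difference between decoding and encoding that this fix relies on can be illustrated with a minimal sketch. It is written for Python 3, where bytes and text are separate types, and it is not part of the original solution. The helper name `decode_as_ascii` and the sample value are assumptions chosen for illustration.

```python
# Python 3 sketch: bytes are decoded into text, text is encoded into bytes.
raw = b"caf\xc3\xa9"  # the UTF-8 bytes for "café"

text = raw.decode("utf-8")         # bytes -> text (the direction the reader needs)
round_trip = text.encode("utf-8")  # text -> bytes (the opposite direction)


def decode_as_ascii(data):
    """Hypothetical helper: name the error raised when bytes are read as ASCII."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return type(exc).__name__
    return None


error_name = decode_as_ascii(raw)
```

Here `error_name` holds the name of the exception, because the two bytes that encode the accented letter fall outside the ASCII range.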
Now we can read UTF-8-encoded CSV files as follows:
<code class="python">filename = 'output.csv'
# Open in binary mode, as the Python 2 csv module expects
reader = unicode_csv_reader(open(filename, 'rb'))
# Iterate through the rows, unpacking three fields from each
for field1, field2, field3 in reader:
    print field1, field2, field3</code>
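The code in this article targets Python 2. On Python 3 the csv module works with text directly, so a wrapper is usually unnecessary. The following is a rough equivalent, offered as a sketch rather than a drop-in replacement. The function name `read_utf8_csv` and the sample file contents are assumptions made for illustration.

```python
import csv
import os
import tempfile


def read_utf8_csv(path):
    """Yield each row of a UTF-8 CSV file as a list of str (Python 3)."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle):
            yield row


# Write a small sample file so the example is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w", encoding="utf-8", newline="") as handle:
    csv.writer(handle).writerows([["José", "Zoë", "Renée"], ["a", "b", "c"]])

rows = list(read_utf8_csv(sample_path))
for field1, field2, field3 in rows:
    print(field1, field2, field3)
```

Passing `encoding="utf-8"` explicitly avoids depending on the platform's default encoding.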
Remember that this solution assumes the input file is already UTF-8 encoded. If it is not, either pass the file's actual encoding to unicode() in place of 'utf-8', or decode the data with its actual encoding and re-encode it as UTF-8 before passing it to the CSV reader.
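Converting data from another encoding to UTF-8 is a decode followed by an encode. The sketch below is written for Python 3 and assumes the source encoding is known in advance. Latin-1 is used purely as an example, and `to_utf8` is a hypothetical helper name.

```python
def to_utf8(raw_bytes, source_encoding):
    """Re-encode bytes from source_encoding into UTF-8 (hypothetical helper)."""
    # Decode with the encoding the data really uses, then encode as UTF-8.
    return raw_bytes.decode(source_encoding).encode("utf-8")


# Example input: one CSV line stored as Latin-1 bytes.
latin1_line = "Renée,Orléans".encode("latin-1")
utf8_line = to_utf8(latin1_line, "latin-1")
```

If the source encoding is unknown, it has to be determined some other way first. Decoding with the wrong codec either raises an error or silently produces the wrong characters.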