UnicodeDecodeError: Invalid Continuation Byte
Decoding a byte string with the "utf-8" codec can raise "UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9...". The "invalid continuation byte" reason means the bytes at that position do not form a valid UTF-8 sequence.
Consider the following Python 2 snippet:
o = "a test of \xe9 char"
v = o.decode("utf-8")
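The string "a test of \xe9 char" contains the byte 0xe9. In UTF-8, 0xe9 (binary 11101001) is the lead byte of a three-byte sequence, so it must be followed by two continuation bytes in the range 0x80 to 0xbf. Here the next byte is a space (0x20), which is not a valid continuation byte, so the "utf-8" codec cannot decode the sequence.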
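The failure can be reproduced and inspected with a short sketch. It assumes Python 3, where the data has to be a bytes literal (the snippet above relies on Python 2's str.decode), and the helper name try_utf8 is hypothetical, introduced only for illustration.

```python
def try_utf8(data):
    """Return (text, None) if data is valid UTF-8, otherwise (None, exception)."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, exc


raw = b"a test of \xe9 char"
text, err = try_utf8(raw)

# err.start is the offset of the byte that could not be decoded,
# and err.reason is the codec's description of the problem.
bad_byte = raw[err.start]

# For comparison, the character U+00E9 encoded as UTF-8 is the two bytes
# 0xc3 0xa9, which decode without error.
good_text, good_err = try_utf8(b"a test of \xc3\xa9 char")
```

Inspecting err.start and err.reason is usually the quickest way to tell whether the input is truly corrupt or simply in a different encoding.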
However, when using the "latin-1" codec instead, the decoding succeeds:
v = o.decode("latin-1")
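This is because the "latin-1" codec maps every byte to exactly one character, so 0xe9 is decoded as the single character é (U+00E9) rather than as part of a multi-byte UTF-8 sequence. Decoding therefore completes without raising UnicodeDecodeError, although the result is only meaningful if the data really was encoded as Latin-1.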
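A minimal sketch of the Latin-1 path, again assuming Python 3 and a bytes literal; the variable names are illustrative. It decodes the same bytes as Latin-1 and then re-encodes the text as UTF-8, which is a common way to normalize such data once its real encoding is known.

```python
raw = b"a test of \xe9 char"

# Latin-1 maps each byte value 0x00-0xff to the code point with the same
# number, so decoding never fails.
text = raw.decode("latin-1")

# Re-encode as UTF-8: U+00E9 becomes the two-byte sequence 0xc3 0xa9.
utf8_bytes = text.encode("utf-8")

# Every possible byte value decodes under Latin-1.
all_bytes_text = bytes(range(256)).decode("latin-1")
```

Because Latin-1 accepts any byte sequence, a successful decode does not prove the data was Latin-1; it only guarantees that no exception is raised.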