When you load HTML with DOMDocument, the encoding appears to be lost: Japanese characters in the markup come out as garbled text, even though the same string displays correctly when you echo it directly.
By default, DOMDocument::loadHTML() treats the input string as ISO-8859-1 (the HTTP/1.1 default character set). When the string is actually UTF-8, each multibyte character is misread as several single-byte characters, which produces the garbled output.
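The misreading can be reproduced without DOMDocument at all. The following is a minimal sketch, assuming the mbstring extension is available; the variable names are illustrative and not from the original question. It reinterprets the UTF-8 bytes of a single katakana character as ISO-8859-1, which is the kind of misinterpretation described above.

```php
<?php
// "\u{30A4}" is the katakana character イ; in UTF-8 it is the three bytes E3 82 A4.
$original = "\u{30A4}";

// Simulate the wrong assumption: treat each byte as one ISO-8859-1 character
// and re-encode the result as UTF-8.
$misread = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// One character has become three unrelated ones.
echo bin2hex($original), ' -> ', bin2hex($misread), PHP_EOL;
// prints "e382a4 -> c3a3c282c2a4"
```

Each original byte is expanded into its own two-byte character, which is why the garbled text is longer than the input.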
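To ensure DOMDocument loads the HTML string with the correct encoding, you can either declare the encoding inside the markup itself, for example with a meta charset declaration, or convert the non-ASCII characters to HTML entities before loading.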
Here's an example using a meta charset declaration:
$profile = '<p>イリノイ州シカゴにて、アイルランド系の家庭に、9</p>';
$dom = new DOMDocument();

// Add meta charset declaration
$contentType = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
$dom->loadHTML($contentType . $profile);

echo $dom->saveHTML();
This will load the HTML string with the correct UTF-8 encoding, preserving the original Japanese characters.
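If you would rather not inject a meta tag, another approach is to convert every non-ASCII character to a numeric HTML entity before loading, so the parser never has to guess the encoding. The sketch below assumes the mbstring extension is available; the helper name loadHtmlAsEntities is hypothetical and not part of DOMDocument or the original example.

```php
<?php
// Hypothetical helper (not part of DOMDocument): encodes non-ASCII characters
// as numeric entities so the input is pure ASCII by the time it is parsed.
function loadHtmlAsEntities(string $utf8Html): DOMDocument
{
    // Convert every code point from U+0080 to U+10FFFF into an &#NNNN; entity.
    // ASCII characters, including tags and attributes, are left untouched.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    $encoded = mb_encode_numericentity($utf8Html, $convmap, 'UTF-8');

    $dom = new DOMDocument();
    $dom->loadHTML($encoded);

    return $dom;
}

$profile = '<p>イリノイ州シカゴにて、アイルランド系の家庭に、9</p>';
$dom = loadHtmlAsEntities($profile);

// DOM string properties are returned as UTF-8, so the text round-trips intact.
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
echo $text, PHP_EOL;
```

Because the string handed to loadHTML() contains only ASCII, this variant does not depend on how the parser detects encodings. Note that saveHTML() may write the characters back out as entities rather than as literal text.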