Why is my PHP DOMDocument::loadHTML() Not Handling UTF-8 Encoding Correctly?

Barbara Streisand
Release: 2024-12-28 00:43:10

PHP DOMDocument loadHTML Not Encoding UTF-8 Correctly

When you parse HTML with DOMDocument::loadHTML(), UTF-8 text may come out garbled. Unless the markup declares its encoding, DOMDocument treats the input string as ISO-8859-1, so each byte of a multi-byte UTF-8 character is read as a separate character.

Solution:

To ensure correct encoding, you can employ various methods:

  • Prepend Encoding Declarations: Add an XML encoding declaration or an HTML meta charset declaration to indicate the presence of UTF-8 characters:

    $contentType = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
    $dom->loadHTML($contentType . $profile);
  • Convert to HTML Entities: If the input HTML may already contain an encoding declaration, prepending another one can cause conflicts. Instead, convert the non-ASCII characters to HTML entities before loading, a workaround used by the SmartDOMDocument library:

    $dom->loadHTML(mb_convert_encoding($profile, 'HTML-ENTITIES', 'UTF-8'));
  • Alternative: As of PHP 8.2, passing 'HTML-ENTITIES' to mb_convert_encoding() is deprecated. Use mb_encode_numericentity() instead:

    $dom->loadHTML(mb_encode_numericentity($profile, [0x80, 0x10FFFF, 0, ~0], 'UTF-8'));
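
The numeric-entity approach can be wrapped in a small helper. This is only a minimal sketch: the function name loadUtf8Html() and the sample string are made up for illustration and are not part of DOMDocument, and it assumes the mbstring and dom extensions are enabled.

```php
<?php
// Hypothetical helper (not part of DOMDocument): converts every non-ASCII
// character to a numeric entity so the parser only ever sees ASCII bytes.
function loadUtf8Html(string $html): DOMDocument
{
    // Convmap: code points 0x80..0x10FFFF, offset 0, mask ~0 (keep all bits).
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, ~0], 'UTF-8');

    $dom = new DOMDocument();
    $dom->loadHTML($encoded);

    return $dom;
}

// Sample input, assumed to be valid UTF-8.
$profile = '<p>イリノイ州シカゴにて</p>';
$dom = loadUtf8Html($profile);

// Read the paragraph text back from the parsed tree.
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
echo $text, PHP_EOL;
```

Because the input reaches the parser as plain ASCII, this should behave the same whether or not the markup already carries its own charset declaration.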

HTML5 Considerations:

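DOMDocument::loadHTML() relies on libxml2's HTML 4 parser, so HTML5 elements and syntax may trigger warnings or be parsed differently than in a browser. For HTML5 documents, consider using an alternative parser designed for HTML5 compliance.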
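
One possible alternative, assuming PHP 8.4 or later is available, is the built-in Dom\HTMLDocument class, which parses HTML5 and is not mentioned in the original answer. The sketch below guards the call with class_exists() so it does nothing on older versions; the sample markup is made up for illustration.

```php
<?php
// Sample HTML5 document, assumed to be valid UTF-8.
$html = '<!DOCTYPE html><html><body><p>イリノイ州シカゴにて</p></body></html>';

$text = null;

// Dom\HTMLDocument only exists in PHP 8.4+; skip the example otherwise.
if (class_exists(\Dom\HTMLDocument::class)) {
    // LIBXML_NOERROR suppresses parse-error warnings for this sketch.
    $dom = \Dom\HTMLDocument::createFromString($html, LIBXML_NOERROR);
    $text = $dom->getElementsByTagName('p')->item(0)->textContent;
}

echo $text ?? 'Dom\HTMLDocument is not available on this PHP version', PHP_EOL;
```

On older PHP versions a userland HTML5 parser would be needed instead; which one fits depends on the project.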

Example:

The following code uses mb_convert_encoding() so that loadHTML() reads a UTF-8 string correctly:

    $profile = "イリノイ州シカゴにて、アイルランド系の家庭に、9人兄弟の5番目として";

    $dom = new DOMDocument();
    // Note: 'HTML-ENTITIES' is deprecated as of PHP 8.2.
    $dom->loadHTML(mb_convert_encoding($profile, 'HTML-ENTITIES', 'UTF-8'));
    echo $dom->saveHTML();

source:php.cn