I'm setting up a new server and want to fully support UTF-8 in my web application. I have tried this in the past on existing servers, but I always seemed to end up having to fall back to ISO-8859-1.
Where exactly do I need to set the encoding/charsets? I know that I need to configure Apache, MySQL, and PHP to do this. Is there some standard checklist I can follow, or can I perhaps troubleshoot where the mismatches occur?
This is for new Linux servers running MySQL 5, PHP 5, and Apache 2.
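I'd like to add one thing to chazomaticus's excellent answer:
Don't forget the META tag either: `<meta charset="utf-8">`, or its HTML 4 or XHTML version, `<meta http-equiv="Content-Type" content="text/html; charset=utf-8">`.
This may seem trivial, but IE7 has given me problems with it before.
I was doing everything right; the database, the database connection, and the Content-Type HTTP header were all set to UTF-8, and it worked fine in all other browsers, but Internet Explorer still insisted on using the "Western European" encoding.
It turned out the page was missing the META tag. Adding it solved the problem.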
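Edit:
The W3C actually has a rather large section dedicated to I18N. They have a number of articles related to this issue, describing the HTTP, (X)HTML, and CSS side of things.
They recommend using both the HTTP header and the HTML meta tag (or the XML declaration in the case of XHTML served as XML).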
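The advice above is to declare the encoding in both places. As a language-neutral illustration (in Python rather than PHP, using only a plain WSGI callable; the `app` function and the page content are hypothetical), a response that does both might be sketched like this:

```python
# Minimal WSGI app (standard library conventions only) that declares UTF-8
# twice: once in the HTTP Content-Type header and once in a <meta> tag.


def app(environ, start_response):
    """Serve a small HTML page whose bytes really are UTF-8."""
    html = (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        '<meta charset="utf-8">\n'  # in-document declaration (HTML5 form)
        "<title>UTF-8 test</title>\n"
        "</head><body>\n"
        # Non-ASCII text written as escapes: "Grüße, 你好"
        "<p>Gr\u00fc\u00dfe, \u4f60\u597d</p>\n"
        "</body></html>\n"
    )
    body = html.encode("utf-8")  # encode once, at the output boundary
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),  # HTTP-level declaration
        ("Content-Length", str(len(body))),           # length in bytes, not characters
    ])
    return [body]
```

Any WSGI server (for example `wsgiref.simple_server`) can serve `app`. In PHP the header would come from `default_charset` or `header()`; the point of the sketch is only that the header and the meta tag agree and that the bytes sent are actually UTF-8.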
Data Storage:
Specify the `utf8mb4` character set on all tables and text columns in your database. This makes MySQL physically store and retrieve values encoded natively in UTF-8. Note that MySQL will implicitly use `utf8mb4` encoding if a `utf8mb4_*` collation is specified (without any explicit character set).
In older versions of MySQL (< 5.5.3), you'll unfortunately be forced to use simply `utf8`, which only supports a subset of Unicode characters: it stores at most three bytes per character, so characters outside the Basic Multilingual Plane (such as emoji) cannot be stored. I wish I were kidding.
Data Access:
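In your application code (e.g. PHP), in whatever DB access method you use, you'll need to set the connection charset to `utf8mb4`. This way, MySQL does no conversion from its native UTF-8 when it hands data off to your application, and vice versa.
Some drivers provide their own mechanism for configuring the connection character set, which both updates their own internal state and informs MySQL of the encoding to be used on the connection. This is usually the preferred approach. In PHP: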
If you're using the PDO abstraction layer with PHP ≥ 5.3.6, you can specify `charset` in the DSN, e.g. `$dbh = new PDO('mysql:charset=utf8mb4');`.
If you're using mysqli, you can call `set_charset()`, e.g. `$mysqli->set_charset('utf8mb4');`.
If you're stuck with plain mysql but happen to be running PHP ≥ 5.2.3, you can call `mysql_set_charset`.
If the driver does not provide its own mechanism for setting the connection character set, you may have to issue a query to tell MySQL how your application expects data on the connection to be encoded: `SET NAMES 'utf8mb4'`.
The same consideration regarding `utf8mb4`/`utf8` applies as above.
Output:
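UTF-8 should be declared in the HTTP header, e.g. `Content-Type: text/html; charset=utf-8`. You can achieve that either by setting `default_charset` in php.ini (preferred), or manually using the `header()` function.
If you're using `json_encode()`, add `JSON_UNESCAPED_UNICODE` as a second parameter, so that non-ASCII characters are emitted as raw UTF-8 rather than as `\uXXXX` escapes.
Input: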
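Every string you receive should be verified as valid UTF-8 before you store it or use it anywhere. PHP's `mb_check_encoding($input, 'UTF-8')` does the trick, but you have to use it religiously. There's really no way around this, as malicious clients can submit data in whatever encoding they want, and I haven't found a trick to get PHP to do this for you reliably.
Other Code Considerations: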
Obviously, all files you'll be serving (PHP, HTML, JavaScript, etc.) should be encoded in valid UTF-8.
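For validating received strings, and for checking the files themselves, a rough Python analogue of `mb_check_encoding($data, 'UTF-8')` is a strict decode. This is a sketch under assumptions: the helper names `is_valid_utf8` and `find_non_utf8_files` and the list of file suffixes are hypothetical, and Python's strict UTF-8 decoder stands in for PHP's check (it rejects overlong forms and encoded surrogates, but the two are not guaranteed to agree on every edge case).

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (strict decode, no replacement)."""
    try:
        data.decode("utf-8")  # default errors="strict" raises on malformed input
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8_files(root, suffixes=(".php", ".html", ".js", ".css")):
    """Yield files under root with a (hypothetical) served suffix that are not valid UTF-8."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            if not is_valid_utf8(path.read_bytes()):
                yield path


# Example: reject submitted bytes before storing them.
submitted = "Gr\u00fc\u00dfe".encode("latin-1")  # ISO-8859-1 bytes, not UTF-8
if not is_valid_utf8(submitted):
    print("rejected: not valid UTF-8")
```

Running `find_non_utf8_files` over a document root before deploying is one way to catch a file that an editor silently saved as ISO-8859-1.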
You need to make sure that every time you process a UTF-8 string, you do so safely. This is, unfortunately, the hard part. You'll probably want to make extensive use of PHP's `mbstring` extension.
PHP's built-in string operations are not UTF-8 safe by default (for example, `strlen()` counts bytes, while `mb_strlen()` counts characters). There are some things you can safely do with normal PHP string operations (like concatenation), but for most things you should use the equivalent `mbstring` function.
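To make the byte-versus-character distinction concrete, the sketch below mimics in Python what PHP's byte-oriented `strlen()`/`substr()` do compared with `mb_strlen()`/`mb_substr()`. The four helper functions are hypothetical stand-ins, not PHP APIs; because Python strings are sequences of code points, the "mb" behaviour here is simply ordinary `str` indexing.

```python
def byte_length(s: str) -> int:
    """Bytes in the UTF-8 encoding: what PHP's strlen() reports."""
    return len(s.encode("utf-8"))


def char_length(s: str) -> int:
    """Code points: what mb_strlen($s, 'UTF-8') reports."""
    return len(s)


def byte_prefix(s: str, n: int) -> bytes:
    """First n bytes, like substr($s, 0, $n); may cut a character in half."""
    return s.encode("utf-8")[:n]


def char_prefix(s: str, n: int) -> str:
    """First n code points, like mb_substr($s, 0, $n, 'UTF-8')."""
    return s[:n]


word = "na\u00efve"  # "naive" with i-diaeresis: 5 characters, 6 bytes
print(byte_length(word), char_length(word))  # 6 5
print(byte_prefix(word, 3))                  # b'na\xc3' (dangling lead byte)
print(ascii(char_prefix(word, 3)))           # 'na\xef' (whole character kept)
```

The byte cut leaves a lone lead byte `0xC3`; stored or echoed like that, it becomes exactly the kind of invalid UTF-8 that input validation is meant to catch.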
function.要知道您在做什么(阅读:不要搞砸),您确实需要了解 UTF-8 以及它如何在尽可能最低的级别上工作。查看utf8.com中的任何链接,获取一些很好的资源,以了解您需要了解的所有内容。 p>