
What are the differences between utf8 and utf8mb4 in MySQL?

Author: 不言 (original article)
Published: 2018-08-22 10:00:53

This article explains the differences between utf8 and utf8mb4 in MySQL. It may be a useful reference if you have run into this question.

1. Introduction

MySQL added the utf8mb4 character set in version 5.5.3. The mb4 suffix is read as "most bytes 4", that is, at most four bytes per character, and the character set was designed specifically to support four-byte Unicode characters. Fortunately, utf8mb4 is a superset of utf8, so no conversion is needed other than changing the character set to utf8mb4. That said, if saving space matters and you only need characters that utf8 can store, utf8 is usually enough.

2. Content description

As mentioned above, utf8 can already store most Chinese characters, so why use utf8mb4? The reason is that the utf8 character set in MySQL stores at most 3 bytes per character, so an attempt to insert a 4-byte character typically fails with an error. The largest code point that three-byte UTF-8 can encode is 0xFFFF, which is the end of the Basic Multilingual Plane (BMP) in Unicode. In other words, no Unicode character outside the Basic Multilingual Plane can be stored in MySQL's utf8 character set. That includes emoji (ordinary Unicode characters, common on iOS and Android phones), many rare Chinese characters, and any newly added Unicode characters that fall outside the BMP.
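The three-byte limit can be illustrated without a database by counting UTF-8 bytes per character. The following is a minimal sketch in plain Python, not MySQL itself; the helper names `utf8_len` and `fits_mysql_utf8` are made up for this illustration and only model the rule described above.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_mysql_utf8(text: str) -> bool:
    """Hypothetical helper: True if every character is in the BMP (<= U+FFFF),
    which means each one needs at most three UTF-8 bytes."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# ASCII letter, accented Latin letter, common Chinese character, emoji
for ch in ["A", "é", "中", "😀"]:
    print(f"U+{ord(ch):04X} -> {utf8_len(ch)} byte(s)")

print(fits_mysql_utf8("你好"))    # common Chinese characters are in the BMP
print(fits_mysql_utf8("hi 😀"))  # the emoji lies outside the BMP
```

A check like this could be run on application input before writing to a column that still uses utf8, assuming the application handles text as Unicode strings.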

3. Source of the problem

The original UTF-8 format used one to six bytes and could encode up to 31 bits. The current UTF-8 specification uses only one to four bytes and can encode up to 21 bits, which is just enough to represent all 17 Unicode planes.
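The bit counts above follow from the UTF-8 byte layout: a multi-byte lead byte spends some bits on a length prefix, and each continuation byte (10xxxxxx) carries 6 payload bits. The sketch below works through that arithmetic; the function name `payload_bits` is an assumption made for this example.

```python
def payload_bits(n_bytes: int) -> int:
    """Bits available for the code point in an n-byte UTF-8 sequence."""
    if n_bytes == 1:
        return 7  # single byte: 0xxxxxxx
    # Lead byte: n one-bits, a zero bit, then (7 - n) payload bits.
    # Each of the (n - 1) continuation bytes carries 6 payload bits.
    return (7 - n_bytes) + 6 * (n_bytes - 1)


for n in range(1, 7):
    print(f"{n} byte(s): {payload_bits(n)} bits")

PLANES = 17
CODE_POINTS = PLANES * 0x10000  # 17 planes of 65,536 code points each

# Three bytes reach exactly the end of the BMP
print(2 ** payload_bits(3) - 1 == 0xFFFF)
# All 17 planes fit in the 21 bits of a four-byte sequence, but not in 16 bits
print(2 ** payload_bits(3) < CODE_POINTS <= 2 ** payload_bits(4))
```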

In MySQL, utf8 is a character set that supports only UTF-8 characters of up to three bytes, that is, the Basic Multilingual Plane of Unicode.

Why does utf8 in MySQL support only UTF-8 characters of at most three bytes?
My guess is that when MySQL development first began, Unicode did not yet have supplementary planes. At that time, the Unicode Committee was still dreaming that "65535 characters are enough for the whole world." String length in MySQL is counted in characters rather than bytes, and for the CHAR data type enough space has to be reserved for the whole string. With the utf8 character set, the space to reserve is the maximum byte length of one utf8 character multiplied by the string length, so the maximum length was naturally capped at 3. For example, for CHAR(100) MySQL reserves 300 bytes. As for why later versions did not extend utf8 itself to 4-byte UTF-8 characters, I think one reason is backward compatibility and the other is that characters outside the Basic Multilingual Plane are rarely used.
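The CHAR reservation described above is simple multiplication, sketched here for both character sets. The table `MAX_BYTES_PER_CHAR` and the function `char_reserved_bytes` are hypothetical names for this illustration; the numbers model the article's description only, and actual on-disk storage depends on the storage engine and row format.

```python
# Maximum bytes per character for each MySQL character set discussed here
MAX_BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def char_reserved_bytes(length: int, charset: str) -> int:
    """Worst-case bytes for CHAR(length): characters times max bytes per character."""
    return length * MAX_BYTES_PER_CHAR[charset]


for charset in ("utf8", "utf8mb4"):
    print(f"CHAR(100) in {charset}: {char_reserved_bytes(100, charset)} bytes")
```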

To store 4-byte UTF-8 characters in MySQL, you need the utf8mb4 character set, which is only available from version 5.5.3 onward (check your version with: select version();). For better compatibility, I think you should always use utf8mb4 instead of utf8. For CHAR columns, utf8mb4 consumes more space; the official MySQL recommendation is to use VARCHAR instead of CHAR.
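If the version check needs to happen in application code, the string returned by select version() can be compared against 5.5.3. The sketch below assumes the string starts with three dot-separated numbers, possibly followed by a suffix; the function name `supports_utf8mb4` and the sample version strings are illustrative, and forks with their own version numbering are not considered.

```python
import re


def supports_utf8mb4(version: str) -> bool:
    """Hypothetical check: True if the version string is 5.5.3 or later."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # Compare as a tuple of integers so that 5.10.0 sorts after 5.5.3
    return tuple(int(part) for part in match.groups()) >= (5, 5, 3)


for v in ["5.1.73-log", "5.5.3", "8.0.33"]:
    print(v, supports_utf8mb4(v))
```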
