In UTF-8 encoding, a commonly used Chinese character occupies three bytes, and a Chinese (full-width) punctuation mark also occupies three bytes; an English letter, an English punctuation mark, or a digit each occupies one byte.
The operating environment of this article: Windows 10 system, DELL G3 computer.
In UTF-8 encoding, one commonly used Chinese character occupies three bytes, and one Chinese punctuation mark occupies three bytes.
One English character occupies one byte, and one English punctuation mark occupies one byte.
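The UTF-8 byte counts above can be checked with a few lines of Python. This is a minimal sketch: the helper name utf8_len and the sample characters are chosen here for illustration and are not part of the original article.

```python
# Count how many bytes a string occupies once encoded as UTF-8.
def utf8_len(text: str) -> int:
    return len(text.encode("utf-8"))

# Illustrative samples (assumed, not taken from the article).
samples = {
    "Chinese character": "中",        # U+4E2D
    "Chinese punctuation": "，",      # U+FF0C, full-width comma
    "English letter": "A",
    "English punctuation": ",",
    "digit": "7",
}

for label, ch in samples.items():
    print(f"{label} {ch!r}: {utf8_len(ch)} byte(s)")
```

The Chinese character and the full-width comma each report 3 bytes, while the English letter, the ASCII comma, and the digit each report 1 byte.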
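Unicode encoding (here meaning UTF-16, which Windows applications commonly label "Unicode"): one English character occupies two bytes, and one commonly used Chinese character (simplified or traditional) occupies two bytes. Chinese punctuation marks and English punctuation marks also occupy two bytes each. Rarely used characters outside the Basic Multilingual Plane occupy four bytes in UTF-16.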
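The two-byte figures for the encoding that Windows calls "Unicode" correspond to UTF-16, and they can be checked in Python. This is a sketch under that assumption; the helper name utf16_len and the sample characters are illustrative only.

```python
# Count how many bytes a string occupies once encoded as UTF-16.
# "utf-16-le" is used because plain "utf-16" prepends a 2-byte
# byte-order mark, which would inflate every count by 2.
def utf16_len(text: str) -> int:
    return len(text.encode("utf-16-le"))

# Illustrative samples (assumed, not taken from the article).
samples = ["A", ",", "中", "，", "\U00020000"]  # last one lies outside the BMP

for ch in samples:
    print(f"U+{ord(ch):04X}: {utf16_len(ch)} byte(s) in UTF-16")
```

The first four samples each take 2 bytes. The last one, U+20000, is a rare Chinese character from a supplementary plane and takes 4 bytes, because UTF-16 stores it as a surrogate pair.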
Extended information:
UTF-8 uses 1 to 4 bytes to encode each character:
1. A US-ASCII character requires only 1 byte (Unicode range U+0000 to U+007F).
2. Latin letters with diacritical marks, and Greek, Cyrillic, Armenian, Hebrew, Arabic, Syriac and similar letters, require 2 bytes (Unicode range U+0080 to U+07FF).
3. The remaining characters of the Basic Multilingual Plane, which cover most commonly used characters (including Chinese, Japanese and Korean characters, Southeast Asian scripts, and some Middle Eastern scripts), require 3 bytes (Unicode range U+0800 to U+FFFF).
4. Other, rarely used characters in the supplementary planes require 4 bytes (Unicode range U+10000 to U+10FFFF).
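The four cases in this list can be sketched as a small Python function that maps a code point to its UTF-8 length. The function name utf8_length is hypothetical; the range boundaries are those fixed by the UTF-8 definition, and the loop cross-checks the result against Python's own encoder.

```python
# Return the number of bytes UTF-8 needs for a single Unicode code point.
def utf8_length(code_point: int) -> int:
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogates are reserved for UTF-16 and cannot be encoded in UTF-8.
        raise ValueError("surrogate code points have no UTF-8 encoding")
    if code_point <= 0x7F:      # case 1: US-ASCII
        return 1
    if code_point <= 0x7FF:     # case 2: Latin with diacritics, Greek, Cyrillic, ...
        return 2
    if code_point <= 0xFFFF:    # case 3: rest of the Basic Multilingual Plane
        return 3
    return 4                    # case 4: supplementary planes

# Cross-check against the built-in encoder with one sample per case.
for ch in ["A", "é", "中", "\U0001F600"]:
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: {utf8_length(ord(ch))} byte(s)")
```

Working from the code point rather than from the encoded bytes makes the link to the list explicit: the only thing that decides the length is which of the four ranges the code point falls in.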