What is the range of GBK encoding?
GBK is an extension of GB 2312 and is fully compatible with the GB 2312-80 standard. It still uses a double-byte encoding scheme, and its range is 8140-FEFE (high byte from 81 to FE, low byte from 40 to FE), excluding the xx7F code points, for a total of 23,940 code points.
GBK contains 21,886 Chinese characters and graphic symbols in total: 21,003 Chinese characters (including radicals and components) and 883 graphic symbols. It supports all CJK (Chinese, Japanese and Korean) Han characters in the international standard ISO/IEC 10646-1 and the national standard GB 13000.1, and it includes all Chinese characters in the Big5 encoding. The GBK specification was officially released on December 15, 1995; that release is version 1.0.
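The range rule above can be checked with a short sketch. The function name `is_gbk_double_byte` is a hypothetical helper introduced here for illustration; it only tests whether a byte pair falls inside the 8140-FEFE range, not whether a character is actually assigned to that code point.

```python
def is_gbk_double_byte(high: int, low: int) -> bool:
    """Return True if (high, low) lies in the GBK range 8140-FEFE, skipping xx7F."""
    return 0x81 <= high <= 0xFE and 0x40 <= low <= 0xFE and low != 0x7F


# Count every valid byte pair by brute force to confirm the stated total.
total = sum(
    is_gbk_double_byte(high, low)
    for high in range(0x100)
    for low in range(0x100)
)
print(total)  # 126 high bytes * 190 low bytes = 23940
```

The low byte has 191 values from 40 to FE; removing 7F leaves 190, and 126 × 190 gives the 23,940 code points quoted above.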
1. Code point allocation and sequence
GBK uses a double-byte representation with an overall coding range of 8140-FEFE. The first byte is in the range 81-FE and the second byte is in the range 40-FE, with the xx7F column excluded. This gives 23,940 code points, of which 21,886 are assigned: 21,003 Chinese characters (including radicals and components) and 883 graphic symbols.
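As a rough illustration of the double-byte layout, the sketch below uses Python's built-in `gbk` codec to encode two Chinese characters and checks that each byte pair falls in the range described above. The sample string is an arbitrary choice made for this example.

```python
text = "汉字"
encoded = text.encode("gbk")  # Python's standard library ships a "gbk" codec

# Each Chinese character occupies exactly two bytes.
print(len(encoded))           # 4
print(encoded.hex().upper())  # BABAD7D6

# Split into (high, low) pairs and check them against the 8140-FEFE range.
pairs = [(encoded[i], encoded[i + 1]) for i in range(0, len(encoded), 2)]
in_range = all(
    0x81 <= high <= 0xFE and 0x40 <= low <= 0xFE and low != 0x7F
    for high, low in pairs
)
print(in_range)               # True
```

Both characters come from the GB 2312 Chinese character area, so their codes are the same in GB 2312 and in GBK.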
All codes are divided into three parts:
1. Chinese character area.
Includes:
a. GB 2312 Chinese character area. This is GBK/2: B0A1-F7FE. It contains the 6,763 GB 2312 Chinese characters, arranged in their original order.
b. GB 13000.1 extended Chinese character area. Includes:
(1) GBK/3: 8140-A0FE. Contains 6,080 CJK Chinese characters from GB 13000.1.
(2) GBK/4: AA40-FEA0. Contains 8,160 CJK Chinese characters and supplementary Chinese characters. The CJK Chinese characters come first, arranged by UCS code value; the supplementary Chinese characters (including radicals and components) come last, arranged by their page number and character position in the "Kangxi Dictionary".
(3) The Chinese character "〇" is placed in the graphic symbol area GBK/5, at A996.
2. Graphic symbol area.
Includes:
a. GB 2312 non-Chinese character symbol area. This is GBK/1: A1A1-A9FE. In addition to the GB 2312 symbols, it adds 10 lowercase Roman numerals and the symbols supplemented by GB 12345, for 717 symbols in total.
b. GB 13000.1 extended non-Chinese character area. This is GBK/5: A840-A9A0. Big5 non-Chinese symbols, structural symbols and "〇" are placed in this area, for 166 symbols in total.
3. User-defined area:
Divided into three areas, (1), (2) and (3):
(1) AAA1-AFFE, 564 code points.
(2) F8A1-FEFE, 658 code points.
(3) A140-A7A0, 672 code points.
Although area (3) is open to users, its use is restricted because the possibility of adding new characters to this area in the future cannot be ruled out.
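The area boundaries listed above can be collected into a small lookup table. This is a minimal sketch: the names `AREAS`, `area_size` and `classify` are hypothetical, and the sizes it prints are the number of code points each area can hold, which for some areas is larger than the number of characters actually assigned (for example, GBK/2 spans 6,768 code points but holds 6,763 characters).

```python
# (name, first high byte, last high byte, first low byte, last low byte)
AREAS = [
    ("GBK/1", 0xA1, 0xA9, 0xA1, 0xFE),             # GB 2312 symbols
    ("GBK/2", 0xB0, 0xF7, 0xA1, 0xFE),             # GB 2312 Chinese characters
    ("GBK/3", 0x81, 0xA0, 0x40, 0xFE),             # extended Chinese characters
    ("GBK/4", 0xAA, 0xFE, 0x40, 0xA0),             # extended Chinese characters
    ("GBK/5", 0xA8, 0xA9, 0x40, 0xA0),             # extended symbols
    ("User-defined (1)", 0xAA, 0xAF, 0xA1, 0xFE),
    ("User-defined (2)", 0xF8, 0xFE, 0xA1, 0xFE),
    ("User-defined (3)", 0xA1, 0xA7, 0x40, 0xA0),
]


def area_size(h0, h1, l0, l1):
    """Number of code points in a rectangular area, skipping the xx7F column."""
    lows = [low for low in range(l0, l1 + 1) if low != 0x7F]
    return (h1 - h0 + 1) * len(lows)


def classify(high, low):
    """Return the name of the area containing the byte pair, or None."""
    if low == 0x7F:
        return None
    for name, h0, h1, l0, l1 in AREAS:
        if h0 <= high <= h1 and l0 <= low <= l1:
            return name
    return None


sizes = {name: area_size(h0, h1, l0, l1) for name, h0, h1, l0, l1 in AREAS}
for name, size in sizes.items():
    print(f"{name}: {size} code points")
print(sum(sizes.values()))  # 23940: the areas tile the whole 8140-FEFE range

high, low = "〇".encode("gbk")  # relies on Python's built-in gbk codec
print(classify(high, low))      # GBK/5
print(classify(0xB0, 0xA1))     # GBK/2
```

The computed sizes reproduce the figures given in the text: 6,080 for GBK/3, 8,160 for GBK/4, and 564, 658 and 672 for the three user-defined areas.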
2. Glyphs
GBK has the following provisions on glyphs:
1. In principle, glyphs and stroke forms remain consistent with those in the G column of GB 13000.1 (that is, the Chinese characters sourced from the statutory standards of mainland China).
2. Within the general framework of the CJK Chinese character unification rules, "glyph normalization without duplicate codes" ("GB-ization") is applied to all GBK-coded Chinese characters; that is, the new Chinese glyph forms are used wherever this does not cause duplicate coding.
3. For Chinese characters that fall outside the CJK unification rules, or for which the rules are not yet clearly stipulated, the old glyph forms are provisionally placed at GBK code points. As a result, GBK in many cases includes both the old and the new glyph of the same Chinese character.
4. The glyphs of non-Chinese symbols that are already included in GB 2312 shall be consistent with GB 2312; the parts beyond GB 2312 shall be consistent with GB 13000.1.
5. Pinyin letters with tones are in half-width form.