How Do C++11 String Literals Handle Different Unicode Encodings?

Barbara Streisand

Release: 2024-12-15 00:06:11

Unicode Encoding for String Literals in C++11

C++11 introduced new character and string literal types that extend the language's support for Unicode. There are now four character types (char, wchar_t, char16_t, char32_t) and five kinds of string literal ("", L"", u8"", u"", U""), and specific rules govern how each of them interacts with encodings.
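As a minimal sketch of the literal kinds listed above, the following checks at compile time which element type each prefix produces. The helper name element_type_of is purely illustrative and not part of the standard library. The u8 check only tests the element size, because the element type is char in C++11 but changes to char8_t in C++20.

```cpp
#include <cstddef>
#include <type_traits>

// Illustrative helper (hypothetical name): declared only, never defined.
// It is used inside decltype/sizeof to recover a literal's element type.
template <typename T, std::size_t N>
T element_type_of(const T (&)[N]);

// Ordinary and wide literals: encoding is implementation-defined.
static_assert(std::is_same<decltype(element_type_of("a")), char>::value,
              "ordinary literal is an array of char");
static_assert(std::is_same<decltype(element_type_of(L"a")), wchar_t>::value,
              "wide literal is an array of wchar_t");

// UTF-16 and UTF-32 literals.
static_assert(std::is_same<decltype(element_type_of(u"a")), char16_t>::value,
              "u literal is an array of char16_t");
static_assert(std::is_same<decltype(element_type_of(U"a")), char32_t>::value,
              "U literal is an array of char32_t");

// UTF-8 literal: one-byte elements (char in C++11, char8_t from C++20).
static_assert(sizeof(element_type_of(u8"a")) == 1,
              "u8 literal has one-byte elements");
```

If the file compiles, every assertion holds for the compiler in use.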

Encoding Compatibility

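The \x escape sequence can be used in every string literal type. It inserts a single code unit with the given hexadecimal value, which must fit in the literal's code unit type. The \u and \U universal character names, by contrast, name a Unicode code point, and the compiler converts that code point to the encoding of the containing string. Only u8"", u"", and U"" literals guarantee a UTF encoding for the result; in ordinary and wide literals the conversion targets the implementation-defined execution character set.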
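The difference between the hexadecimal escape (written \x in source) and the universal character names (written \u and \U) can be sketched as follows. The helper code_units is an illustrative name, not a standard function. The example uses U+00E9 because its UTF-8 form is two bytes, so the two escapes give visibly different results in a u8 literal.

```cpp
#include <cstddef>

// Illustrative helper (hypothetical name): number of code units in a
// string literal, excluding the terminating null.
template <typename T, std::size_t N>
constexpr std::size_t code_units(const T (&)[N]) { return N - 1; }

// \x inserts exactly one code unit with the given value, whatever the
// literal type. No encoding conversion takes place.
static_assert(code_units("\xE9") == 1, "one char");
static_assert(code_units(u8"\xE9") == 1,
              "one byte (0xE9 alone is not a valid UTF-8 sequence)");
static_assert(code_units(u"\xE9") == 1, "one char16_t");
static_assert(u"\xE9"[0] == 0xE9, "value is stored unchanged");

// \u names the code point U+00E9; the compiler encodes it for the literal.
static_assert(code_units(u8"\u00E9") == 2, "UTF-8 needs two bytes: C3 A9");
static_assert(code_units(u"\u00E9") == 1, "UTF-16 needs one code unit");
static_assert(code_units(U"\u00E9") == 1, "UTF-32 needs one code unit");
static_assert(u"\u00E9"[0] == 0x00E9, "UTF-16 unit equals the code point");
```

In other words, \x is a statement about storage and \u is a statement about the character; they coincide only when the code point fits in a single code unit of the target encoding.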

String Length and Encoding

The number of code units needed to represent a given piece of text depends on the encoding. The array behind a string literal has a fixed element type, and each element holds exactly one code unit, so the array length (excluding the terminating null) is the number of code units the chosen encoding requires, not the number of characters.
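To make the relationship between encoding and array length concrete, here is a small sketch that counts the code units of the same three code points in each UTF literal type. The helper code_units is an illustrative name; the code points are written as universal character names so the result does not depend on the source file's encoding.

```cpp
#include <cstddef>

// Illustrative helper (hypothetical name): array length minus the null.
template <typename T, std::size_t N>
constexpr std::size_t code_units(const T (&)[N]) { return N - 1; }

// U+0041 (LATIN CAPITAL LETTER A): one unit in every encoding.
static_assert(code_units(u8"\x41") == 1, "UTF-8");
static_assert(code_units(u"\x41") == 1, "UTF-16");
static_assert(code_units(U"\x41") == 1, "UTF-32");

// U+20AC (EURO SIGN): three bytes in UTF-8, one unit otherwise.
static_assert(code_units(u8"\u20AC") == 3, "UTF-8");
static_assert(code_units(u"\u20AC") == 1, "UTF-16");
static_assert(code_units(U"\u20AC") == 1, "UTF-32");

// U+1F600 (outside the Basic Multilingual Plane).
static_assert(code_units(u8"\U0001F600") == 4, "UTF-8: four bytes");
static_assert(code_units(u"\U0001F600") == 2, "UTF-16: surrogate pair");
static_assert(code_units(U"\U0001F600") == 1, "UTF-32: one unit");

// sizeof reports bytes, including the terminating null element.
static_assert(sizeof(u"\U0001F600") == 3 * sizeof(char16_t),
              "two code units plus the null terminator");
```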

UTF-Encoding Semantics

u"" string literals are encoded in UTF-16 and u8"" string literals in UTF-8 (U"" literals, likewise, are UTF-32). UTF-16 literals are arrays of char16_t code units, one or two per code point, while UTF-8 literals are arrays of char (in C++11) holding variable-length sequences of one to four bytes per code point.
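As a quick check of these encodings, the sketch below inspects the individual code unit values the compiler stores for U+20AC. The casts to unsigned char are only there so the comparison works whether u8 elements are a signed char (possible in C++11) or char8_t (C++20).

```cpp
// UTF-8: U+20AC is stored as the three bytes E2 82 AC.
static_assert(static_cast<unsigned char>(u8"\u20AC"[0]) == 0xE2, "lead byte");
static_assert(static_cast<unsigned char>(u8"\u20AC"[1]) == 0x82, "continuation");
static_assert(static_cast<unsigned char>(u8"\u20AC"[2]) == 0xAC, "continuation");
static_assert(u8"\u20AC"[3] == 0, "terminating null");

// UTF-16: a code point in the Basic Multilingual Plane is one char16_t
// whose value equals the code point.
static_assert(u"\u20AC"[0] == 0x20AC, "single UTF-16 code unit");
static_assert(u"\u20AC"[1] == 0, "terminating null");

// UTF-32: every code point is one char32_t.
static_assert(U"\u20AC"[0] == 0x20AC, "single UTF-32 code unit");
```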

Lone Surrogates

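Lone surrogates (0xD800-0xDFFF) are not permitted as code points in \u or \U sequences. These values are not characters in their own right: UTF-16 uses them in pairs to encode code points above U+FFFF. To include such a character, write its code point with \U, and the compiler emits the surrogate pair in a u"" literal.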
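The surrogate pair that the compiler generates can be reproduced by hand. The following sketch applies the standard UTF-16 arithmetic to U+1F600 and compares the result with the literal; the function names high_surrogate and low_surrogate are illustrative, and both assume the argument is a code point above U+FFFF.

```cpp
// Illustrative helpers (hypothetical names). Precondition: cp > 0xFFFF.
// UTF-16 subtracts 0x10000, then splits the remaining 20 bits in half.
constexpr char16_t high_surrogate(char32_t cp) {
    return static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10));
}
constexpr char16_t low_surrogate(char32_t cp) {
    return static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF));
}

// The compiler encodes \U0001F600 as the pair D83D DE00.
static_assert(u"\U0001F600"[0] == 0xD83D, "high surrogate");
static_assert(u"\U0001F600"[1] == 0xDE00, "low surrogate");
static_assert(u"\U0001F600"[0] == high_surrogate(0x1F600), "matches formula");
static_assert(u"\U0001F600"[1] == low_surrogate(0x1F600), "matches formula");

// A universal character name such as \uD83D is rejected by the compiler,
// because it names a surrogate code point. Hexadecimal escapes specify
// raw code units instead, so the same pair can be spelled out manually:
static_assert(u"\xD83D\xDE00"[0] == u"\U0001F600"[0], "same high unit");
static_assert(u"\xD83D\xDE00"[1] == u"\U0001F600"[1], "same low unit");
```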

Encoding Awareness

The standard string manipulation functions know nothing about Unicode encoding semantics: they treat a UTF-encoded string as a sequence of individual code units, so lengths and indices count code units rather than characters. Input and output streams, however, can convert between encodings through their locales when reading and writing text.
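To illustrate what "a sequence of code units" means in practice, the sketch below contrasts the size reported by the standard string classes with a hand-written code point count for UTF-8. The function utf8_code_points is a hypothetical helper, not a library facility, and it assumes its input is valid, null-terminated UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in valid, null-terminated UTF-8
// by counting every byte that is not a continuation byte (10xxxxxx).
// Templated so it accepts u8 literals in both C++11 (char) and C++20
// (char8_t).
template <typename Char8>
std::size_t utf8_code_points(const Char8* s) {
    std::size_t count = 0;
    for (; *s != 0; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// size() on the standard string classes counts code units, not characters.
inline std::size_t utf16_units(const std::u16string& s) { return s.size(); }
inline std::size_t utf32_units(const std::u32string& s) { return s.size(); }
```

For the three code points U+0061, U+00E9, and U+1F600, utf16_units(u"a\u00E9\U0001F600") returns 4 and utf32_units(U"a\u00E9\U0001F600") returns 3, while utf8_code_points(u8"a\u00E9\U0001F600") recovers 3 from a 7-byte sequence.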
