Unicode Characters in Identifiers: Restrictions in g++
Even with the -fextended-identifiers option enabled, g++ 4.7 rejects certain Unicode characters as identifiers, including the smiling face emoji 😃 (U+1F603). One reason is that this version does not accept raw UTF-8 characters in identifiers.
Spelling the identifier as a Universal Character Name (\U0001F603) does not help: the compiler still rejects it. This is because g++ only accepts a restricted set of characters in identifiers, defined in ucnid.tab and based on the C99 and C++98 standards.
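As a minimal sketch of the Universal Character Name spelling, the snippet below uses \u03C0 (Greek small letter pi) as an identifier. The choice of pi and the circle_area helper are assumptions made for illustration: pi is expected to fall inside the restricted C99/C++98 character set mentioned above, so this form should be accepted where the emoji is not. It assumes C++11 for constexpr; on the g++ releases discussed here, -fextended-identifiers would also be needed.

```cpp
// Illustrative only: an identifier written as a universal character name.
// \u03C0 is GREEK SMALL LETTER PI, assumed to be in the permitted set.
constexpr double \u03C0 = 3.141592653589793;

// Hypothetical helper that uses the extended identifier like any other name.
constexpr double circle_area(double radius) {
    return \u03C0 * radius * radius;
}

// The spelling discussed in the text, which g++ 4.7 reportedly rejects:
// int \U0001F603 = 0;
```

The extended identifier behaves like an ordinary name once the compiler accepts it; only the set of permitted code points differs between compiler versions.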
Furthermore, -fextended-identifiers is experimental in these releases and may not work as expected. GCC added support for the C11 character set in version 4.9.0, which lifts part of the restriction: characters outside the Basic Multilingual Plane (BMP), including \U0001F603, can be used when written as Universal Character Names.
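To make the BMP distinction concrete, here is a small sketch that classifies code points by plane. The helper names in_bmp and plane_of are hypothetical and not part of any compiler or library. The BMP covers U+0000 through U+FFFF, so U+1F603 lies outside it, in plane 1.

```cpp
// Hypothetical helpers for illustration; C++11 or later.

// True when the code point is in the Basic Multilingual Plane (U+0000..U+FFFF).
constexpr bool in_bmp(char32_t cp) {
    return cp <= 0xFFFF;
}

// Unicode plane number: each plane holds 0x10000 code points.
constexpr unsigned plane_of(char32_t cp) {
    return static_cast<unsigned>(cp >> 16);
}

static_assert(in_bmp(0x03C0), "U+03C0 (Greek pi) is in the BMP");
static_assert(!in_bmp(0x1F603), "U+1F603 is outside the BMP");
static_assert(plane_of(0x1F603) == 1, "U+1F603 is in plane 1");
```

This is also why the eight-digit \U form is needed for this character: the four-digit \u form can only name code points up to U+FFFF.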
However, writing the character literally in the source (😃) still fails, even with -finput-charset=UTF-8. A bug report has been filed to track this issue.
In contrast, Clang 3.3 accepts both the literal identifier (😃) and the Universal Character Name (\U0001F603) without any special options.