Unicode is one of the standards for character encoding in computer science. It defines a way to represent characters numerically. In Unicode, every character has a corresponding number. This number is called a Unicode code point, and it is usually expressed in hexadecimal.
In JavaScript, when we need to write special characters by their Unicode code points, we usually use escape sequences of the form \uXXXX, where XXXX is a 4-digit hexadecimal number giving the code point of the character. Four hexadecimal digits cover only U+0000 through U+FFFF; a character above that range is written as two such escapes, a UTF-16 surrogate pair.
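As a small illustration of the escape form, the sketch below compares escaped and literal strings. The variable names are made up for this example, and the \u{...} form is the ES2015 code point escape, which the rest of this article does not use.

```javascript
// Each \uXXXX escape denotes one 16-bit UTF-16 code unit.
const fromEscapes = "\u0068\u0069";    // the same string as "hi"
const astral = "\u{20BB7}";            // ES2015 code point escape for U+20BB7
const surrogatePair = "\uD842\uDFB7";  // the same character written as two code units

console.log(fromEscapes === "hi");      // true
console.log(astral === surrogatePair);  // true
console.log(astral.length);             // 2, because length counts code units
```

The last line is the reason the 4-digit form needs two escapes for such characters: JavaScript strings are sequences of 16-bit code units.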
If we have a string that contains some special characters, we can convert these characters into the corresponding Unicode code points. JavaScript provides some built-in methods to accomplish this task.
The String.prototype.charCodeAt() method returns the UTF-16 code unit at the specified position in the string. For characters in the range U+0000 to U+FFFF, this value is the same as the Unicode code point. For example:
let str = "hello world";
console.log(str.charCodeAt(0)); // 104, the Unicode code point of "h"
We can use a loop to traverse the entire string and convert the code of each character into the form \uXXXX. For example:
let str = "hello world";
let unicodeStr = "";
for (let i = 0; i < str.length; i++) {
  // Convert each character's code to hexadecimal, left-pad it with zeros to 4 digits,
  // and append it to the result. The backslash is doubled because a bare "\u" is a syntax error.
  unicodeStr += "\\u" + ("0000" + str.charCodeAt(i).toString(16)).slice(-4);
}
console.log(unicodeStr); // \u0068\u0065\u006c\u006c\u006f\u0020\u0077\u006f\u0072\u006c\u0064
The String.prototype.codePointAt() method also returns the Unicode code point of the character at the specified position in the string. Unlike charCodeAt(), it correctly handles code points that do not fit in 16 bits (those above U+FFFF). For example:
let str = "𠮷"; // U+20BB7
console.log(str.charCodeAt(0)); // 55362: this character takes two 16-bit code units, and charCodeAt() returns only the first one
console.log(str.codePointAt(0)); // 134071: codePointAt() returns the whole code point
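The two numbers in the example, 55362 and 134071, are related by the UTF-16 surrogate pair formula. The following is a minimal sketch of that arithmetic; the helper name toSurrogatePair is hypothetical and is not a built-in.

```javascript
// Hypothetical helper: split a code point above 0xFFFF into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;      // remaining 20-bit value
  const high = 0xd800 + (offset >> 10);    // upper 10 bits go into the high surrogate
  const low = 0xdc00 + (offset & 0x3ff);   // lower 10 bits go into the low surrogate
  return [high, low];
}

const [high, low] = toSurrogatePair(134071); // 134071 is 0x20BB7
console.log(high, low); // 55362 57271
console.log(String.fromCharCode(high, low) === String.fromCodePoint(134071)); // true
```

The first value, 55362 (0xD842), is exactly what charCodeAt(0) returns for this character, and 57271 (0xDFB7) is the code unit at index 1.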
To convert every character of a string with String.prototype.codePointAt(), traverse the string in a similar way:
let str = "hello 𠮷𠮷𠮷 world";
let unicodeStr = "";
for (let i = 0; i < str.length; i++) {
  let codePoint = str.codePointAt(i);
  if (codePoint > 0xffff) {
    // A code point above 0xFFFF occupies two UTF-16 code units (a surrogate pair),
    // so skip the next position and emit both halves.
    i++;
    let offset = codePoint - 0x10000;
    let high = 0xd800 + (offset >> 10); // high surrogate
    let low = 0xdc00 + (offset & 0x3ff); // low surrogate
    unicodeStr += "\\u" + high.toString(16) + "\\u" + low.toString(16);
  } else {
    // Convert the code point to hexadecimal, left-pad it with zeros to 4 digits, and append it.
    unicodeStr += "\\u" + ("0000" + codePoint.toString(16)).slice(-4);
  }
}
console.log(unicodeStr); // \u0068\u0065\u006c\u006c\u006f\u0020\ud842\udfb7\ud842\udfb7\ud842\udfb7\u0020\u0077\u006f\u0072\u006c\u0064
In the above code, we first check whether the code point of the current character is greater than 0xFFFF, that is, whether it does not fit in 16 bits. If so, we split it into its UTF-16 high and low surrogates: subtract 0x10000, then add the upper 10 bits to 0xD800 and the lower 10 bits to 0xDC00. Both values are appended to the result string as 4-digit hexadecimal numbers, and the index is advanced one extra position. Simply taking the upper and lower 16 bits of the code point would not produce valid escapes. If the code point fits in 16 bits, it is converted directly to a 4-digit hexadecimal number and appended to the result string.
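Because a \uXXXX escape names a single UTF-16 code unit, a simpler variant is also possible: escape every code unit with charCodeAt() and let the surrogate pairs fall out on their own. The sketch below assumes that approach; toUnicodeEscapes is a hypothetical helper name, and padStart() requires ES2017 or later.

```javascript
// Hypothetical helper: escape every UTF-16 code unit of a string as \uXXXX.
function toUnicodeEscapes(str) {
  let result = "";
  for (let i = 0; i < str.length; i++) {
    // charCodeAt() returns one code unit (0 to 0xFFFF), so 4 hex digits always suffice.
    result += "\\u" + str.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return result;
}

const escaped = toUnicodeEscapes("a\u{20BB7}");
console.log(escaped); // \u0061\ud842\udfb7

// JSON string literals use the same \uXXXX syntax, so JSON.parse can decode the result.
console.log(JSON.parse('"' + escaped + '"') === "a\u{20BB7}"); // true
```

This avoids the surrogate arithmetic entirely. codePointAt() remains the better choice when the numeric code point itself is needed, for example to print it in U+XXXXX notation.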
In summary, you can use the String.prototype.charCodeAt() and String.prototype.codePointAt() methods in JavaScript to convert the characters in a string to Unicode code points and write them in the form \uXXXX. If the string contains code points above U+FFFF, read them with codePointAt() and write each one as two \uXXXX escapes, one for the high surrogate and one for the low surrogate.