1.PHP
PHP works like C here: a string is a sequence of bytes, and functions such as strlen() treat each byte as one character. Under GBK encoding, an English character takes 1 byte and a Chinese character takes 2 bytes. Under UTF-8, an English character still takes 1 byte, but a Chinese character takes 3 or 4 bytes (usually 3). This makes it awkward to get the character length of a string or to truncate it. For example:
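<?php
$str = "我爱你Iloveyou";
echo strlen($str); // 17 under UTF-8, 14 under GBK
// But what if you are asked for the character length of $str,
// or asked to show its first 6 characters followed by an ellipsis?
?>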
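Since strlen() counts bytes, the numbers in that comment depend on how the string is encoded. As a hedged illustration in Java (the other language on this page), the minimal sketch below reproduces the same figures for the same string: encoding it as UTF-8 and as GBK gives the byte counts strlen() would see, counting code points gives the character length, and cutting on code-point boundaries answers the "first 6 characters plus an ellipsis" question. The class and method names (PhpStringDemo, byteLength, charLength, truncate) are hypothetical, and it assumes the JDK ships the GBK charset (standard OpenJDK builds do; the program checks at run time).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PhpStringDemo {
    // Byte length of s when encoded with the given charset,
    // i.e. what PHP's strlen() reports for a string stored in that encoding.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    // Number of Unicode code points (one per visible character for this text).
    static int charLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // Keep the first n code points and append "..." if anything was cut off.
    static String truncate(String s, int n) {
        if (charLength(s) <= n) {
            return s;
        }
        int end = s.offsetByCodePoints(0, n); // char index just after the n-th code point
        return s.substring(0, end) + "...";
    }

    public static void main(String[] args) {
        String str = "我爱你Iloveyou";
        System.out.println(byteLength(str, StandardCharsets.UTF_8)); // 17
        if (Charset.isSupported("GBK")) {
            System.out.println(byteLength(str, Charset.forName("GBK"))); // 14
        }
        System.out.println(charLength(str));   // 11
        System.out.println(truncate(str, 6));  // 我爱你Ilo...
    }
}
```

Cutting on code-point boundaries never splits a character in half, which is exactly the failure a byte-based substr() risks on UTF-8 text. In PHP itself, the multibyte-aware counterparts are mb_strlen() and mb_substr() from the mbstring extension.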
2.Java
In Java a char is 2 bytes: strings are stored in Unicode (UTF-16), with 2 bytes per char. In this representation a Chinese character and an English character each occupy one char, that is, 2 bytes. Once the string is converted to some other encoding, however, the number of bytes per character varies. For example:
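public class Test {
    public static void main(String[] args) {
        String str = "我们aaaaa";
        // getBytes() with no argument encodes with the platform default charset,
        // so this value depends on the environment (UTF-8 by default since Java 18)
        int byte_len = str.getBytes().length;
        // length() counts the chars in the string, independent of any byte encoding
        int len = str.length();
        System.out.println("Byte length: " + byte_len);
        System.out.println("Character length: " + len);
    }
}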
With GBK as the default encoding, the example prints 9 and 7; with UTF-8 it prints 11 and 7. So whatever encoding is in use, str.length() returns the same value: it counts the characters in the string, and a Chinese character counts as one, just like an English letter. (Strictly, length() counts UTF-16 code units, so a rare character outside the Basic Multilingual Plane counts as two.)
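Because the example calls getBytes() with no argument, its byte count depends on the platform default charset. The minimal sketch below passes the charset explicitly so both results can be produced on one machine, assuming a standard JDK that includes GBK (checked at run time). It also shows the one case where length() differs from the number of characters: U+20000, a character outside the Basic Multilingual Plane that Java stores as a surrogate pair. The class and helper names (CharsetLengthDemo, byteLength, codePointLength) are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetLengthDemo {
    // Byte length under an explicitly chosen charset, independent of the platform default.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    // Number of Unicode code points; differs from length() only when surrogate pairs are present.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String str = "我们aaaaa";
        System.out.println(byteLength(str, StandardCharsets.UTF_8)); // 11
        if (Charset.isSupported("GBK")) {
            System.out.println(byteLength(str, Charset.forName("GBK"))); // 9
        }
        System.out.println(str.length()); // 7

        // U+20000 (CJK Unified Ideographs Extension B) lies outside the Basic Multilingual Plane,
        // so Java stores it as a surrogate pair: two chars for one character.
        String rare = new String(Character.toChars(0x20000));
        System.out.println(rare.length());                            // 2
        System.out.println(codePointLength(rare));                    // 1
        System.out.println(byteLength(rare, StandardCharsets.UTF_8)); // 4
    }
}
```

Naming the charset explicitly is generally the safer habit, since the no-argument getBytes() can give different answers on different machines or JDK versions.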
That covers how the byte lengths of Chinese and English characters relate to encoding in PHP and Java. I hope it helps anyone interested in PHP.