First, char is one of Java's primitive types. Every primitive type occupies a fixed number of bytes: for example, int occupies 4 bytes and double occupies 8 bytes. Because these sizes are the same on every platform, Java code is portable. The char type therefore always occupies 2 bytes. (Note: a char can also hold a Chinese character, provided the character lies in Unicode's Basic Multilingual Plane, which covers the commonly used ones.)
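The fixed sizes mentioned above can be checked directly with the constants on the wrapper classes. The following is only a minimal sketch: the class name CharSizeDemo and the helper charsNeeded are made up for illustration, and U+20000 is simply one example of a rare ideograph outside the Basic Multilingual Plane. Unicode escapes are used instead of literal Chinese characters so the source compiles regardless of file encoding.

```java
class CharSizeDemo {
    // Hypothetical helper: number of char values (UTF-16 code units)
    // needed to represent one Unicode code point.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        // Fixed sizes of primitive types, in bytes (constants available since Java 8).
        System.out.println("char:   " + Character.BYTES); // 2
        System.out.println("int:    " + Integer.BYTES);   // 4
        System.out.println("double: " + Double.BYTES);    // 8

        // A commonly used Chinese character (U+4F60) fits in a single char.
        char c = '\u4F60';
        System.out.println("U+" + Integer.toHexString(c).toUpperCase()
                + " needs " + charsNeeded(c) + " char");

        // A code point outside the Basic Multilingual Plane needs two chars
        // (a surrogate pair), so String.length() reports 2 for one character.
        String rare = new String(Character.toChars(0x20000));
        System.out.println("length: " + rare.length()
                + ", code points: " + rare.codePointCount(0, rare.length()));
    }
}
```

The last line is the reason the note above is limited to commonly used characters: a rare ideograph occupies two char values, that is, 4 bytes.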
Second, the number of bytes a String occupies is more flexible, because it depends on the encoding used to convert the String to bytes. An English letter occupies 1 byte in both UTF-8 and GBK, while a Chinese character occupies a different number of bytes depending on the encoding: a commonly used Chinese character occupies 3 bytes in UTF-8 and 2 bytes in GBK.
The test code is as follows:
import java.io.UnsupportedEncodingException;

public class StrTest {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String str1 = "hello";
        String str2 = "你好abc";

        // Encode each string with UTF-8 and GBK and print the resulting byte counts.
        System.out.println("Bytes occupied by 'hello' in UTF-8: " + str1.getBytes("utf-8").length);
        System.out.println("Bytes occupied by 'hello' in GBK: " + str1.getBytes("gbk").length);
        System.out.println("Bytes occupied by '你好abc' in UTF-8: " + str2.getBytes("utf-8").length);
        System.out.println("Bytes occupied by '你好abc' in GBK: " + str2.getBytes("gbk").length);
    }
}

Output:

Bytes occupied by 'hello' in UTF-8: 5
Bytes occupied by 'hello' in GBK: 5
Bytes occupied by '你好abc' in UTF-8: 9
Bytes occupied by '你好abc' in GBK: 7
As the output shows, when a String is encoded, each English letter occupies 1 byte, while each Chinese character occupies 2 bytes (GBK) or 3 bytes (UTF-8). The same method works for checking other encodings.
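As an illustration of checking other encodings, here is a small sketch that repeats the measurement with the charsets every Java runtime is guaranteed to provide through StandardCharsets. The class name EncodingSizeDemo and the helper encodedLength are made up for this example. The test string is the same "你好abc" used above, written with Unicode escapes so the source compiles regardless of file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSizeDemo {
    // Hypothetical helper: number of bytes the text occupies once encoded
    // with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Two Chinese characters followed by "abc" (the article's str2).
        String text = "\u4F60\u597Dabc";

        Charset[] charsets = {
            StandardCharsets.UTF_8,      // 3 bytes per Chinese character here, 1 per ASCII letter
            StandardCharsets.UTF_16BE,   // 2 bytes per char, no byte order mark
            StandardCharsets.UTF_16,     // 2 bytes per char plus a 2-byte byte order mark
            StandardCharsets.ISO_8859_1  // cannot represent Chinese; each becomes one '?' byte
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": " + encodedLength(text, cs));
        }
    }
}
```

Passing a Charset object instead of a charset name also avoids the checked UnsupportedEncodingException. Note that ISO-8859-1 silently replaces the Chinese characters, so a small byte count does not always mean the text was preserved.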
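Finally, because an English letter encodes to 1 byte while a Chinese character encodes to more than 1 byte, comparing a string's length with its encoded byte count tells you whether the string may contain Chinese characters. An example follows: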
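import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrTest {
    public static void main(String[] args) {
        searchChineseCharacter("Good morning");
        searchChineseCharacter("hello 早上好");
    }

    // Find the Chinese characters in a string.
    public static void searchChineseCharacter(String str) {
        // Regular expression that matches commonly used Chinese characters.
        String regex = "[\u4e00-\u9fa5]";
        // If the length of str differs from its byte count, it contains non-ASCII
        // characters, which may be Chinese. UTF-8 is specified explicitly so the
        // result does not depend on the platform's default charset.
        if (str.length() != str.getBytes(StandardCharsets.UTF_8).length) {
            Pattern pattern = Pattern.compile(regex);
            Matcher matcher = pattern.matcher(str);
            System.out.print("Chinese characters in '" + str + "': ");
            while (matcher.find()) {
                System.out.print(matcher.group());
            }
            System.out.println();
        } else {
            System.out.println("'" + str + "' contains no Chinese characters");
        }
    }
}

Output:

'Good morning' contains no Chinese characters
Chinese characters in 'hello 早上好': 早上好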
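The byte-count comparison only shows that a string contains non-ASCII characters, so accented Latin letters would pass the first check as well. As an alternative, and not the method used above, the sketch below asks the JDK which Unicode script each code point belongs to. The class name HanScriptDemo and the method containsHan are made up for illustration. Note that the Han script also includes ideographs used in Japanese and Korean text.

```java
class HanScriptDemo {
    // Hypothetical helper: true if any code point in the text belongs to the
    // Unicode Han script (Chinese ideographs, also used in Japanese and Korean).
    static boolean containsHan(String text) {
        return text.codePoints()
                   .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        // Plain ASCII text.
        System.out.println(containsHan("Good morning"));             // false
        // "hello " followed by three Chinese characters (the article's second example).
        System.out.println(containsHan("hello \u65E9\u4E0A\u597D")); // true
        // Non-ASCII but not Chinese: an accented Latin letter.
        System.out.println(containsHan("caf\u00E9"));                // false
    }
}
```

Because it iterates over code points instead of char values, this check also covers rare ideographs outside the Basic Multilingual Plane, which the regular expression range above does not match.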