java - UTF-16 encoding question

Question

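{code...} There is something I don't understand about the Unicode encodings. In UTF-8 and UTF-32, the number of bytes after encoding str4 equals the sum of the byte counts of str1, str2, and str3, but in UTF-16 it does not. How exactly does UTF-16 encode? Please advise.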

黄舟 · Answer

UTF-16 is a variable-length encoding whose smallest unit is two bytes. Because each unit is two bytes, byte order (big-endian versus little-endian) comes into play. In your example, no endianness is specified, so a two-byte BOM (byte order mark) is added. Add the two bytes for the character itself (an ASCII character here) and you get 4 bytes. If you use utf-16-le or utf-16-be, no BOM is written and the result is two bytes. Check the Java documentation for the exact charset names.
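The byte counts described above can be checked with a minimal sketch like the following. The class name `Utf16Lengths`, the helper `encodedLength`, and the sample strings are assumptions for illustration, since the original code in the question is elided. It relies on `StandardCharsets.UTF_16` writing a big-endian BOM when encoding, while `UTF_16BE` and `UTF_16LE` write none.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf16Lengths {
    // Hypothetical helper: number of bytes produced when s is encoded with cs.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // Sample strings are assumptions; the question's own strings are not shown.
        String[] samples = {"a", "b", "c", "abc"};
        for (String s : samples) {
            System.out.printf("%-3s UTF-16=%d UTF-16BE=%d UTF-16LE=%d%n",
                    s,
                    encodedLength(s, StandardCharsets.UTF_16),    // BOM + 2 bytes per char
                    encodedLength(s, StandardCharsets.UTF_16BE),  // no BOM
                    encodedLength(s, StandardCharsets.UTF_16LE)); // no BOM
        }
    }
}
```

With the plain UTF-16 charset every separately encoded string pays for its own BOM, which is why the three single-character results add up to more than the result for the combined string.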

怪我咯 · Answer

After UTF-16 encoding, the bytes (in hex) are:

feff0061 //a
feff0062 //b
feff0063 //c
feff006100620063 //abc

Each result begins with feff, the byte order mark (BOM), which indicates that the bytes are big-endian (the high-order byte comes first). The mark is needed because systems use two byte orders: big-endian and little-endian (the high-order byte comes last). The bytes 0x01 0x02 are read as 0x0102 in big-endian but as 0x0201 in little-endian, so the byte order has to be marked with feff. This also explains the byte counts: each string encoded on its own gets its own 2-byte BOM, so a, b, and c take 4 bytes each (12 in total), while abc takes 2 + 6 = 8 bytes.
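As a rough illustration of the byte order mark, the sketch below prints the encoded bytes in hex and decodes a little-endian byte sequence that starts with the BOM ff fe. The class name `Utf16Bom` and the helpers `hex` and `decodeWithBom` are hypothetical names introduced for this example. It assumes the documented behavior of Java's UTF-16 charset: the decoder uses the BOM to pick the byte order and falls back to big-endian when there is none.

```java
import java.nio.charset.StandardCharsets;

class Utf16Bom {
    // Hypothetical helper: render bytes as lowercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Hypothetical helper: decode with the plain UTF-16 charset, which
    // reads a leading BOM to decide the byte order.
    static String decodeWithBom(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        System.out.println(hex("a".getBytes(StandardCharsets.UTF_16)));    // feff0061
        System.out.println(hex("abc".getBytes(StandardCharsets.UTF_16)));  // feff006100620063
        System.out.println(hex("a".getBytes(StandardCharsets.UTF_16BE)));  // 0061
        System.out.println(hex("a".getBytes(StandardCharsets.UTF_16LE)));  // 6100

        // Little-endian input: BOM ff fe followed by 'a' stored as 61 00.
        byte[] littleEndian = {(byte) 0xff, (byte) 0xfe, 0x61, 0x00};
        System.out.println(decodeWithBom(littleEndian));                   // a
    }
}
```

The same character decodes correctly from either byte order as long as the BOM is present, which is the purpose of the mark.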
