
Solving the problem of front and back garbled characters in JavaWEB

黄舟 (original), 2017-08-20 09:13:44

This article summarizes the character encodings commonly met in Java and several ways to fix garbled characters exchanged between the front end and the back end of a Java web application. It is offered as a reference.

Common encoding formats in Java and what they mean:

ASCII Code

Anyone who has studied computing knows ASCII. It defines 128 codes, represented by the lower 7 bits of a byte. Codes 0-31 and 127 are control characters such as line feed, carriage return and delete; codes 32-126 are printable characters that can be typed on a keyboard and displayed.

ISO-8859-1

128 characters are obviously not enough, so ISO defined a series of standards that extend ASCII: ISO-8859-1 through ISO-8859-15. Of these, ISO-8859-1 covers most Western European languages and is the most widely used. It is still a single-byte encoding and can represent 256 characters in total.
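
The limits of a single-byte encoding can be seen directly in Java. The following is a minimal sketch (the class and method names are illustrative, not from the original article): Western European text survives a round trip through ISO-8859-1, while Chinese characters, which have no code in that character set, are replaced by '?' when encoded.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Demo {
    // Encodes the text as ISO-8859-1 bytes and decodes those bytes again.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an accented e, which ISO-8859-1 can represent.
        System.out.println(roundTrip("caf\u00e9"));
        // "\u4e2d\u6587" is two Chinese characters; each becomes '?'.
        System.out.println(roundTrip("\u4e2d\u6587")); // prints "??"
    }
}
```

Once the characters have been replaced by '?', the original text cannot be recovered, which is one common source of garbled output.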

GB2312

Its full name is "Basic Set of Chinese Encoded Character Sets for Information Exchange". It is a double-byte encoding whose lead byte ranges over A1-F7: A1-A9 is the symbol area, containing 682 symbols, and B0-F7 is the Chinese character area, containing 6763 Chinese characters.

GBK

The full name is "Chinese Character Internal Code Extension Specification". It is a standard formulated by the State Bureau of Technical Supervision as the Chinese internal code specification for Windows 95. It extends GB2312 with more Chinese characters. Its coding range is 8140-FEFE (excluding xx7F), giving 23940 code points, of which 21003 are Chinese characters. GBK is compatible with GB2312: text encoded in GB2312 can be decoded as GBK without producing garbled characters.

GB18030

The full name is "Chinese Coded Character Set for Information Exchange". It is a mandatory national standard in China. A character may be encoded in one, two or four bytes, and the encoding is compatible with GB2312. Although it is a national standard, it is not widely used in real application systems.

UTF-16

Any discussion of UTF has to start with Unicode (Universal Code). ISO set out to create a brand-new dictionary spanning all languages, through which every language in the world could be translated into any other. You can imagine how complex such a dictionary is. For the detailed Unicode specification, refer to the corresponding documents. Unicode is the basis of Java and XML. The following sections describe how Unicode is stored in the computer.

UTF-16 defines how Unicode characters are stored and accessed in the computer. It uses 16-bit (two-byte) code units, hence the name UTF-16. Every character in the Basic Multilingual Plane, which includes the commonly used Chinese characters, is represented by a single two-byte unit; characters outside it are represented by a pair of units (a surrogate pair, four bytes). Because most characters occupy exactly two bytes, UTF-16 is convenient for string operations, and this is an important reason why Java uses UTF-16 as its in-memory character format.
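
As a small illustration of how Java exposes UTF-16, the sketch below (class and method names are illustrative) compares the number of 16-bit code units in a string with the number of Unicode characters it holds. A common Chinese character takes one unit, while a character outside the Basic Multilingual Plane, here U+1F600, takes two.

```java
public class Utf16Demo {
    // Number of 16-bit UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (actual characters) in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String han = "\u4e2d";          // one Chinese character in the BMP
        String emoji = "\uD83D\uDE00";  // U+1F600, stored as a surrogate pair
        System.out.println(codeUnits(han) + " " + codePoints(han));     // prints "1 1"
        System.out.println(codeUnits(emoji) + " " + codePoints(emoji)); // prints "2 1"
    }
}
```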

UTF-8

Representing most characters uniformly in two bytes is simple and convenient, but it has a drawback: many characters that fit in one byte now take two, doubling the storage space. While network bandwidth is still limited, this increases network traffic unnecessarily. UTF-8 uses a variable-length scheme in which different ranges of characters are encoded with different numbers of bytes. The original design allowed 1 to 6 bytes per character; the current standard (RFC 3629) limits this to 1 to 4 bytes.
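
The variable length is easy to observe in Java. This is a minimal sketch (names are illustrative) that counts the bytes UTF-8 needs for an ASCII letter, an accented Latin letter, a Chinese character and a character outside the Basic Multilingual Plane.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // prints 1 (ASCII)
        System.out.println(utf8Length("\u00e9"));       // prints 2 (accented e)
        System.out.println(utf8Length("\u4e2d"));       // prints 3 (Chinese character)
        System.out.println(utf8Length("\uD83D\uDE00")); // prints 4 (U+1F600)
    }
}
```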

UTF-8 encoding rules:

1. If the highest bit (the 8th bit) of a byte is 0, the byte is an ASCII character (00-7F). All ASCII-encoded text is therefore already valid UTF-8.

2. If a byte starts with 11, the number of consecutive leading 1s gives the number of bytes in the character. For example, 110xxxxx is the first byte of a two-byte UTF-8 character.

3. If a byte starts with 10, it is not the first byte of a character; look backward to find the first byte of the current character.
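
The three rules above can be sketched as a small classifier that inspects the leading bits of one byte. The class name, method name and return convention (0 for a continuation byte, -1 for a byte that cannot start a sequence of 1 to 4 bytes) are choices made for this illustration only.

```java
public class Utf8LeadByte {
    // Returns the length of the UTF-8 sequence that this byte starts,
    // 0 if it is a continuation byte (10xxxxxx),
    // or -1 if it cannot appear as a lead byte of a 1-4 byte sequence.
    static int sequenceLength(byte b) {
        int v = b & 0xFF;                      // treat the byte as unsigned
        if ((v & 0x80) == 0x00) return 1;      // 0xxxxxxx: ASCII
        if ((v & 0xC0) == 0x80) return 0;      // 10xxxxxx: continuation byte
        if ((v & 0xE0) == 0xC0) return 2;      // 110xxxxx: two-byte character
        if ((v & 0xF0) == 0xE0) return 3;      // 1110xxxx: three-byte character
        if ((v & 0xF8) == 0xF0) return 4;      // 11110xxx: four-byte character
        return -1;
    }

    public static void main(String[] args) {
        // The Chinese character U+4E2D is encoded in UTF-8 as E4 B8 AD.
        System.out.println(sequenceLength((byte) 0xE4)); // prints 3
        System.out.println(sequenceLength((byte) 0xB8)); // prints 0
        System.out.println(sequenceLength((byte) 0x41)); // prints 1
    }
}
```

A decoder that lands in the middle of a stream can skip bytes for which this method returns 0 until it reaches a lead byte, which is what makes UTF-8 easy to resynchronize.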

Comparison of different encoding formats

Four of the encodings above can handle Chinese characters: GB2312, GBK, UTF-16 and UTF-8. GB2312 and GBK use similar encoding rules, but GBK covers a larger range and includes far more Chinese characters, so GBK should be preferred over GB2312. UTF-16 and UTF-8 both encode Unicode, but with different rules. Relatively speaking, UTF-16 is the most efficient to process: converting between characters and bytes is simple, and string operations are straightforward. It suits use between local disk and memory, where characters and bytes must be converted quickly; Java's in-memory encoding, for example, is UTF-16. It is less suitable for network transmission, however, because a byte stream can be damaged in transit, and once a UTF-16 stream is damaged it is hard to recover. UTF-8 is better suited to network transmission: it stores ASCII characters in a single byte, and damage to one character does not affect the characters that follow. Its encoding efficiency lies between that of GBK and UTF-16. UTF-8 therefore balances encoding efficiency and encoding safety and is an ideal encoding for Chinese text.
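
One aspect of this comparison, the encoded size, can be measured directly. The sketch below (names are illustrative) encodes a mixed string of four ASCII letters and two Chinese characters. UTF_16BE is used rather than UTF_16 so that no byte order mark is added to the count. GBK is not one of Java's guaranteed standard charsets, so the sketch checks that it is available before using it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizeDemo {
    // Number of bytes the text occupies in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "Java" followed by two Chinese characters (U+7F16 U+7801).
        String text = "Java\u7f16\u7801";
        // UTF-8: 4 ASCII bytes + 2 * 3 bytes
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));    // prints 10
        // UTF-16 (big-endian, no BOM): 6 code units * 2 bytes
        System.out.println(encodedLength(text, StandardCharsets.UTF_16BE)); // prints 12
        // GBK is optional in a Java runtime, so guard the lookup.
        if (Charset.isSupported("GBK")) {
            // Where available: 4 single bytes + 2 * 2 bytes
            System.out.println(encodedLength(text, Charset.forName("GBK")));
        }
    }
}
```

For text that is mostly ASCII, UTF-8 is the smaller of the two Unicode encodings; for text that is mostly Chinese, UTF-16 is smaller, so the balance depends on the content being transmitted.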

Chinese garbled solution:

1. Tomcat's default encoding has traditionally been ISO-8859-1, which is not compatible with Chinese. Read the data using the same encoding (ISO-8859-1) to get back the raw bytes, then decode those bytes with an encoding that can represent them (UTF-8). After processing, the result is sent to the front end. Before writing the response, set:

res.setContentType("text/html;charset=utf-8"); // Sets the character encoding of the page so that Chinese text is displayed correctly in the browser

2. req.setCharacterEncoding("utf-8"); // Must be called first, before any request data is read, because it determines how the data is decoded; otherwise the data will be wrong.

3. Spring provides a CharacterEncodingFilter that can be used to solve the garbled-character problem.

Pay attention to the following when using CharacterEncodingFilter:

The form data must be submitted with POST;

The CharacterEncodingFilter must be configured in web.xml;

The page encoding and the encoding specified by the filter must be the same.

CharacterEncodingFilter configuration example:



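<filter>
  <filter-name>encodingFilter</filter-name>
  <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
</filter>

<filter-mapping>
  <filter-name>encodingFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>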
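
Returning to solution 1, the "receive as ISO-8859-1, then convert to UTF-8" step can be sketched in plain Java without a servlet container. The class and method names are illustrative. The misdecode method only simulates what a container does when it decodes UTF-8 request bytes as ISO-8859-1; in a real servlet the garbled string would presumably come from a call such as req.getParameter.

```java
import java.nio.charset.StandardCharsets;

public class RecodeDemo {
    // Simulates a container that decodes UTF-8 bytes as ISO-8859-1,
    // which is how the garbled string arises in the first place.
    static String misdecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Recovers the text: re-encode with ISO-8859-1 to get the raw bytes back,
    // then decode those bytes with the encoding the page actually used.
    static String recode(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";      // two Chinese characters
        String garbled = misdecode(original);  // six Latin-1 characters
        System.out.println(garbled.length());                  // prints 6
        System.out.println(recode(garbled).equals(original));  // prints true
    }
}
```

The round trip works because ISO-8859-1 maps every byte value to exactly one character, so no bytes are lost. It should be applied only to strings that really were decoded with the wrong charset; applying it to text that is already correct would itself produce garbled output.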

The above is my own summary of problems I ran into while coding and the information I looked up to solve them. It covers only what I know; there are certainly other solutions.
