Home  >  Article  >  Database  >  An introduction to the causes of MySQL garbled characters and how to set UTF8 data format

An introduction to the causes of MySQL garbled characters and how to set UTF8 data format

不言
不言forward
2019-03-27 10:05:312482browse

The content of this article is about the reasons for MySQL garbled code and the method of setting UTF8 data format. It has certain reference value. Friends in need can refer to it. I hope it will be helpful to you.

When using MySQL, one very painful thing must be that the result is garbled. Setting the encoding format to UTF8 can solve this problem. Today we will talk about why and how to set it up like this.

MySQL character format

Character set

In the programming language, in order to prevent Chinese garbled characters, we will use unicode to process Chinese characters, and In order to reduce network bandwidth and save storage space, we use UTF8 for encoding. Students who do not know enough about the differences between the two can refer to the article The Past and Present of Unicode Character Set and UTF8 Encoding.

Similarly in MySQL, we will also have such processing. We can check the encoding method (character set) of the current database setting:

mysql> show variables like '%char%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | latin1                           | 
| character_set_connection | latin1                           | 
| character_set_database   | latin1                           | 
| character_set_filesystem | binary                           | 
| character_set_results    | latin1                           | 
| character_set_server     | latin1                           | 
| character_set_system     | utf8                             | 
| character_sets_dir       | /usr/local/mysql/share/charsets/ | 
+--------------------------+----------------------------------+
8 rows in set (0.00 sec)

The table is the currently set character set. First There are several values ​​that you don’t need to pay attention to:

character_set_filesystem | binary: The storage format on the file system, the default is binary (binary)

character_set_system | utf8: The storage format of the system, the default is utf8

character_sets_dir | /usr/local/mysql/share/charsets/: The file path of the character sets that can be used

The remaining parameters are the parameters that affect the daily reading and writing of garbled characters:
- character_set_client: The character set used by the client to request data
- character_set_connection: The character set used to receive data from the client and then transmit it
- character_set_database: The character set of the default database; if there is no default database, use the character_set_server field
- character_set_results: Character set of the result set
- character_set_server: Default character set of the database server

An introduction to the causes of MySQL garbled characters and how to set UTF8 data format

The character set conversion process is divided into 3 steps:

1. The client requests database data, and the data sent uses the character_set_client character set

2. After the MySQL instance receives the data sent by the client, it converts it into the character_set_connection character set

3. When performing internal operations, convert the data character set to the internal operation character set:

(1) Use the character set setting value of each data field

(2) If it does not exist , use the default character set setting value of the corresponding data table

(3) If it does not exist, use the default character set setting value of the corresponding database

(4) If it does not exist, use character_set_server Setting value

4. Convert the operation result value from the internal operation character set to character_set_results

Character sequence

Before talking about character sequence, we need to understand some basic knowledge:

Character (Character) refers to the smallest ideographic symbol in human language. For example, 'A', 'B', etc.;

Given a series of characters, assign a value to each character, and use the value to represent the corresponding character. This value is the encoding of the character (Encoding ). For example, we assign the value 0 to the character 'A' and the value 1 to the character 'B', then 0 is the encoding of the character 'A';

given a series of characters and assigns the corresponding encoding Finally, the set of all these character and encoding pairs is the character set (Character Set). For example, when the given character list is {'A','B'}, {'A'=>0, 'B'=>1} is a character set;

characters Collation refers to the comparison rules between characters in the same character set;

After the character sequence is determined, what are equivalent characters in a character set can be defined, as well as the differences between characters. The size relationship;

Each character sequence uniquely corresponds to one character set, but one character set can correspond to multiple character sequences, one of which is the default character sequence (Default Collation);

The character sequence names in MySQL follow the naming convention: start with the character set name corresponding to the character sequence; start with _ci (case insensitive), _cs (case sensitive) or _bin (indicates comparison by encoding value, binary) at the end. For example: In the character sequence "utf8_general_ci", the characters "a" and "A" are equivalent;

Therefore, the character sequence is different from the character set and is used for equality or size comparison of database fields. Let’s check the character sequence set by the MySQL instance:

mysql> show variables like 'collation%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci | 
| collation_database   | latin1_swedish_ci | 
| collation_server     | latin1_swedish_ci | 
+----------------------+-------------------+
3 rows in set (0.00 sec)

The common character sequences corresponding to utf8 are: utf8_unicode_ci/utf8_general_ci and utf8_bin, etc. So what are their differences?

1. _bin is stored and compared in binary, with case distinction. Use

2 when storing binary content. utf8_general_ci: The proofreading speed is fast, but the accuracy is slightly poor. Use it when using Chinese and English.

3. utf8_unicode_ci: High accuracy, but the proofreading speed is slightly slow. Use

when using foreign languages ​​such as Germany, France, and Russia. For detailed differences, please refer to

The sorting rules utf8_unicode_ci and utf8_general_ci in Mysql Summary of differences.

Modify character set and character order

If the problem of garbled characters occurs when connecting to MySQL, it can basically be determined that the character set/order settings are not uniform. MySQL's default latin1 format does not support Chinese. Since we are in China, we chose the utf8 format, which has very complete support for Chinese and various languages. Therefore, we need to change the character set and character sequence that need attention to utf8 format.

You can also choose utf8mb4 format, which supports saving emoji

The above is the detailed content of An introduction to the causes of MySQL garbled characters and how to set UTF8 data format. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:segmentfault.com. If there is any infringement, please contact admin@php.cn delete