PHP tutorial: Blank space problems in UTF-8 encoded web pages


WBOY

Released: 2016-07-21 14:56:49



A problem I had long been unable to solve
The page is encoded in UTF-8, and the template includes separate files for the header and footer. As a result, an extra blank band about 10px high appeared at the top and bottom of the page, even though nothing in the markup seemed to cause it.
The cause is that every file was saved as UTF-8 with a BOM. When the files are included, the final output stream therefore contains several UTF-8 BOMs. IE cannot correctly handle a page that contains multiple BOMs and renders the extra ones as a visible line break, which produces a blank line. Firefox does not have this problem.
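To see why several BOMs end up in one response, here is a small Python simulation (not PHP) of an include-based template. The raw bytes of each included file are simply concatenated. The file contents are hypothetical, and each part is assumed to have been saved as UTF-8 with a BOM.

```python
BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF


def render(parts: list[bytes]) -> bytes:
    """Concatenate the raw bytes of each included file, as an include-based template does."""
    return b"".join(parts)


# Hypothetical header, body and footer files, each saved as "UTF-8 with BOM".
header = BOM + "<div>header</div>\n".encode("utf-8")
body = BOM + "<p>content</p>\n".encode("utf-8")
footer = BOM + "<div>footer</div>\n".encode("utf-8")

page = render([header, body, footer])

# Every included file contributes its own BOM to the output.
bom_count = page.count(BOM)

# Python's "utf-8-sig" codec treats only a leading BOM as a signature;
# the other two survive as U+FEFF characters inside the page text.
text = page.decode("utf-8-sig")
stray = text.count("\ufeff")

print(bom_count, stray)  # 3 2
```

In this sketch the two stray U+FEFF characters sit exactly where the body and footer begin. That position matches where the blank lines appeared in IE.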
The fix: if a template includes multiple UTF-8 files, save each of them as UTF-8 without a BOM. In UltraEdit, use Save As and choose the UTF-8 without BOM format.
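If no editor that can save without a BOM is available, you can get the same result by removing the leading bytes directly. This is a minimal Python sketch, assuming the template files are UTF-8 and that only a single leading BOM needs to be removed. The function names are hypothetical.

```python
from pathlib import Path

BOM = b"\xef\xbb\xbf"  # UTF-8 byte order mark


def strip_bom(data: bytes) -> bytes:
    """Remove one leading UTF-8 BOM if present; all other bytes are left untouched."""
    return data[len(BOM):] if data.startswith(BOM) else data


def resave_without_bom(path: Path) -> bool:
    """Rewrite `path` as UTF-8 without BOM. Returns True only if the file changed."""
    data = path.read_bytes()
    cleaned = strip_bom(data)
    if cleaned == data:
        return False
    path.write_bytes(cleaned)
    return True
```

For example, calling `resave_without_bom(p)` for each `p` in `Path("templates").rglob("*.php")` would clean a whole template directory. The `templates` directory name is hypothetical. Working on raw bytes means the sketch never decodes or re-encodes the file, so nothing besides the BOM can change.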
In addition, if a Chinese page places the <title> tag before the character-set <meta> declaration inside the HTML <head>, the page may render blank.
UTF-8 pages should therefore use the standard tag order, with the character-set declaration before <title>:

<meta name="description" content="" />

The UTF-8 BOM is the byte sequence \xEF\xBB\xBF. PHP 4 and 5 do not recognize the BOM. Because it sits before the <?php tag, PHP sends it to the output as-is before parsing the script.
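Here is a simplified model of why the BOM leaks into the response: PHP echoes anything that appears before its opening tag as literal output. The Python sketch below looks only for `<?php` and ignores short tags such as `<?=`, so treat it as an illustration, not a PHP parser. The helper names are hypothetical.

```python
from pathlib import Path

BOM = b"\xef\xbb\xbf"


def leading_output(source: bytes) -> bytes:
    """Bytes that would be sent as output before the first '<?php' tag.

    Simplified model: short open tags ('<?', '<?=') are not recognized.
    If there is no '<?php' at all, the whole file is treated as output.
    """
    idx = source.find(b"<?php")
    return source if idx == -1 else source[:idx]


def files_with_leading_bom(root: str) -> list[Path]:
    """List *.php files under `root` whose first three bytes are a UTF-8 BOM."""
    return [p for p in sorted(Path(root).rglob("*.php"))
            if p.read_bytes().startswith(BOM)]
```

For a file saved with a BOM, `leading_output(BOM + b"<?php echo 1;")` returns the three BOM bytes. Those are bytes the browser receives before the script itself has produced anything.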
The W3C Internationalization FAQ has a dedicated article on the UTF-8 BOM:

http://www.w3.org/International/questions/qa-utf8-bom

The details are as follows:

UCS has a character called "ZERO WIDTH NO-BREAK SPACE", whose code point is U+FEFF. U+FFFE is not a valid character in UCS, so it should never appear in actual data. The UCS specification recommends sending "ZERO WIDTH NO-BREAK SPACE" at the start of the byte stream. If the receiver then sees FE FF, the byte stream is big-endian; if it sees FF FE, the byte stream is little-endian. For this reason, "ZERO WIDTH NO-BREAK SPACE" is also called the BOM (Byte Order Mark).
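The byte-order rule can be sketched in a few lines of Python. `codecs.BOM_UTF16_BE` and `codecs.BOM_UTF16_LE` are the standard-library constants for FE FF and FF FE. The function name is hypothetical, and only UTF-16 is handled.

```python
import codecs
from typing import Optional


def utf16_byte_order(data: bytes) -> Optional[str]:
    """Infer UTF-16 byte order from a leading U+FEFF (the BOM).

    Only UTF-16 is considered: a UTF-32 little-endian BOM (FF FE 00 00)
    would also start with FF FE and be reported as little-endian here.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF: U+FEFF written big-endian
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE: U+FEFF written little-endian
        return "little-endian"
    return None  # no BOM: the byte order must be known some other way


# The same text "A" preceded by U+FEFF, serialized in each byte order.
big = "\ufeffA".encode("utf-16-be")     # bytes FE FF 00 41
little = "\ufeffA".encode("utf-16-le")  # bytes FF FE 41 00
print(utf16_byte_order(big), utf16_byte_order(little))  # big-endian little-endian
```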

UTF-8 has no byte-order ambiguity, so it does not need a BOM to indicate byte order, but a BOM can still be used to signal the encoding. The UTF-8 encoding of "ZERO WIDTH NO-BREAK SPACE" is EF BB BF, so if the receiver gets a byte stream that starts with EF BB BF, it knows the stream is UTF-8 encoded.
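The bytes EF BB BF follow directly from the UTF-8 rules. U+FEFF is 1111 1110 1111 1111 in binary and falls in the 3-byte range, whose pattern is 1110xxxx 10xxxxxx 10xxxxxx. Splitting the 16 bits as 4/6/6 gives 1111 | 111011 | 111111, which fills in to 11101111 (EF), 10111011 (BB) and 10111111 (BF). The Python sketch below does the same bit arithmetic. The function name is hypothetical, and it covers only the 3-byte range.

```python
def utf8_encode_3byte(cp: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF using the 3-byte UTF-8 pattern
    1110xxxx 10xxxxxx 10xxxxxx (surrogates U+D800..U+DFFF are rejected)."""
    if not 0x0800 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is outside the 3-byte range or is a surrogate")
    return bytes([
        0xE0 | (cp >> 12),          # top 4 bits after the 1110 prefix
        0x80 | ((cp >> 6) & 0x3F),  # middle 6 bits after the 10 prefix
        0x80 | (cp & 0x3F),         # low 6 bits after the 10 prefix
    ])


bom = utf8_encode_3byte(0xFEFF)
print(bom.hex(" ").upper())  # EF BB BF
```

The result matches Python's own encoder: `"\ufeff".encode("utf-8")` yields the same three bytes.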

On Windows, programs commonly use the BOM to mark the encoding of text files. The editor behavior below was observed on Windows XP Professional with Chinese as the default character set:

1) Notepad: It can automatically recognize UTF-8 files without a BOM, but offers no control over the BOM when saving. Every file it saves as UTF-8 gets a BOM.

2) EditPlus: It cannot automatically recognize UTF-8 files without a BOM. When you save a file in UTF-8 format, it does not write a BOM at the start of the file.

3) UltraEdit: It has the most capable character-encoding support. It can automatically recognize UTF-8 files both with and without a BOM (this is configurable), and a configuration option controls whether a BOM is added when saving.

(Note: when saving a newly created file, you must explicitly choose the UTF-8 without BOM format.)

Later I found that Notepad++ also handles the UTF-8 BOM well, and I recommend it.