
Methods to detect and delete the BOM (UTF-8) that causes blank lines in pages

WBOY Original: 2016-07-13 10:49:14

Pages sometimes contain extra blank lines for no apparent reason, even though nothing unusual is visible in the editor. This is caused by the BOM (UTF-8). Below are some methods for detecting and deleting the BOM (UTF-8).

The figure below shows the HTML code as seen in Firebug when this happens.

Figure 1

An extra blank line appears inexplicably, but it is not there when we look at the source code.

The method I use most often is to strip the BOM with PHP.

BOM: the Unicode file signature (Byte Order Mark, U+FEFF).

The BOM indicates which Unicode encoding a file uses, but when a received file has to be parsed and written into a database, a leading BOM gets in the way.

While reading about utf8_encode, I came across two functions that can be used to test writing and removing a BOM.

Add a BOM before the written file content

The code is as follows

<?php
// Write $content to $filename, preceded by the UTF-8 BOM (EF BB BF).
function writeUTF8File($filename, $content)
{
    $f = fopen($filename, 'w');
    fwrite($f, pack("CCC", 0xef, 0xbb, 0xbf));
    fwrite($f, $content);
    fclose($f);
}
?>

Remove BOM function

The code is as follows

<?php
// Strip a leading UTF-8 BOM from $str, if present.
function removeBOM($str = '')
{
    if (substr($str, 0, 3) == pack("CCC", 0xef, 0xbb, 0xbf)) {
        $str = substr($str, 3);
    }
    return $str;
}
?>

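Since the BOM above is pack("CCC",0xef,0xbb,0xbf), it can be removed either with the removeBOM function above or with one of the following:

■ str_replace("\xef\xbb\xbf", '', $bom_content);
■ preg_replace("/^\xef\xbb\xbf/", '', $bom_content);

Note that str_replace removes the byte sequence wherever it occurs, while preg_replace with ^ removes it only at the start of the string.

Also see the function to determine whether a string is UTF-8: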

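The code is as follows

// Round-trips the string through ISO-8859-1 and checks that it comes back unchanged.
function isUTF8($string)
{
    return (utf8_encode(utf8_decode($string)) == $string);
}

Note that this round trip only succeeds for text whose characters all fit in ISO-8859-1, and that utf8_encode and utf8_decode are deprecated as of PHP 8.2. Where the mbstring extension is available, mb_check_encoding($string, 'UTF-8') is a more reliable check.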

Solving the problem with the shell on Linux

Before discussing in detail the detection and deletion of BOM in UTF-8 encoding, you might as well warm up with an example:

The code is as follows

shell> curl -s http://www.bKjia.c0m/ | head -1 | sed -n l
\357\273\277<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional\
//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">$

As shown above, the first three bytes are 357, 273, and 277 respectively, which is the BOM in octal.
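Because a third-party page can change at any time, the same inspection can be tried on a local file. The following is a minimal sketch, assuming bash with the standard head, od, and xargs utilities; the scratch file and its content are made up for illustration.

```shell
#!/bin/bash
# Hypothetical scratch file whose content starts with the UTF-8 BOM.
sample="$(mktemp)"
printf '\357\273\277<!DOCTYPE html>\n' > "$sample"

# Dump the first three bytes in octal and in hexadecimal.
# xargs (run without a command) only collapses od's padding into single spaces.
bom_octal="$(head -c 3 "$sample" | od -An -to1 | xargs)"
bom_hex="$(head -c 3 "$sample" | od -An -tx1 | xargs)"

echo "octal: $bom_octal"
echo "hex:   $bom_hex"

rm -f "$sample"
```

For a file that begins with a BOM, the two lines should read 357 273 277 and ef bb bf.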

Viewed as a hex dump, the first three bytes are EF, BB, and BF, which is the BOM in hexadecimal.

Note: The example uses a page from a third-party website, so there is no guarantee that it will always be available.

In real project development you may be dealing with hundreds or thousands of text files, and if a few of them contain a BOM, they are hard to spot. If you have no UTF-8 text files with a BOM at hand, you can create a few with vi. The relevant commands are as follows:

Set UTF-8 encoding:

The code is as follows

:set fileencoding=utf-8

Add BOM:

The code is as follows

:set bomb

Delete BOM:

The code is as follows

:set nobomb

Query BOM:

The code is as follows

:set bomb?

How to detect BOM in UTF-8 encoding?

The code is as follows

shell> grep -r -I -l $'^\xEF\xBB\xBF' /path

How to delete the BOM in UTF-8 encoding?

The code is as follows

shell> grep -r -I -l $'^\xEF\xBB\xBF' /path | xargs sed -i '1s/^\xEF\xBB\xBF//'

The substitution is limited to line 1 of each file, so the rest of the file is left untouched. This relies on GNU sed, whose -i option handles each file separately and which understands \xHH escapes.
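To see detection and deletion working together before running them on a real tree, the pipeline can be tried in a scratch directory. The following is a minimal sketch, assuming bash, GNU grep, and GNU sed; the directory, file names, and file contents are hypothetical. It limits the substitution to line 1 of each file so that the remaining lines are preserved.

```shell
#!/bin/bash
# Sketch only: assumes GNU grep and GNU sed; all names below are hypothetical.
workdir="$(mktemp -d)"
printf '\357\273\277<?php echo "with bom";\nsecond line\n' > "$workdir/with_bom.php"
printf '<?php echo "clean";\n' > "$workdir/clean.php"

BOM=$'\xEF\xBB\xBF'   # the three BOM bytes

# Detect: list files that contain a line beginning with the BOM.
found_before="$(grep -r -I -l "^$BOM" "$workdir")"
echo "before: $found_before"

# Delete: strip the BOM from line 1 of each listed file, in place.
grep -r -I -l "^$BOM" "$workdir" | xargs sed -i "1s/^$BOM//"

# Detect again; grep exits non-zero when nothing matches, hence "|| true".
found_after="$(grep -r -I -l "^$BOM" "$workdir" || true)"
echo "after: ${found_after:-none}"
```

Only with_bom.php should be listed before the cleanup and nothing afterwards. The scratch directory can be removed with rm -rf "$workdir" when done.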

Recommendation: If you use SVN, you can add code like the following to the pre-commit hook to keep BOMs out of the repository.

The code is as follows

#!/bin/bash

REPOS="$1"
TXN="$2"

SVNLOOK=/usr/bin/svnlook

# Reject the commit if any added or updated file has a line starting with the BOM.
for FILE in $($SVNLOOK changed -t "$TXN" "$REPOS" | awk '/^[AU]/ {print $NF}'); do
    if $SVNLOOK cat -t "$TXN" "$REPOS" "$FILE" | grep -q $'^\xEF\xBB\xBF'; then
        echo "Byte Order Mark found in $FILE" 1>&2
        exit 1
    fi
done
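The check at the heart of the hook can be tried without a repository. The following is a minimal sketch, assuming bash and grep; has_bom_line is a hypothetical helper name, and the two sample inputs stand in for the output of svnlook cat.

```shell
#!/bin/bash
# has_bom_line: hypothetical helper. Reads stdin and succeeds (exit 0)
# if any line begins with the UTF-8 BOM bytes EF BB BF.
has_bom_line() {
    grep -q $'^\xEF\xBB\xBF'
}

# Sample input that starts with a BOM: the hook would reject it.
if printf '\357\273\277<?php\n' | has_bom_line; then
    with_bom_result="rejected"
else
    with_bom_result="accepted"
fi

# Sample input without a BOM: the hook would let it through.
if printf '<?php\n' | has_bom_line; then
    clean_result="rejected"
else
    clean_result="accepted"
fi

echo "with BOM: $with_bom_result"
echo "clean:    $clean_result"
```

Testing the exit status of grep -q directly, as the hook does, avoids capturing file content in a variable.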

This article uses a lot of shell commands
