Automatically detecting a string's character encoding and transcoding it in PHP

WBOY

Released: 2016-07-13 10:48:59

The principle is simple. In GB2312/GBK a Chinese character occupies two bytes, and each of those bytes falls within a known range of values. In UTF-8 a Chinese character occupies three bytes, and each byte likewise has its own range. In either encoding, ASCII characters such as English letters have byte values below 128 and occupy a single byte (full-width forms excepted).
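
To make the byte layout concrete, here is a minimal sketch that prints the byte values of one Chinese character in each encoding and applies the kind of bit-mask tests the detection relies on. The helper name `byteValues` is hypothetical and only serves this illustration; the character is written as raw byte escapes so the result does not depend on how the file itself is saved.

```php
<?php
// Raw byte escapes keep the example independent of this file's own encoding.
$utf8 = "\xE4\xB8\xAD"; // the character 中 in UTF-8: three bytes
$gb   = "\xD6\xD0";     // the same character in GB2312: two bytes

// Hypothetical helper: list the numeric value of every byte in a string.
function byteValues(string $s): array
{
    return array_map('ord', str_split($s));
}

print_r(byteValues("A"));   // one byte, below 128
print_r(byteValues($utf8)); // three bytes, all 128 or above
print_r(byteValues($gb));   // two bytes, all 128 or above

// The bit masks used for detection: 224 is 11100000, 192 is 11000000.
var_dump((ord($utf8[0]) & 224) == 224); // true: lead byte of a three-byte UTF-8 sequence
var_dump((ord($utf8[1]) & 192) == 128); // true: UTF-8 continuation byte (10xxxxxx)
var_dump((ord($gb[0]) & 224) == 224);   // false: 0xD6 is 11010110
```

Because the GB2312 and UTF-8 byte ranges overlap, tests like these are a heuristic rather than a proof of the encoding.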

When processing pages in PHP, we convert between character sets with functions such as iconv or mb_convert_encoding. This rests on a premise, however: we must already know both the source and the target encoding, otherwise the conversion cannot be done correctly.
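
As a brief illustration of that premise, the sketch below converts a GB2312 string to UTF-8 when the source encoding is known, and shows what happens when it is guessed wrong. It assumes the iconv and mbstring extensions are both loaded; the variable names are illustrative only.

```php
<?php
$gb = "\xD6\xD0\xCE\xC4"; // "中文" encoded as GB2312, written as raw bytes

// iconv takes (source encoding, target encoding, string).
$viaIconv = iconv('GB2312', 'UTF-8', $gb);

// mb_convert_encoding takes (string, target encoding, source encoding).
// 'CP936' is mbstring's name for GBK, a superset of GB2312.
$viaMb = mb_convert_encoding($gb, 'UTF-8', 'CP936');

var_dump(bin2hex($viaIconv)); // e4b8ade69687
var_dump($viaIconv === $viaMb);

// Naming the wrong source encoding raises no error here; it silently yields garbled text.
$wrong = iconv('ISO-8859-1', 'UTF-8', $gb);
var_dump($wrong === $viaIconv); // false
```

Note that the two functions take their encoding arguments in opposite orders, which is an easy source of mistakes.
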
The following function determines the encoding of the source string and converts it, without being told the encoding in advance. It only distinguishes UTF-8 from GB2312, but that is enough for most Chinese-language websites.

The code is as follows

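function safeEncoding($string, $outEncoding = 'UTF-8')
{
    $encoding = "UTF-8";
    $length = strlen($string);
    for ($i = 0; $i < $length; $i++)
    {
        // ASCII bytes are identical in both encodings, so skip them
        if (ord($string[$i]) < 128)
            continue;

        if ((ord($string[$i]) & 224) == 224 && $i + 2 < $length)
        {
            // First byte check passed: it looks like a UTF-8 lead byte
            if ((ord($string[$i + 1]) & 192) == 128)
            {
                // Second byte check passed: it is a UTF-8 continuation byte
                if ((ord($string[$i + 2]) & 192) == 128)
                {
                    $encoding = "UTF-8";
                    break;
                }
            }
        }
        if ((ord($string[$i]) & 192) == 192 && $i + 1 < $length)
        {
            // First byte check passed: it looks like a GB2312 lead byte
            if ((ord($string[$i + 1]) & 128) == 128)
            {
                // Second byte check passed
                $encoding = "GB2312";
                break;
            }
        }
    }

    if (strtoupper($encoding) == strtoupper($outEncoding))
        return $string;
    else
        return iconv($encoding, $outEncoding, $string);
}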
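
A function of this kind decides on the first multi-byte sequence it meets. As a point of comparison, here is a minimal sketch of a stricter variant that validates the whole string instead. It is not the article's method: the name `toUtf8Strict` is hypothetical, it assumes the mbstring extension is loaded, and it assumes that any input that is not valid UTF-8 is GBK (`CP936` in mbstring).

```php
<?php
// Hypothetical helper, not part of the original article.
function toUtf8Strict(string $string, string $fallback = 'CP936'): string
{
    // mb_check_encoding() validates every byte sequence in the string,
    // not just the first non-ASCII character.
    if (mb_check_encoding($string, 'UTF-8')) {
        return $string;
    }

    // Assumption: anything that is not valid UTF-8 is in the fallback encoding.
    return mb_convert_encoding($string, 'UTF-8', $fallback);
}

var_dump(bin2hex(toUtf8Strict("\xD6\xD0\xCE\xC4"))); // GB2312 input, converted: e4b8ade69687
var_dump(bin2hex(toUtf8Strict("\xE4\xB8\xAD")));     // already UTF-8, returned as is: e4b8ad
var_dump(toUtf8Strict("abc"));                       // plain ASCII, returned as is
```

The trade-off is that a GBK string which happens to also be valid UTF-8 would be left untouched, so this remains a heuristic too.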

Example 2

The code is as follows

// Identify the Chinese character encoding. YBlog uses UTF-8, so if a citation
// notification arrives in GB2312 it must be recognized and converted.
function safeEncoding($string, $outEncoding = 'UTF-8')
{
    $encoding = "UTF-8";
    $length = strlen($string);
    for ($i = 0; $i < $length; $i++)
    {
        // ASCII bytes are identical in both encodings, so skip them
        if (ord($string[$i]) < 128)
            continue;

        if ((ord($string[$i]) & 224) == 224 && $i + 2 < $length)
        {
            // The first byte passed
            if ((ord($string[$i + 1]) & 192) == 128)
            {
                // The second byte passed
                if ((ord($string[$i + 2]) & 192) == 128)
                {
                    $encoding = "UTF-8";
                    break;
                }
            }
        }
        if ((ord($string[$i]) & 192) == 192 && $i + 1 < $length)
        {
            // The first byte passed
            if ((ord($string[$i + 1]) & 128) == 128)
            {
                // The second byte passed
                $encoding = "GB2312";
                break;
            }
        }
    }

    if (strtoupper($encoding) == strtoupper($outEncoding))
        return $string;
    else
        return iconv($encoding, $outEncoding, $string);
}