PHP cuts Chinese characters without garbled characters

王林

Release： 2023-04-07 12:14:01

In PHP, truncating a Chinese string with substr() can produce garbled characters. The reason is that Chinese and Western characters occupy different numbers of bytes, and the start and length parameters of substr() are counted in bytes. In GB2312 encoding, one Chinese character occupies 2 bytes and an English character occupies 1 byte. In UTF-8 encoding, one Chinese character usually occupies 3 bytes (rarer characters outside the Basic Multilingual Plane occupy 4), while English letters and half-width punctuation occupy 1 byte.
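To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the sample string is only an illustration.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$str = '中文abc';

// strlen() counts bytes: 2 Chinese characters x 3 bytes + 3 ASCII bytes.
$byteLength = strlen($str);
// mb_strlen() counts characters when the encoding is given.
$charLength = mb_strlen($str, 'UTF-8');

// Cutting after 4 bytes lands inside the second Chinese character,
// so the result ends with an incomplete UTF-8 sequence.
$broken  = substr($str, 0, 4);
$isValid = mb_check_encoding($broken, 'UTF-8');

echo $byteLength, ' ', $charLength, ' ', var_export($isValid, true), PHP_EOL;
// prints: 9 5 false
```

The same string is 9 bytes but 5 characters long, which is why a byte offset chosen without regard to character boundaries can leave a broken character at the end.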

Because substr() works on bytes, it can forcibly "saw" a Chinese character in half, leaving an incomplete byte sequence that is displayed as garbage. Possible solutions:

1. Use mb_substr() from the mbstring extension, which cuts by characters and so avoids garbled output.

2. Write the truncation function yourself; this is less efficient than using the mbstring extension.

3. If the truncated string is only going to be output, the following form is sometimes used: substr($str, 0, 30).chr(0). Note that this only appends a NUL byte. A character that was cut in half stays incomplete, so the result depends on how the output is displayed.

The substr() function can cut text, but if the text contains Chinese characters you will often run into problems. In that case you can use mb_substr() or mb_strcut(). Both are used like substr(), except that they accept one more parameter at the end that sets the encoding of the string. Both require the mbstring extension, which many servers do not have enabled. On Windows you need to enable php_mbstring.dll in php.ini.
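Before relying on these functions, it may help to confirm that the extension is actually loaded. The following is a minimal sketch using the standard extension_loaded() and function_exists() checks; the variable name and the messages are illustrative only.

```php
<?php
// mb_substr() and mb_strcut() are provided by the mbstring extension.
$mbstringAvailable = extension_loaded('mbstring') && function_exists('mb_substr');

if ($mbstringAvailable) {
    echo 'mbstring is enabled', PHP_EOL;
} else {
    // Assumption: the fix is to enable the extension in php.ini and restart PHP.
    echo 'mbstring is not enabled; enable it in php.ini', PHP_EOL;
}
```

Running php -m on the command line lists the loaded extensions and gives the same information.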

For example:

<?php
// mb_substr counts in characters: take the first 7 characters.
echo mb_substr('这样一来我的字符串就不会有乱码^_^', 0, 7, 'utf-8');
?>

Output: 这样一来我的字

<?php
// mb_strcut counts in bytes: take at most 7 bytes (two 3-byte characters fit).
echo mb_strcut('这样一来我的字符串就不会有乱码^_^', 0, 7, 'utf-8');
?>

Output: 这样

As the example above shows, mb_substr cuts by characters, while mb_strcut cuts by bytes but never produces half a character: it stops at the last complete character that fits within the byte limit.
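The difference can also be checked by measuring both results. This is a small sketch, assuming a UTF-8 source file and the mbstring extension; the variable names are illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$str = '这样一来我的字符串就不会有乱码^_^';

$byChars = mb_substr($str, 0, 7, 'UTF-8'); // length is 7 characters
$byBytes = mb_strcut($str, 0, 7, 'UTF-8'); // length is at most 7 bytes

printf("mb_substr: %d characters, %d bytes\n", mb_strlen($byChars, 'UTF-8'), strlen($byChars));
printf("mb_strcut: %d characters, %d bytes\n", mb_strlen($byBytes, 'UTF-8'), strlen($byBytes));
// prints:
// mb_substr: 7 characters, 21 bytes
// mb_strcut: 2 characters, 6 bytes
```

mb_strcut returns 6 bytes rather than 7 because the seventh byte would fall inside the third character. This makes it the better fit when the limit is a storage size in bytes, such as a database column.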

PHP function to cut a Chinese string without garbled characters (for double-byte encodings such as GB2312)

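// $start and $length are byte offsets; $start must fall on a character boundary.
function GBsubstr($string, $start, $length) {
    // Nothing to cut: the string is no longer than the requested byte length.
    if (strlen($string) <= $length) {
        return $string;
    }
    $str = '';
    // Never read past the end of the string.
    $len = min($start + $length, strlen($string));
    for ($i = $start; $i < $len; $i++) {
        if (ord(substr($string, $i, 1)) > 0xa0) {
            // Lead byte of a 2-byte character: copy both bytes together.
            $str .= substr($string, $i, 2);
            $i++;
        } else {
            // Single-byte (ASCII) character.
            $str .= substr($string, $i, 1);
        }
    }
    return $str . '...';
}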

Function to cut a Chinese string without garbled characters, applicable to UTF-8

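// $start has no default: an optional parameter before the required $length
// is deprecated since PHP 8.0.
function substr_text($str, $start, $length, $charset = "utf-8", $suffix = "")
{
    // Prefer the extension functions, which count in characters.
    if (function_exists('mb_substr')) {
        return mb_substr($str, $start, $length, $charset) . $suffix;
    }
    if (function_exists('iconv_substr')) {
        return iconv_substr($str, $start, $length, $charset) . $suffix;
    }
    // Fallback: split the string into characters with a per-encoding pattern.
    $re['utf-8']  = '/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/';
    $re['gb2312'] = '/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/';
    $re['gbk']    = '/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/';
    // The original pattern was missing the opening bracket of [\xa1-\xfe].
    $re['big5']   = '/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|[\xa1-\xfe])/';
    // Pattern keys are lower case, so normalise the charset name.
    preg_match_all($re[strtolower($charset)], $str, $match);
    $slice = implode('', array_slice($match[0], $start, $length));
    return $slice . $suffix;
}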
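The function above cuts by characters. When the limit is a number of bytes (the behaviour of mb_strcut) and no extension is available, one possible approach is to step back from the cut point until it no longer falls inside a character. The following is a sketch only: the helper name utf8_cut_bytes is hypothetical, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: keep at most $maxBytes bytes of a valid UTF-8 string
// without ending in the middle of a multi-byte character.
function utf8_cut_bytes($str, $maxBytes)
{
    if ($maxBytes <= 0) {
        return '';
    }
    if (strlen($str) <= $maxBytes) {
        return $str;
    }
    $end = $maxBytes;
    // Continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF).
    // If the first dropped byte is one of them, the cut is inside a
    // character, so move back to that character's lead byte.
    while ($end > 0 && (ord($str[$end]) & 0xC0) === 0x80) {
        $end--;
    }
    return substr($str, 0, $end);
}

echo utf8_cut_bytes('这样一来', 7), PHP_EOL; // prints: 这样
```

This relies on a property of UTF-8: lead bytes and continuation bytes are distinguishable by their top two bits, so a character boundary can be found without decoding the whole string.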
