When truncating Chinese strings in PHP, the usual approach is to treat any byte whose value is greater than or equal to 128 as half of a double-byte character. This keeps a character from being cut in half, which would leave a stray byte and produce garbled output.
However, when Chinese and English text are mixed and the string also contains special symbols, the problem is harder to solve.
The function below handles Chinese string truncation in these mixed cases. Anyone who needs it can use it as a reference.
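As a minimal sketch of the byte rule described above, the two helpers below (gb_align_start and gb_byte_cut are hypothetical names, not part of the original code) assume GB2312-style input in which every byte >= 128 belongs to a two-byte character. The first moves a start offset back one byte if it lands on the second half of a character. The second returns at most a given number of bytes without splitting a character. In GB2312 the bytes \xD6\xD0\xCE\xC4 encode 中文.

```php
<?php
// Assumption: GB2312-style text, where every byte >= 0x80 (128) is half
// of a two-byte character and every byte < 0x80 is a single ASCII byte.

// Move $start back one byte if an odd number of high bytes precede it,
// i.e. if $start points at the second byte of a double-byte character.
function gb_align_start(string $s, int $start): int
{
    $high = 0;
    for ($i = 0; $i < $start && $i < strlen($s); $i++) {
        if (ord($s[$i]) >= 128) {
            $high++;
        }
    }
    return ($high % 2 !== 0) ? $start - 1 : $start;
}

// Return at most $maxBytes bytes of $s beginning at $start,
// stepping two bytes at a time over double-byte characters
// so that no character is ever split.
function gb_byte_cut(string $s, int $start, int $maxBytes): string
{
    $start = gb_align_start($s, $start);
    $n = strlen($s);
    $i = $start;
    while ($i < $n) {
        $step = (ord($s[$i]) >= 128) ? 2 : 1;
        if ($i - $start + $step > $maxBytes) {
            break; // taking this character would exceed the byte limit
        }
        $i += $step;
    }
    return substr($s, $start, $i - $start);
}
```

For example, gb_byte_cut("a\xD6\xD0\xCE\xC4b", 0, 4) returns "a\xD6\xD0" (a中), whereas substr() with the same length would keep a dangling lead byte of 文.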
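Notes:
1. The len parameter is measured in Chinese characters: one unit of len equals two English characters. This keeps the visible width of mixed Chinese and English output more consistent.
2. If the magic parameter is set to false, Chinese and English characters are treated the same, and exactly len characters are taken regardless of type.
3. It is especially suitable for strings that have already been encoded with htmlspecialchars(): the entities &lt;, &gt;, &amp;, &quot; and &#039; each count as a single English character.
4. It correctly handles numeric character entities (&#NNNNN;) in GB2312 text, counting each one as a single Chinese character.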
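The width rules in these notes can be illustrated with a small tokenizer. gb_tokens is a hypothetical helper written for this explanation, not taken from the original. It assumes GB2312 input that has already passed through htmlspecialchars() with single quotes encoded as &#039;, and it follows the same branch order as FSubstr: named entities first, then numeric entities, then the >= 128 byte test.

```php
<?php
// Hypothetical helper: split an htmlspecialchars()-encoded, GB2312-style
// string into [bytes, units] tokens.
//   $magic = true : named entity or ASCII byte = 1 unit,
//                   numeric entity (&#NNN;) or double-byte char = 2 units
//   $magic = false: every token = 1 unit (plain character count)
function gb_tokens(string $s, bool $magic = true): array
{
    // Checked before the numeric pattern, so &#039; counts as narrow.
    $named = ['&lt;', '&gt;', '&amp;', '&quot;', '&#039;'];
    $tokens = [];
    $n = strlen($s);
    $i = 0;
    while ($i < $n) {
        $bytes = 1;     // default: a single narrow byte (also a bare "&")
        $wide = false;
        if ($s[$i] === '&') {
            foreach ($named as $e) {
                if (substr($s, $i, strlen($e)) === $e) {
                    $bytes = strlen($e);
                    break;
                }
            }
            // Numeric entity anchored at the current "&", up to 8 bytes.
            if ($bytes === 1 && preg_match('/^&#\d+;/', substr($s, $i, 8), $m)) {
                $bytes = strlen($m[0]);
                $wide = true;
            }
        } elseif (ord($s[$i]) >= 128) {
            $bytes = 2;  // GB2312 double-byte character
            $wide = true;
        }
        $units = $magic ? ($wide ? 2 : 1) : 1;
        $tokens[] = [substr($s, $i, $bytes), $units];
        $i += $bytes;
    }
    return $tokens;
}
```

Here gb_tokens("a&lt;\xD6\xD0&#20013;") yields four tokens with a total width of 6 units (three Chinese-character widths). With $magic set to false, the total is 4.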
Example:
<?php
/**
 * Truncate a Chinese string (suitable for GB2312 encoding)
 * @link http://www.jbxue.com
 */
function FSubstr($title,$start,$len="",$magic=true)
{
$length = 0;
if($len == "") $len = strlen($title);
// If $start falls on the second byte of a double-byte character, move it back one byte
if($start > 0)
{
$cnum = 0;
for($i=0;$i<$start;$i++)
{
if(ord(substr($title,$i,1)) >= 128) $cnum ++;
}
if($cnum%2 != 0) $start--;
unset($cnum);
}
if(strlen($title)<=$len) return substr($title,$start,$len);
$alen = 0;
$blen = 0;
$realnum = 0;
for($i=$start;$i<strlen($title);$i++)
{
$ctype = 0;
$cstep = 0;
$cur = substr($title,$i,1);
if( $cur == "&")
{
if(substr($title,$i,4) == "&lt;")
{
$cstep = 4;
$length += 4;
$i += 3;
$realnum ++;
if($magic)
{
$alen ++;
}
}
else if(substr($title,$i,4) == "&gt;")
{
$cstep = 4;
$length += 4;
$i += 3;
$realnum ++;
if($magic)
{
$alen ++;
}
}
else if(substr($title,$i,5) == "&amp;")
{
$cstep = 5;
$length += 5;
$i += 4;
$realnum ++;
if($magic)
{
$alen ++;
}
}
else if(substr($title,$i,6) == "&quot;")
{
$cstep = 6;
$length += 6;
$i += 5;
$realnum ++;
if($magic)
{
$alen ++;
}
}
else if(substr($title,$i,6) == "&#039;")
{
$cstep = 6;
$length += 6;
$i += 5;
$realnum ++;
if($magic)
{
$alen ++;
}
}
else if(preg_match('/^&#(\d+);/',substr($title,$i,8),$match))
{
$cstep = strlen($match[0]);
$length += strlen($match[0]);
$i += strlen($match[0])-1;
$realnum ++;
if($magic)
{
$blen ++;
$ctype = 1;
}
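}
else
{
// A bare "&" that starts no known entity: count it as one narrow character
$cstep = 1;
$length += 1;
$realnum ++;
if($magic)
{
$alen ++;
}
}
}else{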
if(ord($cur)>=128)
{
$cstep = 2;
$length += 2;
$i += 1;
$realnum ++;
if($magic)
{
$blen ++;
$ctype = 1;
}
}else{
$cstep = 1;
$length +=1;
$realnum ++;
if($magic)
{
$alen ++;
}
}
}
if($magic)
{
if(($blen*2+$alen) == ($len*2)) break;
if(($blen*2+$alen) == ($len*2+1))
{
if($ctype == 1)
{
$length -= $cstep;
break;
}else{
break;
}
}
}else {
if($realnum == $len) break;
}
}
unset($cur);
unset($alen);
unset($blen);
unset($realnum);
unset($ctype);
unset($cstep);
return substr($title,$start,$length);
}
?>