The difference between strlen, mb_strlen, substr(), mb_substr() and mb_strcut in PHP
This article explains the differences between strlen, mb_strlen, substr(), mb_substr() and mb_strcut, and shows how each one is used.
Using the mb_* string functions:
Configuration on Windows
The php_mbstring.dll extension must be installed.
It must also be enabled in php.ini (the extension=php_mbstring.dll line).
On Linux, mbstring is typically enabled when PHP is built (--enable-mbstring) or installed from the distribution's PHP packages.
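Before calling any mb_* function, it can help to confirm that the extension is actually loaded. The following is a minimal sketch using PHP's built-in extension_loaded(); the variable name and messages are only illustrative.

```php
<?php
// Check at runtime whether the mbstring extension is available.
$hasMbstring = extension_loaded('mbstring');

if ($hasMbstring) {
    echo "mbstring is loaded\n";
} else {
    // Without the extension, functions such as mb_strlen() are undefined.
    echo "mbstring is NOT loaded; enable it in php.ini\n";
}
```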
The code is as follows (the file must be saved as UTF-8):
$str = '中文a字1符';
echo strlen($str) . ' ';               // 14
echo mb_strlen($str, 'utf8') . ' ';    // 6
echo mb_strlen($str, 'gb2312') . ' ';  // not a meaningful count: the bytes are UTF-8, not GB2312
Result analysis: strlen counts bytes, and a Chinese character in UTF-8 occupies 3 bytes, so the length of "中文a字1符" (four Chinese characters plus two ASCII characters) is 3*4+2=14. When mb_strlen is called with UTF-8 as the encoding, each Chinese character counts as 1, so the length of "中文a字1符" is 6.
If the encoding argument is omitted, mb_strlen uses the internal encoding, which can be obtained (or set) with mb_internal_encoding().
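As a small illustration of the default encoding, the sketch below sets the internal encoding explicitly and compares mb_strlen with and without the encoding argument. It assumes the source file is saved as UTF-8; the variable names are illustrative.

```php
<?php
// Assumption: this source file is saved as UTF-8.
$str = '中文a字1符';

// Set the internal encoding explicitly rather than relying on php.ini.
mb_internal_encoding('UTF-8');

$current  = mb_internal_encoding();    // reads the current setting
$implicit = mb_strlen($str);           // falls back to the internal encoding
$explicit = mb_strlen($str, 'UTF-8');  // encoding passed explicitly

echo $current, ' ', $implicit, ' ', $explicit, "\n"; // UTF-8 6 6
```

Passing the encoding explicitly is usually the safer habit, since the result then does not depend on server configuration.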
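For a UTF-8 string made up of 3-byte Chinese characters and ASCII characters, the two functions can be combined so that each Chinese character counts as 2 and each ASCII character as 1:
echo (strlen($str) + mb_strlen($str, 'UTF8')) / 2;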
PHP's built-in string length function strlen cannot handle Chinese strings correctly, because it returns only the number of bytes the string occupies. For GB2312-encoded Chinese, the value returned by strlen is twice the number of Chinese characters; for UTF-8 encoded Chinese it is three times the number (in UTF-8, one Chinese character occupies 3 bytes).
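The byte counts described above can be checked directly. This sketch assumes the source file is saved as UTF-8 and uses mb_convert_encoding() to produce a GB2312 copy of the same two characters; the variable names are illustrative.

```php
<?php
// Assumption: this source file is saved as UTF-8.
$utf8 = '中文';

// Convert the same two characters to GB2312 to compare byte counts.
$gb2312 = mb_convert_encoding($utf8, 'GB2312', 'UTF-8');

$utf8Bytes   = strlen($utf8);              // 3 bytes per character
$gb2312Bytes = strlen($gb2312);            // 2 bytes per character
$chars       = mb_strlen($utf8, 'UTF-8');  // actual character count

echo "$utf8Bytes $gb2312Bytes $chars\n"; // 6 4 2
```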
String splitting
The substr() function can split text, but it works on bytes, so text containing Chinese characters is often cut in the middle of a character and comes out garbled. In this case, use mb_substr() or mb_strcut() instead.
The code is as follows:
<?php
echo mb_substr('这样一来我的字符串就不会有乱码^_^', 0, 7, 'utf-8');
?>
Output: 这样一来我的字
The code is as follows:
<?php
echo mb_strcut('这样一来我的字符串就不会有乱码^_^', 0, 7, 'utf-8');
?>
Output: 这样
As the example shows, mb_substr splits by characters, while mb_strcut splits by bytes, but neither of them produces half a character.
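To make the contrast with plain substr() concrete, the sketch below cuts the same string three ways and uses mb_check_encoding() to test whether each result is still valid UTF-8. It assumes the source file is saved as UTF-8; the variable names are illustrative.

```php
<?php
// Assumption: this source file is saved as UTF-8.
$text = '这样一来我的字符串就不会有乱码^_^';

$bySubstr = substr($text, 0, 7);              // 7 bytes: cuts into the 3rd character
$byChars  = mb_substr($text, 0, 7, 'UTF-8');  // 7 characters
$byBytes  = mb_strcut($text, 0, 7, 'UTF-8');  // at most 7 bytes, whole characters only

// mb_check_encoding() reports whether a string is valid in the given encoding.
var_dump(mb_check_encoding($bySubstr, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($byChars, 'UTF-8'));  // bool(true)
var_dump(mb_check_encoding($byBytes, 'UTF-8'));  // bool(true)

echo strlen($byChars), ' ', strlen($byBytes), "\n"; // 21 6
```

The last line shows the practical difference: mb_substr returned 7 characters (21 bytes), while mb_strcut stayed within the 7-byte budget by returning 2 characters (6 bytes).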
About the mbstring functions:
PHP's mbstring extension provides multi-byte character handling. Its most common use is splitting multi-byte Chinese text, which avoids producing half characters. Because it is a PHP extension, it also performs better than custom multi-byte splitting functions written in PHP.
The mbstring extension provides two functions with similar behavior, mb_substr and mb_strcut. Their descriptions in the manual are given here.
mb_substr
mb_substr() returns the portion of str specified by the start and length parameters.
mb_substr() performs a multi-byte safe substr() operation based on the number of characters. Position is counted from the beginning of str. The first character's position is 0, the second character's position is 1, and so on.
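The character-based positions can be seen in a short sketch. It assumes the source file is saved as UTF-8; the sample string and variable names are illustrative.

```php
<?php
// Assumption: this source file is saved as UTF-8.
$str = '中文a字1符';

$first  = mb_substr($str, 0, 1, 'UTF-8');  // character at position 0
$second = mb_substr($str, 1, 1, 'UTF-8');  // character at position 1
$middle = mb_substr($str, 2, 3, 'UTF-8');  // 3 characters starting at position 2

echo $first, ' ', $second, ' ', $middle, "\n"; // 中 文 a字1
```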
mb_strcut
mb_strcut() returns the portion of str specified by the start and length parameters.
mb_strcut() performs an operation equivalent to mb_substr(), but start and length are counted in bytes rather than characters. If the start position falls on the second or a later byte of a multi-byte character, extraction starts from the first byte of that character.
It extracts a substring of str that is no longer than length bytes and that does not end in the middle of a multi-byte character or a shift sequence.
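The two rules above (a start offset inside a character moves back to its first byte, and the result never exceeds the byte length) can be sketched as follows. This assumes the source file is saved as UTF-8, where "中" and "文" occupy 3 bytes each; the variable names are illustrative.

```php
<?php
// Assumption: this source file is saved as UTF-8 ("中" and "文" are 3 bytes each).
$str = '中文a';

// Byte offset 1 is the second byte of "中", so the start moves back to byte 0.
$fromMiddle = mb_strcut($str, 1, 3, 'UTF-8');

// 5 bytes cannot hold two whole 3-byte characters, so only one is returned.
$fiveBytes = mb_strcut($str, 0, 5, 'UTF-8');

// 6 bytes is exactly two characters.
$sixBytes = mb_strcut($str, 0, 6, 'UTF-8');

echo $fromMiddle, ' ', $fiveBytes, ' ', $sixBytes, "\n"; // 中 中 中文
```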
As another example, here is a piece of text split with mb_substr and mb_strcut respectively.
The code is as follows:
<?php
$str = '我是一串比较长的中文-www.webjx.com';
echo "mb_substr:" . mb_substr($str, 0, 6, 'utf-8');
echo "<br>";
echo "mb_strcut:" . mb_strcut($str, 0, 6, 'utf-8');
?>
The output is as follows:
mb_substr:我是一串比较
mb_strcut:我是