UTF-8 Chinese character regular expression

WBOY

Release: 2016-08-08 09:19:13

Original link: http://blog.csdn.net/wide288/article/details/30066639

<?php
$str = "编程"; // the Chinese word for "programming"; save this file as UTF-8
// UTF-8: Chinese characters, letters, digits and underscore
// if (!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u", $str))
// UTF-8: Chinese characters only
if (!preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str))
{
    echo "The [".$str."] you entered contains illegal characters!";
}
else
{
    echo "The [".$str."] you entered is completely legal and passed!";
}

-----------------------

UTF-8 matching:

In javascript, it is very simple to determine whether a string is Chinese. For example: var str = "php programming"; if (/^[u4e00-u9fa5]+$/.test(str)) { alert("This string is all in Chinese"); } else{ alert("This string Not all are in Chinese"); }

In php, x is used to represent hexadecimal data. Therefore, it is transformed into the following code: $str = "php programming"; if (preg_match("/^[x4e00-x9fa5]+$/",$str)) { print("This string is all in Chinese"); } else { print("Not all of the string is in Chinese"); } It seems that the error is no longer reported, and the judgment result is correct. However, if $str is replaced with the word "programming", the result still shows "Not all of the string is in Chinese". It's Chinese." It seems that this judgment is still not accurate enough.

Important: after consulting the book "Mastering Regular Expressions", I worked out the following, more detailed explanation of [\x4e00-\x9fa5].

In PHP regular expressions, [\x4e00-\x9fa5] is really a mix of single characters and a character range inside one character class. The \x escape denotes a hexadecimal value. Without curly braces it reads at most two hex digits, so \x4e00 means \x4e followed by the literal characters "00". A value with more than two hex digits, such as 4e00, must be written in curly braces: \x{4e00}.
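
To make that parsing concrete, here is a minimal sketch. The sample strings are assumptions chosen for illustration, and the file is assumed to be saved as UTF-8. Without curly braces, the class [\x4e00-\x9fa5] ends up containing "N" (\x4e), "0", the byte range from "0" to \x9f, "a" and "5". It therefore accepts plain ASCII such as "php" and rejects UTF-8 Chinese text, whose lead bytes lie above \x9f.

```php
<?php
// Without braces, \x reads at most two hex digits, so this class is:
//   \x4e ("N"), "0", the range "0"-\x9f, "a", "5"
$pattern = '/^[\x4e00-\x9fa5]+$/';

// ASCII letters and digits fall inside the byte range 0x30-0x9f.
$asciiResult = preg_match($pattern, 'php');

// "编程" in UTF-8 is E7 BC 96 E7 A8 8B; the byte E7 is outside the class.
$chineseResult = preg_match($pattern, '编程');

var_dump($asciiResult);   // int(1)
var_dump($chineseResult); // int(0)
```

This is why the unbraced pattern reports "not all Chinese" for every Chinese string, while raising no error.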

In addition, any value greater than \x{FF} must be used together with the u modifier; otherwise the pattern fails to compile and PHP reports an error.
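
The effect of the u modifier can be checked with a short sketch like the one below. It assumes the file is saved as UTF-8, and it uses the @ operator only to keep the expected warning out of the output.

```php
<?php
$subject = '编程'; // assumes this file is saved as UTF-8

// Without /u, PCRE works byte by byte and rejects values above 0xFF,
// so the pattern does not compile: preg_match() emits a warning
// (suppressed here with @) and returns false.
$withoutU = @preg_match('/^[\x{4e00}-\x{9fa5}]+$/', $subject);

// With /u, pattern and subject are treated as UTF-8 and the range is valid.
$withU = preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $subject);

var_dump($withoutU); // bool(false)
var_dump($withU);    // int(1)
```

Note that false (an error) is different from 0 (no match), so a strict comparison such as === 1 is the safer way to test the result.
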
You can only find regular rules for matching full-width characters on the Internet: ^[x80-xff]*^/ , you can not add curly brackets here [u4e00- u9fa5] can match Chinese, but PHP does not support it. However, since the hexadecimal data represented by x, why is it different from the range x4e00-x9fa5 provided in js? So I changed to the code below and found that it was really accurate: $str = "php programming"; if (preg_match("/^[x{4e00}-x{9fa5}]+$/u",$str )) { print("This string is all Chinese"); } else { print("This string is not all Chinese"); }

So the correct expression for matching Chinese characters in UTF-8 encoded text in PHP is /^[\x{4e00}-\x{9fa5}]+$/u. The test code at the top of this article was written on that basis (copy it and save it as a .php file).

GBK:
preg_match("/^[".chr(0xa1)."-".chr(0xff)."A-Za-z0-9_]+$/", $str); // GB2312: Chinese characters, letters, digits and underscore
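
A minimal sketch of the byte-range approach follows. The byte sequence B1 E0 B3 CC is assumed to be the GB2312 encoding of "编程". It is written with escapes so that the encoding of the script file itself does not matter.

```php
<?php
// Byte-wise class: 0xA1-0xFF plus ASCII letters, digits and underscore.
// No /u modifier, because GB2312/GBK text is not UTF-8.
$pattern = "/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/";

// "编程" as GB2312 bytes (assumed: B1 E0 B3 CC).
$gb2312 = "\xb1\xe0\xb3\xcc";

// The same word as UTF-8 bytes: E7 BC 96 E7 A8 8B.
$utf8 = "\xe7\xbc\x96\xe7\xa8\x8b";

$gbResult = preg_match($pattern, $gb2312 . "_php5");
$utf8Result = preg_match($pattern, $utf8);

var_dump($gbResult);   // int(1)
var_dump($utf8Result); // int(0): bytes 0x96 and 0x8B are below 0xA1
```

The class only inspects individual bytes, so it cannot tell whether they form complete two-byte characters. GBK also extends GB2312 with lead bytes from 0x81 and trail bytes from 0x40, so characters outside GB2312 are not covered by this range.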
