How can PHP distinguish between Simplified Chinese, Traditional Chinese, Japanese and Korean?
Using the methods found online, Chinese, Japanese, and Korean can apparently be told apart by Unicode range, but how do you distinguish Simplified from Traditional Chinese?
$s = <<<'EOF'
"memolov 爱书 愛書 あいしょ 사랑 때문에 책이 되다",
EOF;
echo $s.PHP_EOL;
if(preg_match_all('/([\x{4e00}-\x{9fa5}]+)/u',$s,$m)){ // Chinese, both Simplified and Traditional
echo "<pre>";
print_r($m[1]);
echo "</pre>";
}
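if(preg_match_all('/([\x{3040}-\x{30ff}]+)/u',$s,$m)){ // Japanese kana (Hiragana and Katakana); the often-quoted 0800-4e00 range also covers unrelated scripts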
echo "<pre>";
print_r($m[1]);
echo "</pre>";
}
if(preg_match_all('/([\x{AC00}-\x{D7A3}]+)/u',$s,$m)){ // Korean (Hangul syllables)
echo "<pre>";
print_r($m[1]);
echo "</pre>";
}
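As an alternative to hand-picked code point ranges, PCRE's Unicode script properties can do the same split. This is only a sketch: `splitByScript` is a name made up for illustration, and like the ranges above it cannot tell Simplified from Traditional, since both are plain Han characters.

```php
<?php
// Hypothetical helper: group the runs of each script found in $text.
// \p{Han}, \p{Hiragana}, \p{Katakana} and \p{Hangul} are Unicode script
// properties understood by PCRE when the /u modifier is set.
function splitByScript(string $text): array
{
    $patterns = [
        'han'    => '/\p{Han}+/u',                      // Chinese characters (also Japanese kanji)
        'kana'   => '/[\p{Hiragana}\p{Katakana}]+/u',   // Japanese syllabaries
        'hangul' => '/\p{Hangul}+/u',                   // Korean
    ];

    $runs = [];
    foreach ($patterns as $name => $pattern) {
        preg_match_all($pattern, $text, $m);
        $runs[$name] = $m[0]; // full matches for this script
    }
    return $runs;
}

$runs = splitByScript('memolov 爱书 愛書 あいしょ 사랑 때문에 책이 되다');
print_r($runs);
```

Note that Japanese kanji land in the `han` bucket, so text is only clearly Japanese when kana appears alongside it.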
But then here is the problem:
小
This character has no separate Traditional form. So is it considered Simplified or Traditional? It is both, which is why the two are not easy to tell apart. Could you build a lookup table that maps Simplified characters to their Traditional counterparts?
I have a simple idea:
First convert the Chinese text to Simplified Chinese. If the string is the same before and after the conversion, it is Simplified Chinese; otherwise, count it as Traditional Chinese.
https://github.com/BYVoid/OpenCC
The OpenCC library can do the conversion and is very easy to use. Other converters would work as well.
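The convert-and-compare idea could be sketched roughly as follows. To keep the example runnable without any extension, the converters here are tiny stand-in tables (`T2S_SAMPLE`, `toSimplifiedSample`, `toTraditionalSample`, and `classifyChinese` are all hypothetical names); in practice the two callables would wrap a real converter such as OpenCC. As an extension of the idea, the sketch also converts in the other direction, so that text like 小, which is identical in both forms, is reported as ambiguous rather than as Simplified.

```php
<?php
// Stand-in table for illustration only. A real converter covers
// thousands of characters plus phrase-level rules.
const T2S_SAMPLE = ['愛' => '爱', '書' => '书'];

// Hypothetical converter: Traditional to Simplified via the sample table.
function toSimplifiedSample(string $text): string
{
    return strtr($text, T2S_SAMPLE);
}

// Hypothetical converter: Simplified to Traditional (the table reversed).
function toTraditionalSample(string $text): string
{
    return strtr($text, array_flip(T2S_SAMPLE));
}

// Classify text by checking which conversion direction changes it.
function classifyChinese(string $text, callable $toSimplified, callable $toTraditional): string
{
    $hasTraditional = $toSimplified($text) !== $text;  // something was simplified
    $hasSimplified  = $toTraditional($text) !== $text; // something was made traditional

    if ($hasTraditional && $hasSimplified) {
        return 'mixed';
    }
    if ($hasTraditional) {
        return 'traditional';
    }
    if ($hasSimplified) {
        return 'simplified';
    }
    return 'ambiguous'; // same in both forms, e.g. 小
}

echo classifyChinese('愛書', 'toSimplifiedSample', 'toTraditionalSample'), PHP_EOL; // traditional
echo classifyChinese('爱书', 'toSimplifiedSample', 'toTraditionalSample'), PHP_EOL; // simplified
echo classifyChinese('小', 'toSimplifiedSample', 'toTraditionalSample'), PHP_EOL;   // ambiguous
```

With OpenCC, the two callables would presumably correspond to its `t2s` and `s2t` configurations. The result is only as good as the converter's tables, and short strings will often come out as ambiguous.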