How does PHP distinguish between Simplified Chinese, Traditional Chinese, Japanese, and Korean?
The methods I found online seem to tell Chinese, Japanese, and Korean apart, but how do I distinguish Simplified Chinese from Traditional Chinese?
$s = <<<'EOF'
"memolov 爱书 愛書 あいしょ 사랑 때문에 책이 되다",
EOF;
echo $s.PHP_EOL;
if(preg_match_all('/([\x{4e00}-\x{9fa5}]+)/u',$s,$m)){ // Han characters: Simplified and Traditional Chinese (also matches Japanese kanji)
echo "<pre>";
print_r($m[1]);
echo "</pre>";
}
if(preg_match_all('/([\x{3040}-\x{30ff}]+)/u',$s,$m)){ // Japanese kana (hiragana and katakana)
echo "<pre>";
print_r($m[1]);
echo "</pre>";
}
if(preg_match_all('/([\x{AC00}-\x{D7A3}]+)/u',$s,$m)){ // Korean (Hangul syllables)
echo "<pre>";
print_r($m[1]);
echo "</pre>";
}
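For comparison, here is a rough sketch of the same split using PCRE's Unicode script properties instead of hard-coded code point ranges. The function name splitByScript is made up for this example. Note that \p{Han} also matches Japanese kanji, so this only separates scripts, not languages, and it does nothing for the simplified/traditional question.

```php
<?php
// Return the runs of each script found in $text, keyed by script name.
// splitByScript is a hypothetical helper, not part of PHP.
function splitByScript(string $text): array
{
    $patterns = [
        'han'    => '/\p{Han}+/u',                    // Chinese characters, incl. Japanese kanji
        'kana'   => '/[\p{Hiragana}\p{Katakana}]+/u', // Japanese syllabaries
        'hangul' => '/\p{Hangul}+/u',                 // Korean
    ];
    $runs = [];
    foreach ($patterns as $name => $pattern) {
        preg_match_all($pattern, $text, $m);
        $runs[$name] = $m[0]; // full matches only
    }
    return $runs;
}

$runs = splitByScript('memolov 爱书 愛書 あいしょ 사랑 때문에 책이 되다');
print_r($runs);
```

With the sample string, the 'han' entry should hold 爱书 and 愛書, 'kana' should hold あいしょ, and 'hangul' should hold the four Korean words.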
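So here is the problem:
小
This character has no separate traditional form. So does it count as simplified or traditional? Simplified and traditional are not really easy to tell apart. Maybe build a simplified-to-traditional mapping table?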
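I have a simple idea:
First convert all of the Chinese text to Simplified. If the string is the same before and after the conversion, it is Simplified; otherwise treat it as Traditional.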
https://github.com/BYVoid/OpenCC
The OpenCC library does the conversion and works well. You can also use another converter.
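The convert-and-compare idea could look roughly like this. To keep the sketch runnable without installing anything, the conversion here is a tiny hand-written table (T2S_SAMPLE_MAP, a made-up name) that covers only the sample words; in real use you would replace toSimplified with a call into OpenCC or another converter. classifyChinese is also just a name picked for the example.

```php
<?php
// Hypothetical stand-in for a real converter such as OpenCC:
// a Traditional => Simplified table covering only the sample text.
const T2S_SAMPLE_MAP = ['愛' => '爱', '書' => '书'];

// Convert Traditional characters to Simplified using the sample table.
function toSimplified(string $text): string
{
    return strtr($text, T2S_SAMPLE_MAP);
}

// If conversion to Simplified leaves the string unchanged, call it
// simplified; otherwise call it traditional.
function classifyChinese(string $text): string
{
    return toSimplified($text) === $text ? 'simplified' : 'traditional';
}

foreach (['爱书', '愛書', '小'] as $word) {
    echo $word, ' => ', classifyChinese($word), PHP_EOL;
}
```

This prints simplified for 爱书, traditional for 愛書, and simplified for 小. Characters like 小 that are written the same in both systems come out as simplified under this rule, so a string made only of such characters cannot really be told apart this way.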