This article describes how to detect spam comments in PHP by measuring the ratio of Chinese characters in the text. It is shared here for your reference. The approach is as follows:
1. Requirement:
A particular kind of spam comment has appeared frequently of late: a long block of English text with one or two rare Chinese characters mixed in. Because it does contain Chinese characters and includes no Chinese sensitive words, it passes straight through the comment filter. Such comments can be identified by checking the ratio of Chinese characters, although some misjudgments are to be expected.
2. Solution:
The solution uses two PHP functions, strlen and mb_strlen. strlen counts bytes, so for UTF-8 text it reports a length of 3 for a single Chinese character, while mb_strlen counts characters and reports a length of 1. For the same string, the difference between the two lengths is therefore twice the number of Chinese characters, provided the remaining characters are single-byte ASCII. Divide the difference by two to get the number of Chinese characters, then divide that by the length returned by mb_strlen to get the ratio of Chinese characters to the total number of characters.
3. Implementation code:
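What follows is a minimal sketch of the approach described above, not necessarily the author's exact code. It assumes UTF-8 input and the mbstring extension. The function names chinese_ratio and is_probable_spam, and the default threshold of 0.3, are hypothetical choices made for illustration and should be tuned against real comments.

```php
<?php
/**
 * Hypothetical helper: estimate the fraction of characters in $text that are
 * Chinese, assuming UTF-8 input where each Chinese character takes 3 bytes.
 */
function chinese_ratio(string $text): float
{
    $chars = mb_strlen($text, 'UTF-8'); // length in characters
    if ($chars === 0) {
        return 0.0;                     // avoid division by zero on empty input
    }
    // Each 3-byte character adds 2 extra bytes over its character count.
    $chinese = (strlen($text) - $chars) / 2;
    return $chinese / $chars;
}

/**
 * Hypothetical check: flag a comment whose Chinese ratio is below $minRatio.
 * The 0.3 default is an assumed value, not one given in the article.
 */
function is_probable_spam(string $comment, float $minRatio = 0.3): bool
{
    return chinese_ratio($comment) < $minRatio;
}

// Mostly English with one rare Chinese character: flagged.
var_dump(is_probable_spam('Buy cheap watches online now, best price guaranteed 龘')); // bool(true)

// Mostly Chinese: not flagged.
var_dump(is_probable_spam('这篇文章写得很好,谢谢分享')); // bool(false)
```

With this sketch an empty comment has a ratio of 0 and is flagged as well, and a legitimate comment written mostly in English would also be flagged, so the threshold is a trade-off rather than a guarantee.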
I hope this article is helpful for your PHP programming.