PHP method for verification code recognition: first binarize the image and save the values into a two-dimensional array; then loop to find the position of each digit; then read each digit's region from the two-dimensional array and concatenate its values into a string; finally, compare that string with the string for each glyph to identify the digit.
However, the introduction in the original text is relatively brief and does not describe how the algorithm is actually implemented. The detailed process is reproduced from:
http://www.poboke.com/study/php-verification-code-identification-primary.html
This article therefore uses a practical example to demonstrate how PHP identifies a verification code and submits it to the server for verification.
Part One: Identification of Verification Codes
I recently studied how verification codes can be broken and wrote down what I learned. First, this is a summary of the past few days of study that helps my own understanding; second, I hope it will help technical readers who are studying the same topic; third, I hope it draws the attention of website administrators so that they take more into consideration when providing verification codes. I have only just started on this subject and my understanding is still shallow, so mistakes are inevitable. Feel free to comment.
The role of the verification code: to effectively prevent an attacker from using a program to brute-force the password of a specific registered user through continuous login attempts. In practice, modern verification codes are mostly used to stop machines from registering accounts and posting replies in batches. Many websites currently use verification codes to prevent robots from automatically registering, logging in, and spamming.
A verification code is a picture generated from a string of random digits or symbols, with some interference pixels added (to defeat OCR). The user reads the code visually, enters it into the form, and submits it to the website; a given function can be used only after verification succeeds.
The most common verification codes:
1. Four digits, a random numeric string. This is the most primitive verification code, and its verification effect is almost zero.
2. A random digit picture. The characters in the picture are quite regular; some pictures add random interference and some use random character colors, so the effect is better than the previous type. People without basic knowledge of graphics and imaging cannot break it!
3. Pictures in various image formats with random digits, random uppercase English letters, random interference pixels, and random positions.
4. Chinese characters, the newest kind of verification code used for registration. They are randomly generated and harder to type, which hurts the user experience, so they are used less often.
For simplicity, this explanation focuses on the first type. Let's first look at several common verification code pictures found on the Internet.
These four styles basically cover the kinds of verification code described in item 2. At first glance, the first picture looks the easiest to crack, the second a little harder, the third harder still, and the fourth the most difficult.
What is the real situation? In fact, these types of images are about equally difficult to crack.
The first picture is the easiest. The background is one uniform color and the digits are another, the characters are regular, and they always appear in the same positions. This article uses this type of verification code as its example; readers can try the other pictures themselves.
The second picture looks harder, but if you study it carefully you will find its pattern. No matter how the background color and interference change, the verification characters are regular and all the same color, so the interference is easy to eliminate: simply exclude every pixel that is not the character color.
The third picture looks more complicated. In addition to the changing background color and interference mentioned above, the color of the verification characters also changes, and each character has a different color.
The fourth picture has the features of the third, plus two straight interference lines drawn across the text. It looks difficult, but the lines are actually easy to remove.
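For the second style of picture, the idea of excluding every pixel that is not the character color could be sketched as follows. This is only an illustration: the function name keepCharacterColor, the [r, g, b] input layout, the sample character color, and the tolerance of 30 are all assumptions, not values from the article.

```php
<?php
// Hypothetical helper: mark a pixel as 1 only if it is close to a known
// character color; everything else (background, interference) becomes 0.
// $pixels is a 2D array of [r, g, b] triples indexed [row][column].
function keepCharacterColor(array $pixels, array $charColor, int $tolerance = 30): array
{
    $matrix = [];
    foreach ($pixels as $y => $row) {
        foreach ($row as $x => [$r, $g, $b]) {
            $isChar = abs($r - $charColor[0]) <= $tolerance
                && abs($g - $charColor[1]) <= $tolerance
                && abs($b - $charColor[2]) <= $tolerance;
            $matrix[$y][$x] = $isChar ? 1 : 0;
        }
    }
    return $matrix;
}

// One row of three pixels: a dark blue character pixel, a white background
// pixel, and a red interference pixel (all sample values are assumptions).
$row = [[0, 0, 128], [255, 255, 255], [200, 30, 30]];
echo implode('', keepCharacterColor([$row], [0, 0, 128])[0]), "\n";
```

Only the pixel that matches the assumed character color survives; the tolerance would have to be tuned by observing real pictures.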
The following uses Wanwang's "General URL Query" to illustrate the verification code identification process.
Open Wanwang (http://www.net.cn); there is a "General URL Query" box in the sidebar on the right side of the site:
As you can see, this is the first kind of verification code. So that the human eye can recognize the digits, the digit color differs considerably from the background color of the picture, and their RGB values differ just as much. Digits and background can therefore be told apart by checking the RGB value of each pixel.
Verification code identification is generally divided into the following steps:
1. Extract the glyphs
I am not doing professional OCR, and every website's verification codes are different, so the most common approach is to build a signature library for the specific verification code. To extract the glyphs, download enough pictures to cover every character. The pictures here contain only digits, so we only need pictures that together include 0-9.
1. Refresh the verification code several times and save the pictures until all digits 0-9 have been collected.
2. Open a picture in image-editing software. I use Fireworks; press Ctrl+8 to zoom the view to 8x so that every pixel can be seen clearly.
Each digit is 6px wide and 10px high, the gap between digits is 4px, the first digit is offset 2px from the left edge, and the top offset is 0px. These values are used later.
3. Cut out each digit and save it as a 6*10 picture.
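The measurements above are enough to compute where each digit sits in the picture. The following is a minimal sketch, assuming 0-based digit indexes; the constant names, the digitBox helper, and the digit count of four in the loop are hypothetical and not part of the original article.

```php
<?php
// Layout values measured in the article (in pixels).
const DIGIT_WIDTH  = 6;
const DIGIT_HEIGHT = 10;
const DIGIT_GAP    = 4;
const LEFT_OFFSET  = 2;
const TOP_OFFSET   = 0;

// Hypothetical helper: bounding box of the i-th digit (0-based).
function digitBox(int $i): array
{
    return [
        'x'      => (DIGIT_WIDTH + DIGIT_GAP) * $i + LEFT_OFFSET,
        'y'      => TOP_OFFSET,
        'width'  => DIGIT_WIDTH,
        'height' => DIGIT_HEIGHT,
    ];
}

// Print the boxes for a four-digit code (the digit count is an assumption).
for ($i = 0; $i < 4; $i++) {
    $box = digitBox($i);
    echo "digit $i: x={$box['x']}, y={$box['y']}, {$box['width']}x{$box['height']}\n";
}
```

An array with these keys has the shape that GD's imagecrop() accepts as its rectangle argument, so the same boxes could be used to cut the glyph pictures programmatically instead of by hand.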
2. Binarization of the picture
Binarization represents each pixel that belongs to a digit in the picture with 1 and every other pixel with 0. Binarize the image to be recognized and save the data in a two-dimensional array to obtain the image's feature array.
1. First, distinguish the digits from the background and interference colors; use a screen color picker to observe the color pattern.
We can draw a conclusion: the R, G, and B values of the background color are all greater than 200, while at least one of the R, G, and B values of a digit pixel is below 200. The two are therefore easy to distinguish.
2. The following PHP code only demonstrates the two-dimensional array. To make the digits easy to see, 1 and 0 are printed as 0 and -:
echo '<br><img src="v1.jpg"><br><br>';
getHec("v1.jpg");

// Print the image row by row: "0" for digit pixels, "-" for background.
function getHec($imagePath) {
    $res = imagecreatefromjpeg($imagePath);
    $size = getimagesize($imagePath); // [0] = width, [1] = height
    for ($i = 0; $i < $size[1]; $i++) {
        for ($j = 0; $j < $size[0]; $j++) {
            $rgb = imagecolorat($res, $j, $i);
            $rgbarray = imagecolorsforindex($res, $rgb);
            // A digit pixel has at least one channel below 200.
            if ($rgbarray['red'] < 200 || $rgbarray['green'] < 200 || $rgbarray['blue'] < 200) {
                echo "0";
            } else {
                echo "-";
            }
        }
        echo "<br>";
    }
}
The results are shown in the figure below:
If the background color of the picture is more complex, the processing method is the same. A threshold that separates digits from background can always be found; you have to observe the picture yourself to find it.
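The demonstration code above only prints the pixels. To actually keep the values in a two-dimensional array, as the binarization step describes, something like the following sketch could be used. The function names isDigitPixel and binarizeImage are hypothetical, and the threshold parameter defaults to the value of 200 observed for this particular picture.

```php
<?php
// True for a digit pixel: at least one channel is below the threshold.
function isDigitPixel(int $r, int $g, int $b, int $threshold = 200): bool
{
    return $r < $threshold || $g < $threshold || $b < $threshold;
}

// Hypothetical helper: binarize a GD image into a 2D array indexed
// [row][column], with 1 for digit pixels and 0 for everything else.
function binarizeImage($image, int $threshold = 200): array
{
    $matrix = [];
    $height = imagesy($image);
    $width  = imagesx($image);
    for ($y = 0; $y < $height; $y++) {
        for ($x = 0; $x < $width; $x++) {
            $c = imagecolorsforindex($image, imagecolorat($image, $x, $y));
            $matrix[$y][$x] = isDigitPixel($c['red'], $c['green'], $c['blue'], $threshold) ? 1 : 0;
        }
    }
    return $matrix;
}

// Usage sketch; "v1.jpg" is the sample file name used in the article.
if (extension_loaded('gd') && is_file('v1.jpg')) {
    $matrix = binarizeImage(imagecreatefromjpeg('v1.jpg'));
    foreach ($matrix as $row) {
        echo implode('', $row), "\n";
    }
}
```

Separating the per-pixel test from the image loop makes it easy to swap in a different threshold when a picture has a more complex background.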
3. Binarization of the digit glyphs
Calculate the binary data of each digit glyph and record it; these strings serve as the keys for matching.
1. Binarize the glyph images for 0-9: read the color of each pixel one by one, obtain its R, G, and B values, and then classify it. The code is as follows:
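// Print one 'digit'=>'key' line for each glyph image 0.jpg ... 9.jpg.
for ($i = 0; $i < 10; $i++) {
    echo "'$i'=>'" . getHec("$i.jpg") . "',<br>";
}

// Return the binarized pixels of an image as a string of 1s and 0s.
function getHec($imagePath) {
    $res = imagecreatefromjpeg($imagePath);
    $size = getimagesize($imagePath); // [0] = width, [1] = height
    $key = '';
    for ($i = 0; $i < $size[1]; $i++) {
        for ($j = 0; $j < $size[0]; $j++) {
            $rgb = imagecolorat($res, $j, $i);
            $rgbarray = imagecolorsforindex($res, $rgb);
            if ($rgbarray['red'] < 200 || $rgbarray['green'] < 200 || $rgbarray['blue'] < 200) {
                $key .= "1";
            } else {
                $key .= "0";
            }
        }
    }
    return $key;
}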
Output result:
'0'=>'011110100001100001100001100001100001100001100001100001011110',
'1'=>'001000111000001000001000001000001000001000001000001000111110',
'2'=>'011110100001100001000001000010000100001000010000110011111111',
'3'=>'011110100001100001000010001100000010000001100001100001011110',
'4'=>'000100000100001100010100100100100100111111000100001100001111',
'5'=>'111111100000100000101110110001000001000001100001100001011110',
'6'=>'001110010001100000100000101110110001100001100001100001011110',
'7'=>'111111100010100010000100000100001000001000001000001000001000',
'8'=>'011110100001100001100001011110010010100001100001100001011110',
'9'=>'011100100010100001100001100011011101000001000001100010011100',
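A quick way to sanity-check a key is to draw it back as a 6-column grid, using the same 0 and - characters as the earlier demonstration. The renderGlyph helper below is a hypothetical sketch, not part of the original code; the sample key is the one listed above for the digit 1.

```php
<?php
// Hypothetical helper: split a glyph key into rows of $width characters and
// draw it with "0" for digit pixels and "-" for background.
function renderGlyph(string $key, int $width = 6): array
{
    $rows = str_split($key, $width);
    return array_map(
        // strtr with an array replaces both characters in one pass,
        // so the "0" produced from "1" is not turned into "-" afterwards.
        fn(string $row): string => strtr($row, ['1' => '0', '0' => '-']),
        $rows
    );
}

// Key for the digit 1, copied from the output above.
$one = '001000111000001000001000001000001000001000001000001000111110';
echo implode("\n", renderGlyph($one)), "\n";
```

If the printed shape does not look like the expected digit, the glyph picture was probably cut at the wrong offset.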
4. Compare with the samples
Compare the image feature code from step 2 with the glyph keys from step 3 to get the digits in the verification picture.
Algorithm process (see attachment for code):
1. Save the binarized values of the image in a two-dimensional array.
2. Loop to find the position of each digit, using the digit width, height, spacing, left offset, and top offset obtained earlier.
For example: left offset of the i-th digit = (digit width + interval) * i + left offset.
3. Calculate the position of each digit in the two-dimensional array and concatenate the width * height values in that region into a string similar to a digit glyph key.
4. Compare the string with the string of each glyph to compute the similarity. Take the digit with the highest similarity, or conclude that it is a given digit once the similarity exceeds 95%.
5. The recognition results are as follows:
With this method, recognition of this verification code is basically 100% accurate.
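The algorithm steps above can be sketched as follows. This is not the attached source code, only an illustration under stated assumptions: the function and constant names are hypothetical, digit indexes are 0-based, the default digit count of four is assumed, and similarity is taken as the fraction of matching positions (the article does not say which measure it uses; PHP's built-in similar_text would be another option).

```php
<?php
// Glyph keys from step 3 (6 x 10 pixels, row by row).
const GLYPHS = [
    '0' => '011110100001100001100001100001100001100001100001100001011110',
    '1' => '001000111000001000001000001000001000001000001000001000111110',
    '2' => '011110100001100001000001000010000100001000010000110011111111',
    '3' => '011110100001100001000010001100000010000001100001100001011110',
    '4' => '000100000100001100010100100100100100111111000100001100001111',
    '5' => '111111100000100000101110110001000001000001100001100001011110',
    '6' => '001110010001100000100000101110110001100001100001100001011110',
    '7' => '111111100010100010000100000100001000001000001000001000001000',
    '8' => '011110100001100001100001011110010010100001100001100001011110',
    '9' => '011100100010100001100001100011011101000001000001100010011100',
];
// Layout values measured in step 1 (pixels).
const DIGIT_WIDTH = 6;
const DIGIT_HEIGHT = 10;
const DIGIT_GAP = 4;
const LEFT_OFFSET = 2;
const TOP_OFFSET = 0;

// Concatenate the width * height values of the i-th digit into a string.
function digitString(array $matrix, int $i): string
{
    $left = (DIGIT_WIDTH + DIGIT_GAP) * $i + LEFT_OFFSET;
    $s = '';
    for ($y = TOP_OFFSET; $y < TOP_OFFSET + DIGIT_HEIGHT; $y++) {
        for ($x = $left; $x < $left + DIGIT_WIDTH; $x++) {
            $s .= $matrix[$y][$x] ?? 0; // missing pixels count as background
        }
    }
    return $s;
}

// Fraction of positions at which the two strings agree.
function similarity(string $a, string $b): float
{
    $len = min(strlen($a), strlen($b));
    $same = 0;
    for ($k = 0; $k < $len; $k++) {
        if ($a[$k] === $b[$k]) {
            $same++;
        }
    }
    return $len > 0 ? $same / $len : 0.0;
}

// For each digit position, pick the glyph with the highest similarity.
function recognize(array $matrix, int $digitCount = 4): string
{
    $result = '';
    for ($i = 0; $i < $digitCount; $i++) {
        $candidate = digitString($matrix, $i);
        $best = '?';
        $bestScore = -1.0;
        foreach (GLYPHS as $digit => $key) {
            $score = similarity($candidate, $key);
            if ($score > $bestScore) {
                $bestScore = $score;
                $best = (string) $digit;
            }
        }
        $result .= $best;
    }
    return $result;
}
```

Here $matrix is the two-dimensional array of 1s and 0s from step 2, indexed [row][column]. To apply the 95% rule instead, recognize() could reject a digit whose best score is below 0.95.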
After the steps above, you may object that nothing has been said about removing interference! In fact, removing it is very simple. An important property of interference is that it must not affect how the verification code displays, so when it is generated its RGB values are kept above or below some threshold. In the example picture I gave, the RGB values of the interference are never below 200, so it is easy to remove.
Source code download: http://yunpan.cn/cmJCkEnyGij3t
Access password d2ba