Recently, a Greasemonkey user script that can crack CAPTCHAs appeared on the Internet. The script, developed by Shaun Friedle, can easily solve the CAPTCHA on the Megaupload site. If you don't believe it, you can try it yourself at http://herecomethelizards.co.uk/mu_captcha/!
Now, the CAPTCHA used by Megaupload has been defeated by the code above. To be honest, the CAPTCHA here is not particularly well designed, but what is more interesting is:
1. The getImageData API of the HTML5 Canvas can be used to obtain pixel data from the CAPTCHA image. With Canvas, we can not only draw an image onto a canvas but also extract its pixels back out later (a minimal sketch of this follows the list).
2. The script contains a neural network implemented entirely in JavaScript.
3. After the pixel data is extracted from the image with Canvas, it is fed into the neural network, and a simple form of optical character recognition is used to infer which characters appear in the CAPTCHA.
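To illustrate the first point, here is a minimal sketch (not taken from the script itself) of drawing an image onto a canvas and reading its pixels back with getImageData; the "captcha_image" element id is an assumption made for this example:
var canvas = document.createElement("canvas");
var image = document.getElementById("captcha_image"); // hypothetical id for the CAPTCHA <img>
canvas.width = image.width;
canvas.height = image.height;
var ctx = canvas.getContext("2d");
ctx.drawImage(image, 0, 0);              // embed the image into the canvas
var image_data = ctx.getImageData(0, 0, canvas.width, canvas.height); // pull the RGBA pixels back out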
By reading the source code, we can better understand not only how the script works but also how the CAPTCHA itself is implemented. As you saw earlier, the CAPTCHAs used here are not very complex: each one consists of three characters, each character uses a different color, only the 26 letters of the alphabet are used, and all characters share the same font.
The purpose of the first step is obvious: copy the CAPTCHA onto a canvas and convert it to grayscale.
function convert_grey(image_data){
  for (var x = 0; x < image_data.width; x++){
    for (var y = 0; y < image_data.height; y++){
      var i = x*4 + y*4*image_data.width;
      // Standard luma formula: weighted sum of the red, green, and blue channels
      var luma = Math.floor(image_data.data[i] * 299/1000 +
                            image_data.data[i+1] * 587/1000 +
                            image_data.data[i+2] * 114/1000);
      image_data.data[i] = luma;
      image_data.data[i+1] = luma;
      image_data.data[i+2] = luma;
      image_data.data[i+3] = 255;
    }
  }
}
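For context, a hedged sketch of how convert_grey might be invoked on pixel data pulled from a canvas (the canvas and ctx variables are assumptions for this example, not names from the script):
var ctx = canvas.getContext("2d");
var image_data = ctx.getImageData(0, 0, canvas.width, canvas.height);
convert_grey(image_data);            // desaturate the pixel data in place
ctx.putImageData(image_data, 0, 0);  // write the grayscale pixels back to the canvas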
Then the canvas is divided into three separate pixel matrices, one per character. This step is easy to implement because each character uses a separate color, so the characters can be distinguished by color.
filter(image_data[0], 105);
filter(image_data[1], 120);

function filter(image_data, colour){
  for (var x = 0; x < image_data.width; x++){
    for (var y = 0; y < image_data.height; y++){
      var i = x*4 + y*4*image_data.width;
      // Turn all the pixels of the given colour to white
      if (image_data.data[i] == colour) {
        image_data.data[i] = 255;
        image_data.data[i+1] = 255;
        image_data.data[i+2] = 255;
      // Everything else to black
      } else {
        image_data.data[i] = 0;
        image_data.data[i+1] = 0;
        image_data.data[i+2] = 0;
      }
    }
  }
}
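The script works on three separate copies of the pixel data, one per character. As a rough sketch of how such copies could be produced before filtering (the exact mechanism in the script may differ), assuming a 2D context ctx that holds the grayscale CAPTCHA:
var image_data = [];
for (var i = 0; i < 3; i++) {
  // Each entry starts as a full copy of the grayscale pixels;
  // filter() then keeps only the pixels whose grey value matches one character.
  image_data.push(ctx.getImageData(0, 0, canvas.width, canvas.height));
}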
Finally, any remaining stray pixels are eliminated. To do this, the script looks for white (matched) pixels that have black (unmatched) pixels directly above and below them, and then deletes the matched pixels.
var i = x*4 + y*4*image_data.width;
var above = x*4 + (y-1)*4*image_data.width;
var below = x*4 + (y+1)*4*image_data.width;

if (image_data.data[i] == 255 &&
    image_data.data[above] == 0 &&
    image_data.data[below] == 0) {
  image_data.data[i] = 0;
  image_data.data[i+1] = 0;
  image_data.data[i+2] = 0;
}
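That snippet is only the per-pixel test; a plausible sketch of the surrounding scan (my reconstruction, not the script verbatim) would loop over every pixel and apply the check:
function remove_noise(image_data){ // hypothetical name for the clean-up pass
  for (var x = 0; x < image_data.width; x++){
    for (var y = 1; y < image_data.height - 1; y++){
      var i = x*4 + y*4*image_data.width;
      var above = x*4 + (y-1)*4*image_data.width;
      var below = x*4 + (y+1)*4*image_data.width;
      // A lone white pixel with black neighbours above and below is treated as noise.
      if (image_data.data[i] == 255 &&
          image_data.data[above] == 0 &&
          image_data.data[below] == 0) {
        image_data.data[i] = 0;
        image_data.data[i+1] = 0;
        image_data.data[i+2] = 0;
      }
    }
  }
}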
We now have an approximate shape for each character, but before it is loaded into the neural network the script performs some necessary edge detection. It looks for the left-most, right-most, top-most, and bottom-most pixels of the shape, turns them into a bounding rectangle, and then copies that rectangle into a new 20×25 pixel matrix.
cropped_canvas.getContext("2d").fillRect(0, 0, 20, 25);
var edges = find_edges(image_data[i]);
cropped_canvas.getContext("2d").drawImage(canvas, edges[0], edges[1],
  edges[2] - edges[0], edges[3] - edges[1], 0, 0,
  edges[2] - edges[0], edges[3] - edges[1]);
image_data[i] = cropped_canvas.getContext("2d").getImageData(0, 0,
  cropped_canvas.width, cropped_canvas.height);
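The find_edges helper is not shown above. A simple version consistent with the description (scan for the extreme white pixels and return them as [left, top, right, bottom]) might look like this; it is an illustrative reconstruction, not the script's actual code:
function find_edges(image_data){
  var left = image_data.width, top = image_data.height, right = 0, bottom = 0;
  for (var x = 0; x < image_data.width; x++){
    for (var y = 0; y < image_data.height; y++){
      var i = x*4 + y*4*image_data.width;
      if (image_data.data[i] == 255){   // a pixel belonging to the character
        if (x < left)   left = x;
        if (y < top)    top = y;
        if (x > right)  right = x;
        if (y > bottom) bottom = y;
      }
    }
  }
  return [left, top, right, bottom];
}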
Then this rectangle is simplified further. The script strategically extracts points from the matrix to serve as "receptors" that will be fed into the neural network. For example, one receptor might correspond to whether the pixel at position 9×6 is set or not. The script extracts a set of such on/off states (far fewer than the full 20×25 matrix would require; only 64 are used) and feeds these states into the neural network.
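A hedged sketch of how such receptors might be sampled from the 20×25 matrix; the receptor coordinates and function name here are made-up examples, not the ones the script actually uses:
// Hypothetical list of sampling points; the real script has its own fixed set of 64.
var receptor_points = [[9, 6], [4, 12], [15, 20] /* ... more points ... */];

function extract_receptors(image_data){
  var states = [];
  for (var r = 0; r < receptor_points.length; r++){
    var x = receptor_points[r][0], y = receptor_points[r][1];
    var i = x*4 + y*4*image_data.width;
    // true if the sampled pixel is "on" (white), false otherwise
    states.push(image_data.data[i] == 255);
  }
  return states;
}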
You may ask: why not compare pixels directly, is a neural network really necessary? The key point is that we need to deal with ambiguity. If you have tried the earlier demo, you will notice that comparing pixels directly is slightly more error-prone than going through the neural network, although it does not fail often. Still, we have to admit that for most users direct pixel comparison would probably be good enough.
The next step is to try to guess the letter. The 64 boolean values (obtained from one of the character images) are fed into the neural network, along with a set of pre-computed data. One of the ideas behind neural networks is that the desired results are known in advance, so the network can be trained against them. The script author ran the script many times and collected a series of best scores that help the network work backwards from a result to the values that produced it, but the scores themselves have no special meaning.
When the neural network processes the 64 boolean values for a letter of the CAPTCHA, it compares them against a pre-computed alphabet and gives a score for how well they match each letter. (The final result might look something like: 98% likely to be the letter A, 36% likely to be the letter B, and so on.)
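To make the scoring step concrete, here is a very rough sketch of how 64 boolean receptor values could be scored against pre-computed per-letter weights in a single-layer network; the weights object and the exact maths are assumptions made for illustration, not the network the script actually ships:
// weights[letter] is assumed to be an array of 64 pre-trained numbers for that letter.
function score_letters(receptors, weights){
  var scores = {};
  for (var letter in weights){
    var sum = 0;
    for (var i = 0; i < receptors.length; i++){
      sum += (receptors[i] ? 1 : 0) * weights[letter][i];
    }
    // Squash the sum into 0..1 so results read like "98% A, 36% B", etc.
    scores[letter] = 1 / (1 + Math.exp(-sum));
  }
  return scores;
}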
Once all three letters of the CAPTCHA have been processed, the final result is produced. It should be noted that the script is not 100% accurate (I wonder whether the scoring could be improved if the letters were not converted into rectangles at the start), but it is pretty good, at least for the present purpose. And all of this runs in the browser using standard client-side technology!
As a side note, this script should be considered a special case. The technique may work well on other simple CAPTCHAs, but complex CAPTCHAs are rather beyond its reach (especially for this kind of client-side analysis). I hope more people are inspired by this project to build even more wonderful things, because its potential is considerable.