
PHP implements OCR text recognition

WBOY, original, 2016-07-30 13:29:32, 4408 views

More: http://www.webyang.net/Html/web/article_161.html

Baidu defines OCR (Optical Character Recognition) as the process in which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then uses character recognition methods to translate those shapes into computer text. In other words, for printed characters, the text in a paper document is first converted optically into a black-and-white dot-matrix image file, and recognition software then converts the text in that image into a text format that word-processing software can edit and process further.

As an engineer, you may sometimes need to extract the text from a picture, which requires OCR. Since I develop in PHP, I looked at PHP first. I found an OCR extension for PHP and tested it, but it turned out to be unusable (address: http://sourceforge.net/projects/phpocr.berlios). I also looked at many demos that people have posted online. The basic principle is to decompose the image into a matrix of 0s and 1s and then convert that matrix into the corresponding string according to its features. I tested several of them and none worked. I then saw others saying that PHP is rarely used for OCR and is not well suited to it: the language is too slow for an algorithm that demands high efficiency, and you are better off trying OCR algorithms in C, MATLAB, and so on. Many people who work on algorithms such as OCR use MATLAB.
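The 0/1-matrix idea can be illustrated with a toy example. The sketch below is not taken from any of those demos: the function names (`binarize`, `signature`, `recognize`), the threshold of 128, and the 3x3 letter templates are all assumptions made up for illustration. Real OCR also needs character segmentation, scaling, and far richer features.

```php
<?php
// Illustrative sketch only: toy binarization plus template matching.
// Every name, the threshold, and the templates are hypothetical.

/** Turn a grid of grayscale values (0 = black, 255 = white) into a 0/1 matrix. */
function binarize(array $gray, int $threshold = 128): array
{
    $bits = [];
    foreach ($gray as $row) {
        // Pixels darker than the threshold count as ink (1), the rest as background (0).
        $bits[] = array_map(function ($v) use ($threshold) {
            return $v < $threshold ? 1 : 0;
        }, $row);
    }
    return $bits;
}

/** Flatten a 0/1 matrix into a string such as "111|010|010". */
function signature(array $bits): string
{
    return implode('|', array_map(function ($row) {
        return implode('', $row);
    }, $bits));
}

/** Return the template character whose signature differs in the fewest positions. */
function recognize(array $bits, array $templates): ?string
{
    $sig = signature($bits);
    $len = strlen($sig);
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($templates as $char => $templateSig) {
        if (strlen($templateSig) !== $len) {
            continue; // only compare glyphs of the same size
        }
        $distance = 0;
        for ($i = 0; $i < $len; $i++) {
            if ($sig[$i] !== $templateSig[$i]) {
                $distance++;
            }
        }
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = (string) $char;
        }
    }
    return $best;
}

// Hypothetical 3x3 glyph templates.
$templates = [
    'T' => '111|010|010',
    'L' => '100|100|111',
    'I' => '010|010|010',
];

// A slightly noisy "T": the bottom-right pixel is darker than it should be.
$gray = [
    [ 10,  20,  30],
    [250,  15, 240],
    [255,   0, 100],
];

echo recognize(binarize($gray), $templates), PHP_EOL; // prints "T"
```

Comparing whole-glyph signatures like this only works when the input is already cropped and scaled to the template size, which is one reason the simple demos break down on real images.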

But my skills are limited and I can't write C. Then I happened to discover that Baidu provides an OCR API: http://apistore.baidu.com/apiworks/servicedetail/146.html.

I wrote the following for fun:

<ol>
<li value="1">
<span><?</span><span>php</span>
</li>
<li>
<span>header</span><span>(</span><span>"Content-type: text/html; charset=utf-8"</span><span>);</span>
</li>
<li><span> </span></li>
<li>
<span>function</span><span> curl</span><span>(</span><span>$img</span><span>)</span><span></span><span>{</span>
</li>
<li><span> </span></li>
<li>
<span>    $ch  </span><span>=</span><span> curl_init</span><span>();</span>
</li>
<li>
<span>    $url </span><span>=</span><span></span><span>'http://apis.baidu.com/apistore/idlocr/ocr'</span><span>;</span><span></span><span>// Baidu OCR API endpoint</span>
</li>
<li>
<span>    $header </span><span>=</span><span> array</span><span>(</span>
</li>
<li>
<span></span><span>'Content-Type:application/x-www-form-urlencoded'</span><span>,</span>
</li>
<li>
<span></span><span>'apikey:69c2ace1ef297ce88869f0751cb1b618'</span><span>,</span>
</li>
<li>
<span></span><span>);</span>
</li>
<li><span> </span></li>
<li>
<span>    $data_temp </span><span>=</span><span> file_get_contents</span><span>(</span><span>$img</span><span>);</span>
</li>
<li>
<span>    $data_temp </span><span>=</span><span> urlencode</span><span>(</span><span>base64_encode</span><span>(</span><span>$data_temp</span><span>));</span>
</li>
<li>
<span></span><span>// Build the required request parameters</span>
</li>
<li>
<span>    $data </span><span>=</span><span></span><span>"fromdevice=pc&clientip=127.0.0.1&detecttype=LocateRecognize&languagetype=CHN_ENG&imagetype=1&image="</span><span>.</span><span>$data_temp</span><span>;</span>
</li>
<li><span></span></li>
<li>
<span>    curl_setopt</span><span>(</span><span>$ch</span><span>,</span><span> CURLOPT_HTTPHEADER </span><span>,</span><span> $header</span><span>);</span><span></span><span>// Send the apikey in the request headers</span>
</li>
<li>
<span>    curl_setopt</span><span>(</span><span>$ch</span><span>,</span><span> CURLOPT_POST</span><span>,</span><span></span><span>1</span><span>);</span>
</li>
<li>
<span>    curl_setopt</span><span>(</span><span>$ch</span><span>,</span><span> CURLOPT_POSTFIELDS</span><span>,</span><span> $data</span><span>);</span><span></span><span>// Attach the POST parameters</span>
</li>
<li>
<span>    curl_setopt</span><span>(</span><span>$ch</span><span>,</span><span> CURLOPT_RETURNTRANSFER</span><span>,</span><span></span><span>1</span><span>);</span>
</li>
<li>
<span>    curl_setopt</span><span>(</span><span>$ch </span><span>,</span><span> CURLOPT_URL </span><span>,</span><span> $url</span><span>);</span><span></span><span>// Set the request URL</span>
</li>
<li>
<span>    $res </span><span>=</span><span> curl_exec</span><span>(</span><span>$ch</span><span>);</span><span></span><span>// Execute the HTTP request</span>
</li>
<li>
<span></span><span>if</span><span></span><span>(</span><span>$res </span><span>===</span><span> FALSE</span><span>)</span><span></span><span>{</span>
</li>
<li>
<span>        echo </span><span>"cURL Error: "</span><span></span><span>.</span><span> curl_error</span><span>(</span><span>$ch</span><span>);</span>
</li>
<li>
<span></span><span>}</span>
</li>
<li>
<span>    curl_close</span><span>(</span><span>$ch</span><span>);</span>
</li>
<li><span></span></li>
<li>
<span>    $temp_var </span><span>=</span><span> json_decode</span><span>(</span><span>$res</span><span>,</span><span>true</span><span>);</span>
</li>
<li>
<span></span><span>return</span><span> $temp_var</span><span>;</span>
</li>
<li><span> </span></li>
<li><span>}</span></li>
<li><span> </span></li>
<li>
<span>$wordArr </span><span>=</span><span> curl</span><span>(</span><span>'4.jpg'</span><span>);</span>
</li>
<li>
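<span>if</span><span> (</span><span>is_array</span><span>(</span><span>$wordArr</span><span>)</span><span> && isset(</span><span>$wordArr</span><span>[</span><span>'errNum'</span><span>])</span><span> && $wordArr</span><span>[</span><span>'errNum'</span><span>]</span><span> == </span><span>0</span><span>)</span><span> {</span><span> // valid JSON reply with no API error</span>
</li>
<li>
<span>    var_dump</span><span>(</span><span>$wordArr</span><span>);</span>
</li>
<li>
<span>}</span><span> else </span><span>{</span>
</li>
<li>
<span>    echo </span><span>"Recognition failed: "</span><span> . (isset(</span><span>$wordArr</span><span>[</span><span>"errMsg"</span><span>])</span><span> ? $wordArr</span><span>[</span><span>"errMsg"</span><span>]</span><span> : </span><span>"no valid response"</span><span>);</span>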
</li>
<li><span>}</span></li>
</ol>
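Two parts of the script above are easy to get wrong: escaping the base64 image in the POST body, and reading the reply when the request fails. The sketch below shows one possible way to handle both. The helper names `buildOcrPostBody` and `extractWords` are hypothetical, and the `retData` / `word` layout of the reply is an assumption about the response format that this article does not show, so check it against a real response before relying on it.

```php
<?php
// Sketch only: helper names are hypothetical, and the 'retData'/'word'
// response layout is an assumption, not something confirmed by the article.

/** Build the urlencoded POST body from raw image bytes. */
function buildOcrPostBody(string $imageBytes, string $clientIp = '127.0.0.1'): string
{
    // http_build_query() URL-encodes every value, so the base64 string
    // (which may contain "+", "/" and "=") is escaped without manual urlencode().
    return http_build_query([
        'fromdevice'   => 'pc',
        'clientip'     => $clientIp,
        'detecttype'   => 'LocateRecognize',
        'languagetype' => 'CHN_ENG',
        'imagetype'    => 1,
        'image'        => base64_encode($imageBytes),
    ], '', '&');
}

/**
 * Decode the raw reply from curl_exec() and return the recognized strings.
 * Throws instead of continuing with a null or malformed result.
 */
function extractWords($rawResponse): array
{
    if (!is_string($rawResponse)) {
        // curl_exec() returns false when the transfer itself failed.
        throw new RuntimeException('No response body (cURL request failed).');
    }
    $decoded = json_decode($rawResponse, true);
    if (!is_array($decoded) || !isset($decoded['errNum'])) {
        throw new RuntimeException('Response is not the expected JSON object.');
    }
    if ($decoded['errNum'] != 0) {
        $msg = isset($decoded['errMsg']) ? $decoded['errMsg'] : 'unknown error';
        throw new RuntimeException('OCR error: ' . $msg);
    }
    // Assumed layout: retData is a list of items, each carrying a 'word' string.
    $items = (isset($decoded['retData']) && is_array($decoded['retData']))
        ? $decoded['retData']
        : [];
    $words = [];
    foreach ($items as $item) {
        if (is_array($item) && isset($item['word'])) {
            $words[] = $item['word'];
        }
    }
    return $words;
}
```

With these helpers, the body passed to `CURLOPT_POSTFIELDS` would be `buildOcrPostBody(file_get_contents($img))`, and the result of `curl_exec()` would go straight into `extractWords()` inside a `try` / `catch` block.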

I tested a few pictures and the accuracy is quite high, although expecting 100% is unrealistic.


Copyright statement: This article is an original article by the blogger and may not be reproduced without the blogger's permission.
