Simple testing and usage of php-ml

php-ml is a machine learning library written in PHP. Python and C++ offer far more machine learning libraries, but most of them are somewhat complicated, and many beginners give up while trying to configure them. This article walks through a simple test of php-ml and shows how to use it. I hope it serves as a useful reference.

Although php-ml does not have particularly advanced algorithms, it covers the most basic ones, such as regression and classification. For a small company like ours, that is enough for simple data analysis and prediction. In our projects, what we should pursue is cost-effectiveness, not maximum efficiency and precision. Some algorithms and libraries look very powerful, but if we need to go online quickly and our technical staff have no experience in machine learning, complex code and configuration will actually drag the project down. And if we are building a simple machine learning application, the cost of learning complex libraries and algorithms is clearly too high. Moreover, if the project runs into strange problems, can we solve them? What do we do if the requirements change? I believe everyone has had this experience: the program suddenly reports an error, you cannot figure out why, you search Google or Baidu and find exactly one matching question, asked five or ten years ago, with zero replies...

Therefore, it makes sense to choose the simplest and most cost-effective method. php-ml is not slow (switch to PHP 7 if you have not already), and its accuracy is also good. After all, the algorithms are the same, and PHP itself is implemented in C. What I dislike most is arguments comparing the performance and scope of application of Python, Java and PHP. If you really want performance, develop in C. If you really want the widest scope of application, use C or even assembly...

To use the library, we first need to download it. It is available on GitHub (https://github.com/php-ai/php-ml). The recommended way, however, is to install it with Composer (for example, `composer require php-ai/php-ml`), which also sets up autoloading.

After downloading, we can take a look at the library's documentation. It consists of simple examples that are all easy to understand, and we can create a file ourselves to try them out. Next, let's test the library on actual data. One dataset is the Iris flower dataset. The other has lost its records, so I no longer know what the data represents...

The Iris data has three different classes: Iris-setosa, Iris-versicolor and Iris-virginica.

In the unknown dataset, the decimal separator is written as a comma, so the values need to be converted before any calculation.
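
The conversion can be sketched as a small helper. This is only an illustration: the name `parseLine` is hypothetical, and it assumes each line holds an integer x value and a decimal-comma y value separated by whitespace.

```php
<?php
// Hypothetical helper: parse one "x y" line whose y value uses a decimal comma.
// Returns [int $x, float $y], or null for blank or malformed lines.
function parseLine(string $line): ?array
{
    $parts = preg_split('/\s+/', trim($line));
    if (count($parts) < 2) {
        return null;
    }
    $y = str_replace(',', '.', $parts[1]); // decimal comma to decimal point
    if (!is_numeric($parts[0]) || !is_numeric($y)) {
        return null;
    }
    return [(int)$parts[0], (float)$y];
}

var_dump(parseLine("12 3,75\n")); // x = 12, y = 3.75
var_dump(parseLine(""));          // null, so blank lines can be skipped
```

Returning null for bad lines lets the reading loop skip a trailing empty line instead of storing a bogus zero.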

Let's deal with the unknown dataset first. Its file name is data.txt. The data can be drawn directly as an x-y line chart, so we first plot the original data. Since the x-axis is relatively long, we only need to see its rough shape.

PHP's jpgraph library is used for drawing. The code is as follows:

<?php
include_once './src/jpgraph.php';
include_once './src/jpgraph_line.php';

// Set up the jpgraph canvas
$g = new Graph(1920, 1080);
$g->SetScale("textint");
$g->title->Set('data');

// Read the file: each line is "x y", with a comma as the decimal separator
$file = fopen('data.txt', 'r');
$labels = array();
while (($line = fgets($file)) !== false) {
    $data = explode(' ', trim($line));
    if (count($data) < 2) {
        continue; // skip blank or malformed lines
    }
    $data[1] = str_replace(',', '.', $data[1]); // turn the decimal comma into a decimal point
    $labels[(int)$data[0]] = (float)$data[1];   // store as key => value so we can sort by key
}
fclose($file);

ksort($labels); // sort by key (the x value)

$x = array_keys($labels);   // x-axis data
$y = array_values($labels); // y-axis data

$linePlot = new LinePlot($y);
$g->xaxis->SetTickLabels($x);
$linePlot->SetLegend('data');
$g->Add($linePlot);
$g->Stroke();


With this original chart for comparison, we move on to learning. We use LeastSquares from php-ml. The predictions need to be saved to a file so that we can draw a comparison chart. The learning code is as follows:

<?php
require 'vendor/autoload.php';

use Phpml\Regression\LeastSquares;

// Read the samples (x) and targets (y) from the file
$file = fopen('data.txt', 'r');
$samples = array();
$labels = array();
while (($line = fgets($file)) !== false) {
    $data = explode(' ', trim($line));
    if (count($data) < 2) {
        continue; // skip blank or malformed lines
    }
    $samples[] = [(int)$data[0]];
    $labels[] = (float)str_replace(',', '.', $data[1]); // decimal comma to decimal point
}
fclose($file);

$regression = new LeastSquares();
$regression->train($samples, $labels);

// The array $a holds the x values from our processed original data; it is used for testing.
$a = [0,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,20,22,23,24,25,26,27,29,30,31,37,40,41,45,48,53,55,57,60,61,108,124];
foreach ($a as $x) {
    // Append each prediction to the file (delete putput.txt before running again)
    file_put_contents("putput.txt", $regression->predict([$x]) . "\n", FILE_APPEND);
}

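To make it clearer what a least-squares fit on a single input computes, here is a minimal plain-PHP sketch of fitting a straight line y = intercept + slope * x. It is an illustration only, not php-ml's implementation; the function name `fitLine` and the toy data are assumptions.

```php
<?php
// Ordinary least squares for one feature: fits y = intercept + slope * x.
// Illustration only; this is not how php-ml is implemented internally.
function fitLine(array $xs, array $ys): array
{
    $n = count($xs);
    $meanX = array_sum($xs) / $n;
    $meanY = array_sum($ys) / $n;

    $sxy = 0.0; // sum of (x - meanX) * (y - meanY)
    $sxx = 0.0; // sum of (x - meanX)^2
    for ($i = 0; $i < $n; $i++) {
        $dx = $xs[$i] - $meanX;
        $sxy += $dx * ($ys[$i] - $meanY);
        $sxx += $dx * $dx;
    }

    $slope = $sxy / $sxx; // assumes the x values are not all identical
    return ['slope' => $slope, 'intercept' => $meanY - $slope * $meanX];
}

// Toy data lying exactly on y = 1 + 2x
$model = fitLine([0, 1, 2, 3], [1, 3, 5, 7]);
echo $model['intercept'] + $model['slope'] * 10, "\n"; // prediction at x = 10
```

Because the model is a straight line, its predictions can follow only the overall trend of the data, not every local rise and fall.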

After that, we read the predictions back from the file and plot them, which gives the final chart for comparison.

The code is as follows:

<?php
include_once './src/jpgraph.php';
include_once './src/jpgraph_line.php';

$g = new Graph(1920, 1080);
$g->SetScale("textint");
$g->title->Set('data');

// Read the predictions, one value per line
$file = fopen('putput.txt', 'r');
$y = array();
while (($line = fgets($file)) !== false) {
    $line = trim($line);
    if ($line === '') {
        continue; // skip the trailing empty line
    }
    $y[] = (float)$line;
}
fclose($file);

// The same x values that were used for prediction
$x = [0,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,20,22,23,24,25,26,27,29,30,31,37,40,41,45,48,53,55,57,60,61,108,124];

$linePlot = new LinePlot($y);
$g->xaxis->SetTickLabels($x);
$linePlot->SetLegend('data');
$g->Add($linePlot);
$g->Stroke();


We can see that the two charts still differ considerably, especially in the more jagged parts. LeastSquares fits a linear model, so it cannot follow those local ups and downs. Still, this is only 40 data points, and the general trends agree. With any library, this kind of learning gives low accuracy when there is little data. Relatively high accuracy needs a large amount of data, and in my experience more than 10,000 records are necessary. If that requirement cannot be met, no library will help. So in machine learning practice, the real difficulty is not technical problems such as low accuracy or complex configuration, but data that is insufficient or of poor quality (too much useless data in the set). Preprocessing the data before doing machine learning is also necessary.
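
Rather than judging the gap only by eye, one could also put a number on it. The sketch below computes two common error measures, mean absolute error (MAE) and root mean squared error (RMSE), in plain PHP. The function name `regressionErrors` is hypothetical and the numbers are made up purely to show the call.

```php
<?php
// Mean absolute error and root mean squared error between two equal-length lists.
function regressionErrors(array $actual, array $predicted): array
{
    $actual = array_values($actual);
    $predicted = array_values($predicted);
    $n = count($actual);
    if ($n === 0 || $n !== count($predicted)) {
        throw new InvalidArgumentException('Both lists must be non-empty and of equal length.');
    }

    $absSum = 0.0; // running sum of |predicted - actual|
    $sqSum = 0.0;  // running sum of (predicted - actual)^2
    for ($i = 0; $i < $n; $i++) {
        $diff = $predicted[$i] - $actual[$i];
        $absSum += abs($diff);
        $sqSum += $diff * $diff;
    }
    return ['mae' => $absSum / $n, 'rmse' => sqrt($sqSum / $n)];
}

// Made-up numbers, only to show the call
print_r(regressionErrors([1.0, 2.0, 3.0], [1.0, 2.0, 6.0]));
```

RMSE punishes large individual misses more than MAE does, so a big difference between the two suggests that a few points are predicted very badly.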

Next, let's test the Iris data. There are three classes in total. Since we downloaded the data as CSV, we can use the CsvDataset class that php-ml provides for reading CSV files. This is a classification problem, so we choose the SVC algorithm provided by the library. We name the data file Iris.csv, and the code is as follows:

<?php
require 'vendor/autoload.php';

use Phpml\Classification\SVC;
use Phpml\SupportVectorMachine\Kernel;
use Phpml\Dataset\CsvDataset;

// 4 feature columns; false means the file has no heading row
$dataset = new CsvDataset('Iris.csv', 4, false);
$classifier = new SVC(Kernel::LINEAR, 1000); // the second argument is the cost parameter
$classifier->train($dataset->getSamples(), $dataset->getTargets());

// $argv holds the command-line arguments; the command line is convenient for debugging this kind of program
$sample = array_map('floatval', array_slice($argv, 1, 4));
echo $classifier->predict($sample);


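Isn't that simple? About a dozen lines of code and it is done. Next, let's test it. According to the data shown earlier, when we enter 5 3.3 1.4 0.2, the output should be Iris-setosa. Let's take a look: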

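See, at least when we enter a sample that is already in the dataset, we get the correct result. But what if we enter data that is not in the original dataset? Let's test two samples: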

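Judging from the data in the two figures shown earlier, these inputs do not exist in the dataset, yet the classification looks reasonable based on our initial observation.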
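
A more systematic check than a few hand-picked inputs is to hold out part of the data and measure accuracy on it. The sketch below is plain PHP with hypothetical helper names (`holdOutEvery`, `accuracy`), made-up rows and a stand-in rule in place of a trained model. With php-ml, one would presumably train on the training part and pass the test samples to `$classifier->predict()` instead.

```php
<?php
// Deterministic split: every $k-th sample goes to the test set, the rest to training.
function holdOutEvery(array $samples, array $labels, int $k): array
{
    $samples = array_values($samples);
    $labels = array_values($labels);
    $split = ['trainX' => [], 'trainY' => [], 'testX' => [], 'testY' => []];
    foreach ($samples as $i => $sample) {
        $part = ($i % $k === $k - 1) ? 'test' : 'train';
        $split[$part . 'X'][] = $sample;
        $split[$part . 'Y'][] = $labels[$i];
    }
    return $split;
}

// Fraction of predictions that match the true labels.
function accuracy(array $actual, array $predicted): float
{
    $actual = array_values($actual);
    $predicted = array_values($predicted);
    $n = count($actual);
    if ($n === 0 || $n !== count($predicted)) {
        throw new InvalidArgumentException('Both lists must be non-empty and of equal length.');
    }
    $hits = 0;
    for ($i = 0; $i < $n; $i++) {
        if ($actual[$i] === $predicted[$i]) {
            $hits++;
        }
    }
    return $hits / $n;
}

// Made-up rows (four measurements each) and labels, only to show the flow
$samples = [[5.0, 3.3, 1.4, 0.2], [6.4, 3.2, 4.5, 1.5], [4.9, 3.0, 1.4, 0.2],
            [5.9, 3.0, 4.2, 1.5], [5.1, 3.5, 1.4, 0.2], [6.0, 2.9, 4.5, 1.5]];
$labels = ['Iris-setosa', 'Iris-versicolor', 'Iris-setosa',
           'Iris-versicolor', 'Iris-setosa', 'Iris-versicolor'];

$split = holdOutEvery($samples, $labels, 3);

// Stand-in classifier: a single threshold on the third measurement
$predict = function (array $s): string {
    return $s[2] < 2.5 ? 'Iris-setosa' : 'Iris-versicolor';
};
echo accuracy($split['testY'], array_map($predict, $split['testX'])), "\n";
```

Taking every k-th row keeps the classes mixed in both parts even when the file is sorted by class.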

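So this machine learning library is sufficient for most people. Most of those who look down on this library or that one and talk at length about performance are not real experts themselves. The real experts are busy making money or doing academic research. We should focus on mastering the algorithms and understanding the principles and subtleties behind them, rather than on empty talk. Of course, this library is not recommended for large projects, only for small or personal ones.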

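jpgraph depends only on the GD library, so it can be used as soon as it is downloaded and included. Most of the code above went into drawing the charts and the initial data processing. Thanks to the library's excellent encapsulation, the learning code itself is not complicated. Readers who need the full code or the test datasets can leave a comment or send a private message, and I will provide the complete code, ready to use after unzipping.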