
Using the Workerman timer in ThinkPHP 5 to crawl site content on a schedule

不言 (original)
2018-07-26 09:19:45

This article shows how to use the Workerman timer in ThinkPHP 5 to periodically crawl news items from a site, with the full code for each step.

1. First install Workerman through Composer. Detailed instructions are in the ThinkPHP 5 Complete Development Manual, under Extensions -> Composer packages -> Workerman:

# Run the following command in the project root directory
composer require topthink/think-worker

2. Create the service startup file server.php in the project root directory:

<?php

define('APP_PATH', __DIR__ . '/application/');
define('BIND_MODULE', 'server/Worker');
// Load the framework bootstrap file
require __DIR__ . '/thinkphp/start.php';

3. Create a server module under application, and in it create the controller Worker.php (application/server/controller/Worker.php):

<?php
namespace app\server\controller;

use think\worker\Server;

class Worker extends Server
{
    // Called by Workerman when a worker process starts
    public function onWorkerStart($worker)
    {
        // Register the periodic crawl timer
        $handle = new Collection();
        $handle->add_timer();
    }
}
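
One thing to watch with this controller: Workerman calls onWorkerStart once in every worker process. If the service runs with more than one process (as far as I know, think-worker's Server base class has a $processes property for the count), the timer would be registered once per process and each crawl would run several times per interval. A common guard is to register the timer only in the worker whose id is 0. The sketch below illustrates the idea in plain PHP so it can be run on its own; StubWorker and register_crawl_timer are hypothetical stand-ins, not part of Workerman or think-worker.

```php
<?php
// Hypothetical stand-in for Workerman's worker object; the real one
// exposes an integer $id that runs from 0 to (process count - 1).
class StubWorker
{
    public $id;

    public function __construct(int $id)
    {
        $this->id = $id;
    }
}

// Hypothetical helper: run $register only in the first worker process.
// Returns true when the timer was registered, false otherwise.
function register_crawl_timer(StubWorker $worker, callable $register): bool
{
    if ($worker->id !== 0) {
        return false;
    }
    $register();
    return true;
}

// Simulate onWorkerStart being called in four worker processes.
$registered = 0;
foreach ([0, 1, 2, 3] as $id) {
    register_crawl_timer(new StubWorker($id), function () use (&$registered) {
        // In the real controller this would be (new Collection())->add_timer().
        $registered++;
    });
}

echo "timers registered: $registered\n"; // timers registered: 1
```

In the real controller, the same check would be applied to the id of the worker object passed to onWorkerStart. Running the service with a single process avoids the issue altogether.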

4. Create the Collection.php class in the same controller directory:

<?php
namespace app\server\controller;

use app\common\model\ArticleModel;
use think\Controller;
use Workerman\Lib\Timer;

class Collection extends Controller
{
    public function __construct()
    {
        parent::__construct();
    }

    public function add_timer()
    {
        // Run index() every 10 seconds (persistent timer).
        // If the interval is too small, the run will crash.
        Timer::add(10, array($this, 'index'), array(), true);
    }
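    /**
     * Collect data
     */
    public function index()
    {
        $total = $this->get_jinse();
        // The timer ignores the return value; it is only useful if
        // index() is also called as a normal controller action.
        return json(['msg' => "Collected $total items in this run.", 'total' => $total]);
    }
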
    /**
     * Fetch news flashes from 金色财经 (Jinse Finance)
     */
    public function get_jinse()
    {
        $url  = "https://api.jinse.com/v4/live/list?limit=20";
        $data = json_decode($this->get_curl($url));
        // Stop early if the request failed or the response is not in the expected shape
        if (empty($data->list[0]->lives)) {
            return 0;
        }
        $data = $data->list[0]->lives;

        $validate = validate('Article');
        $items    = [];

        foreach ($data as $k => $v) {
            // Content looks like "【headline】body"; skip items that do not match
            preg_match('/【(.+?)】(.+)/u', $v->content, $content);
            if (empty($content[2])) {
                continue;
            }

            $list = array(
                'source_id' => $v->id,
                'source'    => '金色财经',
                // Keep only the text after the last "|" in the headline
                'title'     => trim(preg_replace('/.*\|/', '', $content[1])),
                'content'   => $content[2],
            );
            if ($validate->check($list)) {
                $items[] = $list;
            }
        }
        if ($items) {
            // Sort by key in descending order, i.e. reverse the collected order, before saving
            krsort($items);
            $model = new ArticleModel();
            $model->saveAll($items);
        }
        return count($items);
    }
    public function get_curl($url)
    {
        $ch = curl_init();
        // Note: this disables SSL certificate verification
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $output = curl_exec($ch);

        if ($output === false) {
            echo "CURL Error:" . curl_error($ch);
        }
        // Release the cURL handle
        curl_close($ch);

        return $output;
    }
}
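
The parsing step in get_jinse() relies on each item's content having the form 【headline】body, optionally with a "prefix |" inside the brackets. A minimal standalone sketch of that step, handy for checking the regular expressions without calling the API, might look like the following. The function name parse_live_item and the sample strings are made up for illustration; they are not real API output.

```php
<?php
// Hypothetical helper mirroring the parsing in get_jinse():
// split "【headline】body" into a title and a body.
// Returns null when the text does not match or the body is blank.
function parse_live_item(string $text): ?array
{
    if (!preg_match('/【(.+?)】(.+)/u', $text, $m) || trim($m[2]) === '') {
        return null;
    }
    return [
        // Drop everything up to and including the last "|" in the headline.
        'title'   => trim(preg_replace('/.*\|/', '', $m[1])),
        'content' => $m[2],
    ];
}

// Made-up samples for illustration.
$with_prefix = parse_live_item('【Morning news | BTC passes 8000 USD】Prices rose overnight.');
$no_brackets = parse_live_item('Plain text without a headline');

echo $with_prefix['title'], "\n"; // BTC passes 8000 USD
var_dump($no_brackets);           // NULL
```

Items for which the helper returns null correspond to the ones get_jinse() skips with continue.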

5. Start the service:

php server.php start
