ThinkPHP 3.2: extracting keywords with SCWS Chinese word segmentation

PHP中文网
Release: 2016-07-12 09:06:51


SCWS is an acronym for Simple Chinese Word Segmentation, a Chinese word segmentation system.
1. Download the official PSCWS classes (this tutorial uses PSCWS4, the fourth version):
http://www.xunsearch.com/scws/down/pscws4-20081221.tar.bz2
Then download an XDB dictionary file (this tutorial uses the UTF-8 Simplified Chinese dictionary package):
http://www.xunsearch.com/scws/down/scws-dict-chs-utf8.tar.bz2
2. Extract pscws4.class.php and xdb_r.class.php from the PSCWS archive into the ThinkPHP/Library/Org/Util directory, renaming pscws4.class.php to Pscws.class.php and xdb_r.class.php to XDB_R.class.php (uppercase).
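The file names matter because ThinkPHP 3.2 autoloads a namespaced class from ThinkPHP/Library by turning the namespace separators into directory separators and appending .class.php. The sketch below reproduces only that mapping with a hypothetical tp_class_to_path() helper; it is a simplification, not ThinkPHP's own autoloader, which also handles other namespace roots. On case-sensitive file systems such as Linux, Pscws.class.php and pscws.class.php are different files, so the capitalisation has to match the class name exactly.

```php
<?php
// Hypothetical helper: a simplified model of how ThinkPHP 3.2 maps a
// namespaced class name to a file under ThinkPHP/Library. Illustration
// only, not the framework's actual autoloader.
function tp_class_to_path($class, $libPath = 'ThinkPHP/Library/')
{
    // "Org\Util\Pscws" -> "Org/Util/Pscws", then add the .class.php suffix
    return $libPath . str_replace('\\', '/', $class) . '.class.php';
}

echo tp_class_to_path('Org\\Util\\Pscws'), PHP_EOL; // ThinkPHP/Library/Org/Util/Pscws.class.php
echo tp_class_to_path('Org\\Util\\XDB_R'), PHP_EOL; // ThinkPHP/Library/Org/Util/XDB_R.class.php
```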
3. Next, edit Pscws.class.php. Add a namespace declaration at the top of the file:

namespace Org\Util;

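Rename the class from PSCWS4 to Pscws so that it matches the file name, then delete the following line, because ThinkPHP's autoloader will load XDB_R from the same namespace instead:

require_once (dirname(__FILE__) . '/xdb_r.class.php');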

Then edit XDB_R.class.php and add the same namespace declaration at the top:

namespace Org\Util;
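Deleting the require_once line works because both classes now live in the Org\Util namespace. The single-file sketch below uses hypothetical stand-in classes (and a hypothetical open_dict() method) purely to show the name resolution: assuming the segmenter refers to the dictionary reader by its bare class name, new XDB_R inside namespace Org\Util resolves to Org\Util\XDB_R, and in the real project ThinkPHP's autoloader then finds Org/Util/XDB_R.class.php on its own.

```php
<?php
namespace Org\Util;

// Hypothetical stand-ins for the real classes, reduced to name resolution
// only; the real XDB_R reads the .xdb dictionary file.
class XDB_R
{
}

class Pscws
{
    // Inside namespace Org\Util, an unqualified "new XDB_R" means
    // Org\Util\XDB_R, so no explicit require is needed when an
    // autoloader can locate that class.
    public function open_dict()
    {
        return new XDB_R();
    }
}

$pscws = new Pscws();
echo get_class($pscws->open_dict()), PHP_EOL; // Org\Util\XDB_R
```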

4. Extract the XDB dictionary.
Create a dict folder under Public/admin, extract dict.utf8.xdb from the dictionary archive into it, and copy rules.utf8.ini from the etc directory of the PSCWS package into the same folder.
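Before wiring up the controller, it can help to confirm that the dictionary folder contains both files. A minimal sketch with a hypothetical scws_missing_files() helper; the Public/admin/dict path is taken from step 4 and assumes the script is run from the project root.

```php
<?php
// Hypothetical helper: list which of the two files the segmenter needs
// are missing from the dictionary directory.
function scws_missing_files($dir)
{
    $required = array('dict.utf8.xdb', 'rules.utf8.ini');
    $missing = array();
    foreach ($required as $file) {
        if (!is_file(rtrim($dir, '/') . '/' . $file)) {
            $missing[] = $file;
        }
    }
    return $missing;
}

// Example: check the folder created in step 4 (path assumed).
print_r(scws_missing_files(__DIR__ . '/Public/admin/dict'));
```

An empty array means both files are in place.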
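5. Add a constant to the entry file (index.php) that holds the path of the dictionary and rule files, and define it before the line that requires ThinkPHP.php. Avoid the name CONF_PATH: ThinkPHP 3.2 already uses CONF_PATH for the application configuration directory (Common/Conf/ by default), so redefining it would stop the application's own configuration files from loading. A separate name such as SCWS_DICT_PATH avoids the clash:

define("SCWS_DICT_PATH", dirname(__FILE__) . "/Public/admin/dict/");

6. Create a private method in the IndexController.class.php controller for other methods to call:

    /**
     * Chinese word segmentation: extract keywords from a sentence.
     * @param string   $title the sentence to segment
     * @param int|null $num   number of keywords to return; can usually be omitted
     * @return string comma-separated keywords
     **/
    private function get_tags($title, $num = null){
        $pscws = new \Org\Util\Pscws('utf8');
        // Load the dictionary and segmentation rules from step 4
        $pscws->set_dict(SCWS_DICT_PATH . 'dict.utf8.xdb');
        $pscws->set_rule(SCWS_DICT_PATH . 'rules.utf8.ini');
        // Leave punctuation out of the results
        $pscws->set_ignore(true);
        $pscws->send_text($title);
        // Fetch the top-ranked keywords
        $words = $pscws->get_tops($num);
        $pscws->close();
        $tags = array();
        foreach ($words as $val) {
            $tags[] = $val['word'];
        }
        return implode(',', $tags);
    }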
    /**
     * Product search results page
     **/
    public function search(){
        $rzt = $this->get_tags("新款 牛漆皮小尖头直跟高跟单鞋910033 灰羊猄(7.31发货) 39");
        print_r($rzt);
    }
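The loop at the end of get_tags() keeps only the word field of each row that get_tops() returns. The standalone sketch below reproduces that step with a hypothetical words_to_tags() helper and made-up sample rows (real rows carry further fields besides word); array_column() is swapped in as a shorthand for the foreach loop and requires PHP 5.5 or later.

```php
<?php
// Made-up sample rows shaped like get_tops() output; only 'word' is used.
$words = array(
    array('word' => '漆皮'),
    array('word' => '单鞋'),
    array('word' => '尖头'),
);

// Hypothetical helper equivalent to the foreach loop in get_tags().
function words_to_tags(array $words)
{
    // Pull out the 'word' column and join the values with commas
    return implode(',', array_column($words, 'word'));
}

echo words_to_tags($words), PHP_EOL; // 漆皮,单鞋,尖头
```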

Requesting the search action prints:

漆皮,单鞋,尖头,高跟,新款,发货,910033,7.31,39
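The result still contains the style number, the shipping date and the shoe size (910033, 7.31, 39). If only word tags are wanted, one option is to filter numeric tokens after calling get_tags(). A minimal sketch with a hypothetical drop_numeric_tags() helper, assuming PHP's is_numeric() is an acceptable test for "purely numeric":

```php
<?php
// Hypothetical post-filter: drop purely numeric tokens (style numbers,
// dates, sizes) from the comma-separated string returned by get_tags().
function drop_numeric_tags($tags)
{
    $kept = array();
    foreach (explode(',', $tags) as $tag) {
        // Keep non-empty tokens that are not numbers such as "7.31"
        if ($tag !== '' && !is_numeric($tag)) {
            $kept[] = $tag;
        }
    }
    return implode(',', $kept);
}

echo drop_numeric_tags('漆皮,单鞋,尖头,高跟,新款,发货,910033,7.31,39'), PHP_EOL;
// 漆皮,单鞋,尖头,高跟,新款,发货
```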