1. Foreword

1, Motivation for studying Coreseek

I run my own note-taking blog, where I often write up analyses of technical articles. In the past, the only way to search those articles was MySQL's LIKE fuzzy matching on the content. Once there are many articles, that approach is simply not efficient. So I turned to Coreseek, a Chinese full-text search engine, and successfully used it in my project.


I hope this write-up helps interested readers avoid some detours.

2, Concept introduction

Sphinx is an open source search engine that supports full-text search in English. English has a natural word separator, the space, while Chinese has more complex word segmentation requirements. Coreseek is a Chinese full-text search engine built on Sphinx and suitable for enterprise use. In other words, the core of Coreseek is still Sphinx; the biggest difference is that Coreseek adds the Chinese word segmentation tool mmseg.
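
To make the segmentation problem concrete, here is a minimal shell sketch, for illustration only. It counts tokens by splitting on whitespace, which is roughly what works for English. The helper name count_tokens is hypothetical and is not part of Sphinx or mmseg.

```shell
#!/bin/sh
# Count tokens the naive way: split on whitespace only.
# count_tokens is a hypothetical helper for illustration; it is not part of Sphinx or mmseg.
count_tokens() {
    set -f          # disable globbing so characters like * stay literal
    set -- $1       # unquoted on purpose: the shell splits on spaces, tabs and newlines
    set +f
    echo "$#"
}

count_tokens "full text search"   # prints 3
count_tokens "全文检索引擎"          # prints 1: no spaces, so no word boundaries are found
```

The English phrase splits into its words, while the Chinese phrase comes back as a single token. A dictionary-based segmenter such as mmseg is what finds the word boundaries in the second case.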

3, Environment introduction

System: Ubuntu

HTTP service: Apache/2.2.22

MySQL: Ver 14.14 Distrib 5.5.41

PHP: PHP 5.3.10

2. Download and install Coreseek

Installation steps

Download coreseek-3.2.14.tar.gz and place it in /usr/local/src

To avoid missing dependencies during installation, install the required packages first:

apt-get install make gcc g++ automake libtool mysql-client libmysqlclient15-dev libxml2-dev libexpat1-dev

Run the command above before anything else; otherwise all sorts of strange problems can occur because packages are missing or too old. In my case it pulled in 159M of package updates. (I only went back and added this step after hitting various pitfalls.)
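
A quick way to see what is still missing before building is to check which tools are on the PATH. The following is a minimal sketch: missing_tools is a hypothetical helper name, and the tool list is an assumption based on the commands used in this article.

```shell
#!/bin/sh
# Print each command from the argument list that cannot be found on PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
    return 0
}

# Build tools used by the steps in this article (this list is an assumption).
missing_tools make gcc g++ autoconf automake aclocal libtoolize
```

If the script prints nothing, all of the listed tools were found. Any name it prints needs to be installed before running ./bootstrap or ./configure.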

1, Install the mmseg word segmentation module

cd /usr/local/src

tar zxvf coreseek-3.2.14.tar.gz #extract

cd coreseek-3.2.14

cd mmseg-3.2.14

./bootstrap #warnings in the output can be ignored; any error must be fixed

./configure --prefix=/usr/local/mmseg3 #configure

make #compile

make install #install

1.1) Possible problems and solutions:
Running ./bootstrap fails with the error ./bootstrap: 27: ./bootstrap: autoconf: not found.

Reason: the autoconf and automake tools are not installed (Ubuntu 10.04). Install them with the following command:

sudo apt-get install autoconf automake libtool

1.2) Possible problem: when installing the word segmentation module mmseg, the error cannot find input file: src/Makefile.in appears at the end of compilation and installation.
I looked it up and found the following solution:

aclocal #a perl script; its description is: "aclocal - create aclocal.m4 by scanning configure.ac"

libtoolize --force #reports an error when run; it can be ignored
automake --add-missing
make clean

Then reconfigure and recompile:
./configure --prefix=/usr/local/mmseg3
make && make install
Compilation and installation now succeed.

Summary: I never tracked down the exact cause of this error; the steps above simply worked for me. (Makefile.in is normally generated from Makefile.am by automake, so the file had probably not been generated yet.) If anyone knows more, please leave a comment. Thank you.

2, Install Coreseek

cd /usr/local/src

cd coreseek-3.2.14

cd csft-3.2.14

sh buildconf.sh    #warnings in the output can be ignored; any error must be fixed

./configure --prefix=/usr/local/coreseek  --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql  #configure

make  #compile
make install  #install

3, Test mmseg word segmentation, Coreseek search, and the MySQL data source

cd /usr/local/src

cd coreseek-3.2.14

cd testpack

cat /usr/local/src/coreseek-3.2.14/testpack/var/test/test.xml #Chinese text should be displayed correctly at this point

/usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc /usr/local/src/coreseek-3.2.14/testpack/var/test/test.xml

/usr/local/coreseek/bin/indexer -c /usr/local/src/coreseek-3.2.14/testpack/etc/csft.conf --all

/usr/local/coreseek/bin/search -c /usr/local/src/coreseek-3.2.14/testpack/etc/csft.conf 网络搜索 #search the test index for the Chinese phrase 网络搜索 ("network search")

/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/sphinx-min.conf.dist

/usr/local/coreseek/bin/indexer -c /usr/local/src/coreseek-3.2.14/testpack/etc/csft.conf --all --rotate #Rebuild all indexes while searchd is running; --rotate signals searchd to switch to the new index files

If the output looks normal and no error is reported, your Coreseek installation is working.

3.1) Possible problems and solutions:

Running /usr/local/coreseek/bin/indexer -c etc/csft.conf --all reports an error that begins with: xmlpipe2 support NOT compiled in. To use xmlpipe2, install missing


The XML library (expat) that xmlpipe2 depends on is missing. Solution:

apt-get install expat-*

Then recompile Coreseek; remember to run make clean first.

4, Coreseek configuration and usage

cp  /usr/local/src/coreseek-3.2.14/testpack/etc/csft_mysql.conf  /usr/local/coreseek/etc/csft_mysql.conf  #copy the MySQL data source config file

ln -s  /usr/local/coreseek/etc/csft_mysql.conf  /etc/csft_mysql.conf  #add a symlink

vim  /etc/csft_mysql.conf  #edit and modify

3. Modify the Coreseek configuration file



source mysql
{
    type                    = mysql
    sql_host                = localhost
    sql_user                = xxxx
    sql_pass                = xxxx
    sql_db                  = xxxx
    sql_port                = 3306
    sql_query_pre           = SET NAMES utf8
    sql_query               = SELECT id,id,uid,title,data FROM notebook_notepad
    sql_attr_uint           = id           #the value read from SQL must be an integer
    #sql_attr_timestamp     = time         #the value read from SQL must be an integer; used as a time attribute
    sql_attr_uint           = uid

    sql_query_info_pre      = SET NAMES utf8    #sets the correct character set for command-line queries
    sql_query_info          = SELECT * FROM notebook_notepad WHERE id=$id    #for command-line queries, reads the original row from the database
}

index mysql
{
    source              = mysql             #name of the corresponding source
    path                = /usr/local/coreseek/var/data/mysql    #change to the absolute path actually used, e.g. /usr/local/coreseek/var/...
    docinfo             = extern
    mlock               = 0
    morphology          = none
    min_word_len        = 1
    html_strip          = 0

    charset_dictpath    = /usr/local/mmseg3/etc/    #setting for BSD and Linux; must end with /
    #charset_dictpath   = etc/                      #setting for Windows; must end with /; an absolute path is best, e.g. C:/usr/local/coreseek/etc/...
    charset_type        = zh_cn.utf-8
}

indexer
{
    mem_limit           = 128M
}

searchd
{
    listen              = 9312
    read_timeout        = 5
    max_children        = 30
    max_matches         = 1000
    seamless_rotate     = 0
    preopen_indexes     = 0
    unlink_old          = 1
    pid_file            = /usr/local/coreseek/var/log/searchd_mysql.pid    #change to the absolute path actually used, e.g. /usr/local/coreseek/var/...
    log                 = /usr/local/coreseek/var/log/searchd_mysql.log    #change to the absolute path actually used, e.g. /usr/local/coreseek/var/...
    query_log           = /usr/local/coreseek/var/log/query_mysql.log      #change to the absolute path actually used, e.g. /usr/local/coreseek/var/...
}


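OK. Once the configuration is done, restart the Coreseek service and it will build the search index you want. From then on you are free of MySQL's limitations: Chinese or English, both work, and word segmentation is built in. How about that? Doesn't it open the door to a new world?

The following covers errors that may come up when rebuilding the index, and how to solve them. Read on if you are interested; otherwise skip to the next section: Testing Coreseek with PHP.

Error reported when rebuilding the index: WARNING: failed to open pid_file '/usr/local/coreseek/var/log/searchd_mysql.pid'.

/usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf --stop #stop the service

/usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf #start the service

/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all #build the index

If you see: FATAL: failed to lock /usr/local/coreseek/var/data/xxxx.spl: Resource temporarily unavailable, will not index. Try --rotate option.

/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate #rebuild the index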
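
The choice between --all and --all --rotate depends on whether searchd is currently holding the index files. A minimal sketch of that decision follows. indexer_cmd is a hypothetical helper, the paths are the ones used in this article, and the liveness check assumes the script runs as the same user as searchd (or as root), since kill -0 fails for other users' processes.

```shell
#!/bin/sh
# Print the indexer command to run: plain --all when searchd is stopped,
# --all --rotate when the pid file points at a live process.
# indexer_cmd is a hypothetical helper; it only prints the command, it does not run it.
indexer_cmd() {
    conf=$1
    pid_file=$2
    cmd="/usr/local/coreseek/bin/indexer -c $conf --all"
    # -s: pid file exists and is not empty; kill -0: the process is alive and signalable
    if [ -s "$pid_file" ] && kill -0 "$(cat "$pid_file")" 2>/dev/null; then
        cmd="$cmd --rotate"
    fi
    printf '%s\n' "$cmd"
}

indexer_cmd /usr/local/coreseek/etc/csft_mysql.conf \
            /usr/local/coreseek/var/log/searchd_mysql.pid
```

Printing the command instead of running it keeps the sketch safe to try; review the output, then run it by hand.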


4. Testing Coreseek with PHP


cp  /usr/local/src/coreseek-3.2.14/testpack/api/sphinxapi.php ./

vim test.php

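    <?php
    header("Content-type: text/html; charset=utf-8");
    require("./sphinxapi.php");                                        //the API file copied above; not needed once the PHP sphinx extension is installed
    $s = new SphinxClient;
    $s->setServer("", 9312);                                        //host is left empty here; fill in the address where searchd listens
    //SPH_MATCH_ALL: match all query words (default mode); SPH_MATCH_ANY: match any one of the query words; SPH_MATCH_EXTENDED2: supports queries with special operators
    $s->setMaxQueryTime(30);                                        //set the maximum search time (in milliseconds)
    $s->SetArrayResult(false);                                        //false: the keys of matches are the document IDs
    $s->SetSelect ( "*" );                                            //set which fields are returned, like a SQL select list
    $s->SetLimits ( 0, 30, 1000, 0 );                                //set the result offset and limits: offset 0, 30 rows, max_matches 1000, cutoff 0
    $res = $s->query('coreseek','mysql','--single-0-query--');         #'coreseek' is the keyword, 'mysql' is the data source (index) to search
    $err = $s->GetLastError();
    var_dump($res);                                                    //the original output lines were truncated; dumping both values is an assumption
    var_dump($err);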

php5 test.php



5. Using Coreseek in ThinkPHP




[Step 1] Install the dependency libsphinxclient

# cd /var/install/coreseek-4.1-beta/csft-4.1/api/libsphinxclient/
# ./configure  --prefix=/usr/local/sphinxclient

configure: creating ./config.status
config.status: creating Makefile
config.status: error: cannot find input file: Makefile.in   #error: configure failed

Configuration failed with the error config.status: error: cannot find input file: src/Makefile.in. After running the following commands and building again, it went through:
# aclocal
# libtoolize --force
# automake --add-missing
# autoconf
# autoheader
# make clean

# ./configure  --prefix=/usr/local/sphinxclient

# make && make install

[Step 2] Install the sphinx PHP extension

# wget http://pecl.php.net/get/sphinx-1.3.0.tgz
# tar zxvf sphinx-1.3.0.tgz
# cd sphinx-1.3.0
# phpize
# ./configure --with-php-config=/usr/bin/php-config --with-sphinx=/usr/local/sphinxclient
# make && make install
# cd /etc/php.d/
# cp gd.ini  sphinx.ini
# vi sphinx.ini    (change the contents to: extension=sphinx.so)


# service php-fpm restart

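Once the PHP Sphinx extension is installed, you can use $coreseek = new SphinxClient(); directly, without including the source file.


2, Then $sql = "select * from post where id in($ids)"; $res = mysql_query($sql); fetches the real data from the database.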


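     $where = array();
        if(!empty($search)){    //there is something to search for, so look up the matching ids in coreseek
            $coreseek = new \SphinxClient();
            $coreseek->setServer("", 9312);                        //host is left empty here; fill in the address where searchd listens
            //SPH_MATCH_ALL: match all query words (default mode); SPH_MATCH_ANY: match any one of the query words; SPH_MATCH_EXTENDED2: supports queries with special operators
            $coreseek->setMaxQueryTime(30);                        //set the maximum search time (in milliseconds)
            $coreseek->SetArrayResult(false);                    //false: the keys of matches are the document IDs
            $coreseek->SetSelect ( "*" );                        //set which fields are returned, like a SQL select list
            $coreseek->SetLimits ( 0, 30, 1000, 0 );            //set the result offset and limits: offset 0, 30 rows, max_matches 1000, cutoff 0
            $res = $coreseek->query($search,'mysql','--single-0-query--');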
            $key = array_keys($res['matches']);


            $opt = array("before_match"=>"","after_match"=>"");
            $coreseek1 = new \SphinxClient();
            $coreseek1->setServer("", 9312);
            foreach($info as $key=>&$row){



