Using Apache Lucene for full-text search processing in Java API development-Java Tutorial-php.cn

Using Apache Lucene for full-text search processing in Java API development

WBOY

Release： 2023-06-18 18:11:05

original

1363 people have browsed it

As the amount of Internet data continues to increase, how to search data quickly and accurately has become an important issue. In response to this problem, full-text search engines emerged. Apache Lucene is one of the open source full-text search engine libraries, suitable for applications integrated with the Java programming language. This article will introduce how to use Apache Lucene for full-text search processing in Java API development.

1. Introduction to Apache Lucene

Apache Lucene is a full-text search engine library. It is a high-performance, full-featured, easy-to-use search engine library based on Java. It can index large amounts of text data and provide efficient, accurate and fast retrieval results. Lucene uses disk-based indexing technology to split text data into multiple words and then store them in an inverted index table. The inverted index table uses the relationship between words and documents to point words to the document where the word is located. During the query process, the inverted index table searches documents by word and returns them as query results.

2. The core components of Lucene

Lucene is composed of multiple core components. These components work together to implement a high-performance full-text search engine, including:

Analyzer

Anaylzer is used to split text data into multiple In addition to dividing text into words, the word analyzer can also be used to filter stop words, perform case conversion, etc.

IndexWriter (index writer)

IndexWriter is used to convert text data into an index table, build an inverted index table, and persist it to disk . When data needs to be searched, the data can be quickly looked up from the index table.

IndexReader (Index Reader)

IndexReader is used to read the index table from disk and load it into memory. Data is loaded from memory, so queries of the data are very fast.

Query (Query)

Query is used to convert the string entered by the user into search conditions and quickly find data in the Lucene index table.

3. Use Lucene to implement full-text search

Introducing Lucene dependencies

Maven is a commonly used dependency management tool in Java development. We just need to add the following Lucene dependencies in Maven:


  org.apache.lucene
  lucene-core
  8.8.2

Copy after login

Create index

Use IndexWriter to convert the data into an index table. Here we assume that the data being searched comes from a database or other source. We need to convert it to text form and add it to the IndexWriter. The following is an article example:

import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import java.nio.file.Paths;

public class Indexer {

    private IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new SmartChineseAnalyzer());
    private IndexWriter indexWriter;

    public Indexer(String indexPath) {
        try {
            Directory directory = FSDirectory.open(Paths.get(indexPath));
            indexWriter = new IndexWriter(directory, indexWriterConfig);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void add(String field, String value) {
        try {
            Document doc = new Document();
            FieldType fieldType = new FieldType();
            fieldType.setIndexOptions(FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
            fieldType.setStored(true);
            fieldType.setTokenized(true);
            doc.add(new Field(field, value, fieldType));
            indexWriter.addDocument(doc);
            indexWriter.commit();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void delete(String field, String value) {
        try {
            indexWriter.deleteDocuments(new Term(field, value));
            indexWriter.commit();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void close() {
        try {
            indexWriter.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

Copy after login

In this class:

In the Indexer constructor, we initialize the IndexWriter and Directory. Directory represents the location of the index library.
add() method is used to add text data to the index library.
delete() method is used to delete text data from the index library.
close() method is used to finally close the IndexWriter.

Use Query and IndexReader for search operations. The following is a code example:

import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Searcher {

    private String[] fields = new String[] {"title", "content"};
    private Query query;
    private IndexReader indexReader;
    private IndexSearcher indexSearcher;

    public Searcher(String indexPath) {
        try {
            Directory directory = FSDirectory.open(Paths.get(indexPath));
            indexReader = DirectoryReader.open(directory);
            indexSearcher = new IndexSearcher(indexReader);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private Query getQuery(String keyword) {
        try {
            if (query == null) {
                query = new MultiFieldQueryParser(fields, new SmartChineseAnalyzer()).parse(keyword);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return query;
    }

    public List search(String keyword) {
        List result = new ArrayList();
        try {
            TopDocs topDocs = indexSearcher.search(getQuery(keyword), 10);
            ScoreDoc[] scoreDocs = topDocs.scoreDocs;
            for (ScoreDoc scoreDoc : scoreDocs) {
                result.add(indexSearcher.doc(scoreDoc.doc).get("title"));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return result;
    }

    public void close() {
        try {
            indexReader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

Copy after login

In this class:

In the Searcher constructor, we initialize IndexReader and IndexSearcher.
The getQuery() method is used to convert the search conditions entered by the user into Query type.
The search() method is used for searching and returns the results after performing the search operation.
close() method is used to finally close the IndexReader.

4. Summary

This article introduces how to implement the full-text search function through Apache Lucene, mainly involving the core components of Lucene, the usage of Lucene and the methods of some common classes in Lucene . In addition to the classes and methods covered in this article, there are many other functions in Lucene that can be appropriately adjusted and used according to different needs. Apache Lucene is a very reliable full-text search engine library in the Java language, suitable for many fields. Through learning and practice, I believe that everyone can better use Apache Lucene in practical applications to achieve efficient, accurate, and fast search functions.

The above is the detailed content of Using Apache Lucene for full-text search processing in Java API development. For more information, please follow other related articles on the PHP Chinese website!