elasticsearch-jieba-plugin

Update 2022-12-01

New branches:

7.17.x branch: supports ES 7.17.0; JDK 11.0.7; Gradle 7.6
8.4.1 branch: supports ES 8.4.1; JDK 18.0.2.1; Gradle 7.6

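When adapting the plugin to a different ES version, also check which JDK version that ES version requires (see the ES/JDK compatibility matrix).
To adapt to a different ES version, modify the following files; the places that need changing are marked in each file: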

build.gradle
src/main/resources/plugin-descriptor.properties
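If only the ES version changes, the descriptor edit can be scripted. The sketch below is a minimal helper, assuming the descriptor uses the standard elasticsearch.version=... key; the function name set_es_version is hypothetical, and build.gradle still has to be edited by hand at the places marked in it.

```shell
# Hypothetical helper: point an Elasticsearch plugin descriptor at another ES version.
# ASSUMPTION: the file has a literal "elasticsearch.version=..." line (the standard
# descriptor key); follow the comments in the repo's own file if they say otherwise.
set_es_version() {
  local descriptor="$1"   # e.g. src/main/resources/plugin-descriptor.properties
  local es_version="$2"   # e.g. 7.17.0
  if ! grep -q '^elasticsearch\.version=' "$descriptor"; then
    echo "no elasticsearch.version line in $descriptor" >&2
    return 1
  fi
  # Edit via a temp file so the same command works with GNU and BSD sed,
  # then copy the content back to keep the original file's permissions.
  local tmp
  tmp="$(mktemp)"
  sed "s/^elasticsearch\.version=.*/elasticsearch.version=${es_version}/" "$descriptor" > "$tmp"
  cat "$tmp" > "$descriptor"
  rm -f "$tmp"
}
```

Example: set_es_version src/main/resources/plugin-descriptor.properties 7.17.0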

To switch to a different Gradle version, regenerate the wrapper (7.6 is the target version here):

gradle wrapper --gradle-version 7.6

Jieba analysis plugin for Elasticsearch 7.7.0, 7.4.2, 7.3.0, 7.0.0, 6.4.0, 6.0.0, 5.4.0, 5.3.0, 5.2.2, 5.2.1, 5.2.0, 5.1.2, 5.1.1

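Features

Simple changes are enough to adapt the plugin to different ES versions.

Supports adding dictionaries dynamically, without restarting ES.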

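New tokenizer support

thulac tokenizer plugin for ES (see also the thulac official website)

If you are on ES 6.4.0, use the latest code from the 6.4.0 branch or the master branch, or download the 6.4.1 release. Upgrading is strongly recommended!

The 6.4.1 release fixes the PositionIncrement issue. For details, see the article "An analysis of PositionIncrement in ES tokenization".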

Version mapping

Branch	Tag	Elasticsearch version	Release link
7.7.0	tag v7.7.1	v7.7.0	Download: v7.7.0
7.4.2	tag v7.4.2	v7.4.2	Download: v7.4.2
7.3.0	tag v7.3.0	v7.3.0	Download: v7.3.0
7.0.0	tag v7.0.0	v7.0.0	Download: v7.0.0
6.4.0	tag v6.4.1	v6.4.0	Download: v6.4.1
6.4.0	tag v6.4.0	v6.4.0	Download: v6.4.0
6.0.0	tag v6.0.0	v6.0.0	Download: v6.0.1
5.4.0	tag v5.4.0	v5.4.0	Download: v5.4.0
5.3.0	tag v5.3.0	v5.3.0	Download: v5.3.0
5.2.2	tag v5.2.2	v5.2.2	Download: v5.2.2
5.2.1	tag v5.2.1	v5.2.1	Download: v5.2.1
5.2	tag v5.2.0	v5.2.0	Download: v5.2.0
5.1.2	tag v5.1.2	v5.1.2	Download: v5.1.2
5.1.1	tag v5.1.1	v5.1.1	Download: v5.1.1

More details

Check out the source code that matches your ES version (see the table above), then run:

git clone https://github.com/sing1ee/elasticsearch-jieba-plugin.git --recursive
./gradlew clean pz
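Picking the branch can also be scripted. This is a hypothetical helper (jieba_branch_for is not part of the repository) that encodes the version table above plus the 7.17.x and 8.4.1 branches from the 2022-12-01 update; versions not listed there are rejected rather than guessed.

```shell
# Hypothetical helper: map an Elasticsearch version to the branch of this repo
# listed in the version table / update notes. Unlisted versions are an error.
jieba_branch_for() {
  case "$1" in
    7.17.0) echo "7.17.x" ;;
    8.4.1)  echo "8.4.1" ;;
    5.2.0)  echo "5.2" ;;   # the only branch whose name drops the patch number
    7.7.0|7.4.2|7.3.0|7.0.0|6.4.0|6.0.0|5.4.0|5.3.0|5.2.2|5.2.1|5.1.2|5.1.1)
            echo "$1" ;;    # branch name equals the ES version
    *)
      echo "no branch listed for Elasticsearch $1" >&2
      return 1
      ;;
  esac
}
```

For example, inside the cloned repository run git checkout "$(jieba_branch_for 7.17.0)" before ./gradlew clean pz. Because the clone uses --recursive, running git submodule update --init --recursive after switching branches is probably also needed to keep submodules in sync.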

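Copy the zip file to the plugins directory, where ${path.home} stands for your Elasticsearch home directory (replace 5.1.2 with the version you built):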

cp build/distributions/elasticsearch-jieba-plugin-5.1.2.zip ${path.home}/plugins

Change to ${path.home}/plugins, unzip the file there and remove the zip:

unzip elasticsearch-jieba-plugin-5.1.2.zip
rm elasticsearch-jieba-plugin-5.1.2.zip

Start Elasticsearch:

./bin/elasticsearch

Custom User Dict

Put your dictionary file, with the suffix .dict, into ${path.home}/plugins/jieba/dic. Each line holds a word and its frequency, separated by a space, like this:

小清新 3
百搭 3
显瘦 3
隨身碟 100
your_word word_freq
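A quick check before dropping a file into the dic directory can catch malformed lines. This is a minimal sketch, assuming blank lines are allowed and every other line must be exactly a word plus an integer frequency as shown above; the plugin's own parser may be more lenient, and check_jieba_dict and ES_HOME are hypothetical names.

```shell
# Hypothetical checker for the "word freq" layout shown above.
# ASSUMPTION: blank lines are ignored; any other line must have exactly two
# whitespace-separated fields, the second a non-negative integer.
check_jieba_dict() {
  awk '
    NF == 0 { next }                       # skip blank lines
    NF != 2 || $2 !~ /^[0-9]+$/ {
      printf "%s:%d: expected \"word freq\", got: %s\n", FILENAME, FNR, $0
      bad = 1
    }
    END { exit bad }                       # non-zero exit if any line was bad
  ' "$@"
}
```

Usage: check_jieba_dict "$ES_HOME"/plugins/jieba/dic/*.dict, with ES_HOME set to your ${path.home}. It prints one line per problem and exits non-zero if any are found.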

Using stopwords

Find stopwords.txt in ${path.home}/plugins/jieba/dic.
Create a folder named stopwords under ${path.home}/config:

mkdir -p ${path.home}/config/stopwords

Copy stopwords.txt into the folder you just created:

cp ${path.home}/plugins/jieba/dic/stopwords.txt ${path.home}/config/stopwords
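The index settings in the next step also reference synonyms/synonyms.txt through the jieba_synonym filter, but the steps above never create that file. Elasticsearch resolves synonyms_path relative to its config directory, and index creation is likely to fail if the file is missing. A minimal sketch that creates it with one example rule in Solr synonym format (the helper name and the 番茄/西红柿 pair are illustrative only):

```shell
# Hypothetical helper: create <es_home>/config/synonyms/synonyms.txt if it is missing.
# The single rule is only an example; replace it with your own synonyms.
create_synonyms_file() {
  local es_home="$1"
  local f="$es_home/config/synonyms/synonyms.txt"
  mkdir -p "${f%/*}"
  # Solr format: comma-separated equivalent terms, one rule per line.
  # An existing file is left untouched.
  [ -e "$f" ] || printf '%s\n' '番茄, 西红柿' > "$f"
}
```

Run create_synonyms_file "$ES_HOME", with ES_HOME set to your ${path.home}; an existing synonyms.txt is left untouched.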

Create an index:

PUT http://localhost:9200/jieba_index

{
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {
          "type":        "stop",
          "stopwords_path": "stopwords/stopwords.txt"
        },
        "jieba_synonym": {
          "type":        "synonym",
          "synonyms_path": "synonyms/synonyms.txt"
        }
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": [
            "lowercase",
            "jieba_stop",
            "jieba_synonym"
          ]
        }
      }
    }
  }
}

Test the analyzer:

POST http://localhost:9200/jieba_index/_analyze
{
  "analyzer" : "my_ana",
  "text" : "黄河之水天上来"
}
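The two requests above can also be sent with curl. Since Elasticsearch 6.0, requests with a JSON body need a Content-Type: application/json header, and _analyze accepts GET or POST. In this sketch, ES_URL, the file name jieba_index.json (holding the index settings JSON above) and the two function names are placeholders; on ES 8.x, where security is enabled by default, you would also need https and credentials (for example curl -u).

```shell
# Placeholder base URL; override by exporting ES_URL before sourcing this.
ES_URL="${ES_URL:-http://localhost:9200}"

# Create jieba_index from the settings JSON saved in ./jieba_index.json.
create_jieba_index() {
  curl -sS -X PUT "$ES_URL/jieba_index" \
    -H 'Content-Type: application/json' \
    --data-binary @jieba_index.json
}

# Run the my_ana analyzer on one piece of text.
# The text is inserted without JSON escaping: avoid double quotes and backslashes.
analyze_with_my_ana() {
  local text="$1"
  curl -sS -X POST "$ES_URL/jieba_index/_analyze" \
    -H 'Content-Type: application/json' \
    --data-binary "{\"analyzer\": \"my_ana\", \"text\": \"$text\"}"
}
```

Usage: create_jieba_index, then analyze_with_my_ana 黄河之水天上来.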

The response is as follows:

{
    "tokens": [
        {
            "token": "黄河",
            "start_offset": 0,
            "end_offset": 2,
            "type": "word",
            "position": 0
        },
        {
            "token": "黄河之水天上来",
            "start_offset": 0,
            "end_offset": 7,
            "type": "word",
            "position": 0
        },
        {
            "token": "之水",
            "start_offset": 2,
            "end_offset": 4,
            "type": "word",
            "position": 1
        },
        {
            "token": "天上",
            "start_offset": 4,
            "end_offset": 6,
            "type": "word",
            "position": 2
        },
        {
            "token": "上来",
            "start_offset": 5,
            "end_offset": 7,
            "type": "word",
            "position": 2
        }
    ]
}

NOTE

Migrated from jieba-solr.

Roadmap

I plan to add support for more analyzers:

Stanford Chinese analyzer
FudanNLP analyzer
...

If you have ideas, please open an issue, and we can work on them together.
