This code project demonstrates how the G2 engine may be used with an Elasticsearch indexing engine. Elasticsearch provides enhanced searching capabilities on entity data.

The G2 data repository contains data records and observations about known entities. It determines which records match and merge to become single resolved entities. These resolved entities can then be indexed through the Elasticsearch engine, making the entity data more searchable.

Elasticsearch stores its indexed entity data in a data repository separate from the G2 engine's. Thus, Elasticsearch and G2 must both be managed in order to keep them in sync.
At Senzing, we strive to create GitHub documentation in a "don't make me think" style. For the most part, instructions are copy and paste. Whenever thinking is needed, it's marked with a "thinking" icon 🤔. Whenever customization is needed, it's marked with a "pencil" icon ✏️. If the instructions are not clear, please let us know by opening a new Documentation issue describing where we can improve. Now on with the show...
- 🤔 - A "thinker" icon means that a little extra thinking may be required. Perhaps there are some choices to be made. Perhaps it's an optional step.
- ✏️ - A "pencil" icon means that the instructions may need modification before performing.
- ⚠️ - A "warning" icon means that something tricky is happening, so pay attention.
- Space: This repository and demonstration require X GB free disk space.
- Time: Budget 30 minutes to get the demonstration up-and-running, depending on CPU and network speeds.
- Background knowledge: This repository assumes a working knowledge of:
- 🤔 Data needs to be loaded into a Senzing project before it can be posted to Elasticsearch. If you don't have any data to load, or don't know how, visit our quickstart.
- Start an instance of Elasticsearch and your favorite Elasticsearch UI. Kibana is recommended and will be assumed for the remainder of this demonstration. For guidance on how to get an instance of Elasticsearch and Kibana running, visit our doc on How to Bring Up an ELK Stack.
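As one way to satisfy this prerequisite, a minimal single-node Elasticsearch and Kibana setup can be sketched with Docker Compose. The container names, image versions, and the `senzing-network` network name below are assumptions; security is disabled, which is only appropriate for a local demonstration:

```yaml
# Hypothetical sketch: single-node Elasticsearch plus Kibana on a shared
# Docker network, with security disabled for a local demonstration only.
services:
  senzing-elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.1
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
  senzing-kibana:
    image: docker.elastic.co/kibana/kibana:8.11.1
    environment:
      - ELASTICSEARCH_HOSTS=http://senzing-elasticsearch:9200
    ports:
      - "5601:5601"
networks:
  default:
    name: senzing-network
```

Putting both containers on one named network lets later steps reach Elasticsearch by its container hostname.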
✏️ Set local environment variables. These variables may be modified, but do not need to be modified. The variables are used throughout the installation procedure.

```shell
export GIT_ACCOUNT=senzing
export GIT_REPOSITORY=elasticsearch
export GIT_ACCOUNT_DIR=~/${GIT_ACCOUNT}.git
export GIT_REPOSITORY_DIR="${GIT_ACCOUNT_DIR}/${GIT_REPOSITORY}"
```
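For illustration, with the default values the derived variables compose like this; the `echo` is just a check, not part of the procedure:

```shell
# Illustration only: with the defaults, GIT_REPOSITORY_DIR expands to a
# path under your home directory, e.g. ~/senzing.git/elasticsearch.
export GIT_ACCOUNT=senzing
export GIT_REPOSITORY=elasticsearch
export GIT_ACCOUNT_DIR=~/${GIT_ACCOUNT}.git
export GIT_REPOSITORY_DIR="${GIT_ACCOUNT_DIR}/${GIT_REPOSITORY}"
echo "${GIT_REPOSITORY_DIR}"
```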
Clone the repository.

```shell
mkdir -p ${GIT_ACCOUNT_DIR}
cd ${GIT_ACCOUNT_DIR}
git clone https://github.com/Senzing/elasticsearch.git
cd ${GIT_REPOSITORY_DIR}
```
🤔 Make sure the `SENZING_ENGINE_CONFIGURATION_JSON` environment variable is set for the Senzing installation that the data was loaded into earlier.
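As a sketch, a configuration for a SQLite-backed project might look like the following. The paths assume a default Senzing installation, and the database file name `G2C.db` mounted at `/db` is an assumption; adjust them to your setup:

```shell
# Hypothetical example: engine configuration for a SQLite-backed project.
# CONFIGPATH/RESOURCEPATH/SUPPORTPATH assume a default Senzing installation;
# the CONNECTION string assumes the database is mounted at /db as G2C.db.
export SENZING_ENGINE_CONFIGURATION_JSON='{
  "PIPELINE": {
    "CONFIGPATH": "/etc/opt/senzing",
    "RESOURCEPATH": "/opt/senzing/g2/resources",
    "SUPPORTPATH": "/opt/senzing/data"
  },
  "SQL": {
    "CONNECTION": "sqlite3://na:na@/db/G2C.db"
  }
}'
```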
🤔 Set Elasticsearch local environment variables. The hostname and port must point to the exposed port of the Elasticsearch instance. The index name can be anything that conforms to Elasticsearch's index naming rules.

```shell
export ELASTIC_HOSTNAME=senzing-elasticsearch
export ELASTIC_PORT=9200
export ELASTIC_INDEX_NAME=g2index
```
Build the Docker container.

```shell
cd ${GIT_REPOSITORY_DIR}
sudo docker build -t senzing/elasticsearch .
```
We will mount the SQLite database; make sure the `CONNECTION` string in our configuration JSON points to where it is mounted. In this example, the `CONNECTION` will need to point to the `/db` directory. We also need to run the container as part of the network that the ELK stack is running in. Example:

```shell
sudo --preserve-env docker run \
    --interactive \
    --rm \
    --tty \
    -e ELASTIC_HOSTNAME \
    -e ELASTIC_PORT \
    -e ELASTIC_INDEX_NAME \
    -e SENZING_ENGINE_CONFIGURATION_JSON \
    --network=senzing-network \
    --volume ~/senzing/var/sqlite:/db \
    senzing/elasticsearch
```
Here we won't need to mount a database; instead we can set our `CONNECTION` string in the configuration JSON to point to the external database. Example:

```shell
export SENZING_ENGINE_CONFIGURATION_JSON='{
  "PIPELINE": {
    "CONFIGPATH": "/etc/opt/senzing",
    "RESOURCEPATH": "/opt/senzing/g2/resources",
    "SUPPORTPATH": "/opt/senzing/data"
  },
  "SQL": {
    "CONNECTION": "postgresql://postgres:postgres@senzing-postgres:5432:G2"
  }
}'
```
Now we can run the container as part of the network that the ELK-stack is running in so that it can "see" the elasticsearch container. Example:

```shell
sudo --preserve-env docker run \
    --interactive \
    --rm \
    --tty \
    -e ELASTIC_HOSTNAME \
    -e ELASTIC_PORT \
    -e ELASTIC_INDEX_NAME \
    -e SENZING_ENGINE_CONFIGURATION_JSON \
    --network=senzing-network \
    senzing/elasticsearch
```
Open up Kibana in a web browser (default: localhost:5601).
Navigate to the Discover tab.
Create the index.

- If all was done correctly, a new screen with a button to "Create data view" should appear.
- Click this and, in the `index pattern` box, type the name of the index that was created. This was the `ELASTIC_INDEX_NAME` variable set earlier, and it should also appear on the right side of the popup.
- The `Name` field can be set but is not required.
Press "Save data view to Kibana" at the bottom of the screen. You can now view the created index and run searches. If fuzzy searches are needed, click on "Saved Query" and switch the language to Lucene, where you can view the Lucene syntax and how to do fuzzy searches.
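For example, a Lucene fuzzy search for name values within one edit of "smith" could be entered in the search bar like this; the `NAME_FULL` field name is an assumption about how the indexed entity documents are shaped:

```
NAME_FULL:smith~1
```

The `~` is Lucene's fuzzy operator, and the optional number after it sets the maximum edit distance (2 is the default when omitted).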