pg_tokenizer

A PostgreSQL extension that provides tokenizers for full-text search.

Quick Start

The official tensorchord/vchord-suite Docker image comes pre-configured with the following complementary extensions (see the VectorChord-images repository for details):

pg_tokenizer - This extension
VectorChord-bm25 - Native BM25 Ranking Index
VectorChord - Scalable, high-performance, and disk-efficient vector similarity search

Simply run the Docker container as shown below:

docker run \
  --name vchord-suite \
  -e POSTGRES_PASSWORD=postgres \
  -p 5432:5432 \
  -d tensorchord/vchord-suite:pg17-latest
# To pull from GHCR instead, change the image to `ghcr.io/tensorchord/vchord-suite:pg17-latest`.
# To pin a specific version, use a dated tag such as `pg17-20250414`; supported versions are listed in the support matrix.

Once the container is running, connect to the database with the psql command-line tool. The default username is postgres, and the default password is postgres:

psql -h localhost -p 5432 -U postgres
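
If psql is run immediately after docker run, the server may still be starting and the connection can be refused. The sketch below is one way to wait for it: a small, hypothetical helper named wait_for (not part of pg_tokenizer or the image) that retries a command once per second until it succeeds. It assumes the pg_isready utility, which ships with PostgreSQL, is available inside the container.

```shell
#!/bin/sh
# wait_for ATTEMPTS CMD [ARGS...]
# Hypothetical helper: retry CMD once per second until it exits 0,
# giving up after ATTEMPTS tries. Returns 0 on success, 1 on timeout.
wait_for() {
  attempts="$1"
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Discard the command's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, `wait_for 30 docker exec vchord-suite pg_isready -U postgres && psql -h localhost -p 5432 -U postgres` would open the psql session only once the server reports that it is accepting connections, using the container name from the docker run command above.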

After connecting, run the following SQL to enable the extension:

CREATE EXTENSION pg_tokenizer;

Example

SELECT create_tokenizer('tokenizer1', $$
model = "llmlingua2"
$$);

SELECT tokenize('PostgreSQL is a powerful, open-source object-relational database system. It has over 15 years of active development.', 'tokenizer1');

More examples can be found in docs/03-examples.md.

