A PostgreSQL extension that provides tokenizers for full-text search.
The official tensorchord/vchord-suite Docker image comes pre-configured with the following complementary extensions (see the VectorChord-images repository for details):
- pg_tokenizer - This extension
- VectorChord-bm25 - Native BM25 Ranking Index
- VectorChord - Scalable, high-performance, and disk-efficient vector similarity search
Simply run the Docker container as shown below:
docker run \
--name vchord-suite \
-e POSTGRES_PASSWORD=postgres \
-p 5432:5432 \
-d tensorchord/vchord-suite:pg17-latest
# To pull from GHCR instead, change the image to `ghcr.io/tensorchord/vchord-suite:pg17-latest`.
# To pin a specific version, use a dated tag such as `pg17-20250414`; supported versions are listed in the support matrix.
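The registry and tag choices mentioned in the comments above can be wrapped in a small helper. This is only a sketch: the function name `vchord_image` and its `dockerhub`/`ghcr` arguments are hypothetical and are not part of the image or of pg_tokenizer. It assumes the two image references and the tag format quoted above.

```shell
# Hypothetical helper (not shipped with the image): prints the image
# reference for a given registry and tag.
# Usage: vchord_image [dockerhub|ghcr] [tag]
vchord_image() {
  registry="${1:-dockerhub}"   # default: Docker Hub
  tag="${2:-pg17-latest}"      # default: the tag used in the command above
  case "$registry" in
    ghcr)      printf '%s\n' "ghcr.io/tensorchord/vchord-suite:${tag}" ;;
    dockerhub) printf '%s\n' "tensorchord/vchord-suite:${tag}" ;;
    *)         printf 'unknown registry: %s\n' "$registry" >&2; return 1 ;;
  esac
}

# Example (not executed here): start a pinned build pulled from GHCR.
#   docker run --name vchord-suite -e POSTGRES_PASSWORD=postgres \
#     -p 5432:5432 -d "$(vchord_image ghcr pg17-20250414)"
```

Keeping the image reference in one place makes it easier to switch between the rolling tag and a pinned one without editing the rest of the command.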
Once everything’s set up, you can connect to the database using the psql command-line tool. The default username is postgres, and the default password is postgres. Here’s how to connect:
psql -h localhost -p 5432 -U postgres
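If the connection is refused right after the container starts, the server may still be initializing. The sketch below polls `pg_isready` (a utility that ships with the PostgreSQL client tools) until the server accepts connections. The function name `wait_for_postgres` and its two arguments are hypothetical, and the host, port, and user are assumed to match the command above.

```shell
# Hypothetical helper: polls pg_isready until the server accepts connections.
# Usage: wait_for_postgres [max_attempts] [delay_seconds]
# Returns 0 once the server is ready, 1 if it never became ready.
wait_for_postgres() {
  max_attempts="${1:-30}"
  delay="${2:-1}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # pg_isready exits with status 0 only when the server accepts connections.
    if pg_isready -h localhost -p 5432 -U postgres >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}

# Example (not executed here): connect only once the server is ready.
#   wait_for_postgres && psql -h localhost -p 5432 -U postgres
```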
After connecting, run the following SQL to enable the extension and confirm that it works:
CREATE EXTENSION pg_tokenizer;
SELECT create_tokenizer('tokenizer1', $$
model = "llmlingua2"
$$);
SELECT tokenize('PostgreSQL is a powerful, open-source object-relational database system. It has over 15 years of active development.', 'tokenizer1');
More examples can be found in docs/03-examples.md.