# Vespa sample application - Transformers

This sample application is a small example of using Transformer-based cross-encoders for ranking, with a small sample from the MS MARCO data set.

See also the more comprehensive MS Marco Ranking sample app, which uses multiple Transformer-based models for retrieval and ranking.

This application uses phased ranking: first, a set of candidate documents is retrieved using the WAND operator and ranked with BM25. The top-k documents from this first phase are then re-ranked with a cross-encoder Transformer model. The cross-encoder re-ranking uses the global phase, which is evaluated in the Vespa stateless container.
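
To make the two phases concrete, here is a client-side Python sketch of the same idea. This is illustrative only: in this app the re-ranking runs inside Vespa's global phase, not in client code, and the model name and value of k below are assumptions.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed model; the app's actual model is set up by bin/setup-ranking-model.sh.
MODEL = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

def rerank(query, hits, k=10):
    # First phase: order candidates by their BM25 score.
    top = sorted(hits, key=lambda h: h["bm25"], reverse=True)[:k]
    # Global phase: score each (query, passage) pair with the cross-encoder.
    enc = tokenizer([query] * len(top), [h["text"] for h in top],
                    padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = model(**enc).logits.squeeze(-1).tolist()
    return [h for _, h in sorted(zip(scores, top),
                                 key=lambda pair: pair[0], reverse=True)]
```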

Requirements:

  • Docker Desktop installed and running. 4GB available memory for Docker is recommended. Refer to Docker memory for details and troubleshooting
  • Alternatively, deploy using Vespa Cloud
  • Operating system: Linux, macOS or Windows 10 Pro (Docker requirement)
  • Architecture: x86_64 or arm64
  • Homebrew to install the Vespa CLI, or download a Vespa CLI release from GitHub releases.
  • Python 3.8+ to export models from Hugging Face.

Validate the environment (Docker should have a minimum of 6 GB of memory):

$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"

Install Vespa CLI:

$ brew install vespa-cli

For local deployment using the container image:

$ vespa config set target local

Pull and start the Vespa Docker container image:

$ docker pull vespaengine/vespa
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 \
  vespaengine/vespa

Download this sample application:

$ vespa clone transformers myapp && cd myapp

Install the required Python packages:

$ python3 -m pip install --upgrade pip
$ python3 -m pip install torch transformers onnx onnxruntime

For this sample application, we use a fine-tuned MiniLM model with 6 layers and 22 million parameters. This step downloads the cross-encoder transformer model, converts it to an ONNX model, and saves it in the files directory:

$ ./bin/setup-ranking-model.sh
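
For reference, the core of such an export looks roughly like the sketch below. The model name, input names, and output path are assumptions; see src/python/setup-model.py for the actual code.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # assumed model name
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

# Trace the model with a dummy (query, passage) pair and export to ONNX.
dummy = tokenizer("a query", "a passage", return_tensors="pt")
torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"], dummy["token_type_ids"]),
    "files/ranking_model.onnx",  # hypothetical output path
    input_names=["input_ids", "attention_mask", "token_type_ids"],
    output_names=["logits"],
    dynamic_axes={name: {0: "batch", 1: "sequence"}
                  for name in ("input_ids", "attention_mask", "token_type_ids")},
)
```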

Verify that the configuration service (deploy API) is ready:

$ vespa status deploy --wait 300

Deploy the app:

$ vespa deploy --wait 300 application

Deployment note: it is also possible to deploy this app to Vespa Cloud.

Wait for the application endpoint to become available:

$ vespa status --wait 300

Convert the MS MARCO data to the Vespa JSON feed format. To use the entire MS MARCO data set, use the download script. This step creates a vespa.json file in the msmarco directory:

$ ./bin/convert-msmarco.sh
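
Each operation in vespa.json follows Vespa's document JSON format. A hedged sketch of producing one feed operation is shown below; the field names and the id namespace are assumptions (the document type msmarco matches the query further down):

```python
import json

# Hypothetical fields and document id; the real schema is in the application package.
operation = {
    "put": "id:msmarco:msmarco::D1000052",
    "fields": {
        "title": "Is long-term care insurance tax deductible?",
        "text": "Premiums for qualified long-term care insurance policies ...",
    },
}
with open("msmarco/vespa.json", "a") as feed:
    feed.write(json.dumps(operation) + "\n")
```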

Index data:

$ vespa feed msmarco/vespa.json

Query the data. Note that the embed part of the request is required to convert the query text to the WordPiece representation used by the rank profile:

$ vespa query \
 'yql=select title from msmarco where userQuery()' \
 'query=is long term care insurance tax deductible' \
 'ranking=transformer' \
 'input.query(q)=embed(is long term care insurance tax deductible)'
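
The same request can also be sent directly to Vespa's HTTP query API, for example from Python (assuming the local endpoint published by the docker run command above):

```python
import requests

# POST the query parameters as JSON to the /search/ endpoint.
response = requests.post("http://localhost:8080/search/", json={
    "yql": "select title from msmarco where userQuery()",
    "query": "is long term care insurance tax deductible",
    "ranking": "transformer",
    "input.query(q)": "embed(is long term care insurance tax deductible)",
})
for hit in response.json()["root"].get("children", []):
    print(hit["relevance"], hit["fields"]["title"])
```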

This script reads queries from the MS MARCO query set and issues Vespa queries:

$ ./bin/evaluate.py
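
A hedged sketch of such an evaluation loop is shown below; the queries file path and its tab-separated (qid, query text) format are assumptions, see bin/evaluate.py for the actual implementation.

```python
import requests

def search(text):
    # Issue one query against the local Vespa endpoint.
    response = requests.post("http://localhost:8080/search/", json={
        "yql": "select title from msmarco where userQuery()",
        "query": text,
        "ranking": "transformer",
        "input.query(q)": "embed(%s)" % text,
    })
    return response.json()["root"].get("children", [])

with open("msmarco/test-queries.tsv") as queries:  # hypothetical path
    for line in queries:
        qid, text = line.rstrip("\n").split("\t")
        hits = search(text)
        print(qid, [hit["fields"]["title"] for hit in hits[:3]])
```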

Shut down and remove the container:

$ docker rm -f vespa

## Bonus

To export other cross-encoder models, change the code in "src/python/setup-model.py". Note, however, that this sample application uses Vespa's WordPiece embedder; if the Transformer model requires a different tokenizer, you must also change the embedder, for example to the Vespa SentencePiece embedder.