This sample application demonstrates how to adapt frozen embeddings from foundation embedding models. Frozen data embeddings from foundation models are an emerging industry practice for reducing the complexity of maintaining and versioning embeddings: the frozen embeddings are re-used across tasks such as classification, search, and recommendation.
Read the blog post.
The following is a quick-start recipe for trying this application:
- Docker Desktop installed and running. 4 GB available memory for Docker is recommended. Refer to Docker memory for details and troubleshooting.
- Alternatively, deploy using Vespa Cloud
- Operating system: Linux, macOS or Windows 10 Pro (Docker requirement)
- Architecture: x86_64 or arm64
- Homebrew to install the Vespa CLI, or download a Vespa CLI release from GitHub releases.
Validate Docker resource settings, which should be a minimum of 4 GB:
$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"
Install Vespa CLI:
$ brew install vespa-cli
For local deployment using the Docker image:
$ vespa config set target local
Pull and start the Vespa Docker container image:
$ docker pull vespaengine/vespa
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 \
  vespaengine/vespa
Verify that the configuration service (deploy API) is ready:
$ vespa status deploy --wait 300
Download this sample application:
$ vespa clone custom-embeddings my-app && cd my-app
Download a frozen embedding model file; see text embeddings made easy for details:
$ mkdir -p models
$ curl -L -o models/tokenizer.json \
  https://raw.githubusercontent.com/vespa-engine/sample-apps/master/simple-semantic-search/model/tokenizer.json
$ curl -L -o models/frozen.onnx \
  https://github.com/vespa-engine/sample-apps/raw/master/simple-semantic-search/model/e5-small-v2-int8.onnx
$ cp models/frozen.onnx models/tuned.onnx
In this case, we re-use the frozen model as the tuned model to demonstrate functionality.
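The queries later in this guide reference the two embedders by id, embed(frozen, ...) and embed(tuned, ...). As a minimal sketch, assuming Vespa's hugging-face-embedder component type and the model files downloaded above, the wiring inside the container element of services.xml could look roughly like the following; see the application package in this repository for the actual configuration:

<component id="frozen" type="hugging-face-embedder">
    <!-- Sketch: frozen embedder, backed by the downloaded frozen model -->
    <transformer-model path="models/frozen.onnx"/>
    <tokenizer-model path="models/tokenizer.json"/>
</component>
<component id="tuned" type="hugging-face-embedder">
    <!-- Sketch: tuned embedder, here the same model copied to tuned.onnx -->
    <transformer-model path="models/tuned.onnx"/>
    <tokenizer-model path="models/tokenizer.json"/>
</component>

With components like these in place, embed(frozen, ...) and embed(tuned, ...) in queries and in the schema resolve to the respective model files.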
Deploy the application:
$ vespa deploy --wait 300
It is possible to deploy this app to Vespa Cloud.
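If you would rather deploy to Vespa Cloud than to the local container, a minimal sketch of the CLI flow follows; it assumes you have a Vespa Cloud tenant, and tenant-name and application-name are placeholders to replace with your own values:

$ vespa config set target cloud
$ vespa config set application tenant-name.application-name
$ vespa auth login
$ vespa auth cert
$ vespa deploy --wait 300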
Index the sample documents:
$ vespa document ext/1.json
$ vespa document ext/2.json
$ vespa document ext/3.json
We demonstrate using the Vespa CLI; use -v to see the curl equivalent against the HTTP API.
List all indexed documents, without ranking:
$ vespa query 'yql=select * from doc where true' \
  'ranking=unranked'
Notice the relevance, which is assigned by the rank-profile.
Query using the frozen embedder:
$ vespa query 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding, q)' \
  'input.query(q)=embed(frozen, "space contains many suns")'
Query using the tuned embedder:
$ vespa query 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding, q)' \
  'input.query(q)=embed(tuned, "space contains many suns")'
In this case, the tuned model is equivalent to the frozen query tower that was used for the document embeddings.
$ vespa query 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding, q)' \
  'input.query(q)=embed(tuned, "space contains many suns")' \
  'ranking=simple-similarity'
This invokes the simple-similarity ranking model, which applies the query transformation to the tuned embedding.
$ vespa query 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding, q)' \
  'input.query(q)=embed(tuned, "space contains many suns")' \
  'ranking=custom-similarity'
Note that this just demonstrates the functionality; the custom-similarity model is initialized with random weights.
For training routines, it is useful to get the frozen document embeddings out of Vespa:
$ vespa visit --field-set "[all]" > ../vector-data.jsonl
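As a minimal sketch of how the exported data could feed a training routine, the following Python reads the JSONL file written above and collects the document embeddings. The field names ("text", "embedding") and the tensor JSON forms handled here are assumptions; inspect one line of the export to confirm the actual structure.

import json

def read_embeddings(path):
    # One JSON document per line, as produced by `vespa visit` above.
    ids, texts, vectors = [], [], []
    with open(path) as f:
        for line in f:
            doc = json.loads(line)
            fields = doc["fields"]
            tensor = fields["embedding"]
            if "values" in tensor:
                # Dense short form: {"type": ..., "values": [...]}
                vector = tensor["values"]
            else:
                # Verbose form: {"cells": [{"address": ..., "value": ...}, ...]}
                vector = [cell["value"] for cell in tensor["cells"]]
            ids.append(doc["id"])
            texts.append(fields.get("text", ""))
            vectors.append(vector)
    return ids, texts, vectors

if __name__ == "__main__":
    ids, texts, vectors = read_embeddings("../vector-data.jsonl")
    print(f"{len(ids)} documents, embedding dimension {len(vectors[0])}")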
To fetch a single document with all fields over the document/v1 HTTP API:
$ curl "http://localhost:8080/document/v1/doc/doc/docid/1?fieldSet=\[all\]"
Tear down the running container:
$ docker rm -f vespa