This project involves creating a search application using Vespa, a platform for scalable and fast data serving. The main objective is to process movie data, deploy a Vespa instance in Docker, and execute various types of searches. The tasks include data processing, application deployment, and query execution.
- Python 3.x
- Docker Desktop (Ensure it is installed and running)
- vespacli or pyvespa Python module
- Run the provided script to process tmdb_5000_movies.csv into a Vespa-compatible JSON format:
    from process_script import process_tmdb_csv
    process_tmdb_csv("tmdb_5000_movies.csv", "clean_tmdb.jsonl")
- Verify the output: ensure that clean_tmdb.jsonl contains the required fields (doc_id, title, and text).
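The two data-processing steps above can be sketched in Python as follows. This is not the provided process_script; it is a minimal illustration that assumes the CSV has `id`, `title`, and `overview` columns and that each output line is a Vespa put operation with the document fields nested under `fields`. The namespace `tutorial` and document type `doc` are placeholders, and the function names are hypothetical.

```python
import csv
import json

REQUIRED_FIELDS = ("doc_id", "title", "text")


def row_to_feed_op(row, namespace="tutorial", doctype="doc"):
    """Map one CSV row (a dict) to a put operation.

    Assumption: the CSV has `id`, `title`, and `overview` columns.
    The namespace and document type are placeholder names.
    """
    doc_id = str(row["id"])
    return {
        "put": f"id:{namespace}:{doctype}::{doc_id}",
        "fields": {
            "doc_id": doc_id,
            "title": (row.get("title") or "").strip(),
            "text": (row.get("overview") or "").strip(),
        },
    }


def convert_csv_to_jsonl(csv_path, jsonl_path):
    """Write one JSON operation per line; return the number of rows written."""
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            dst.write(json.dumps(row_to_feed_op(row)) + "\n")
            count += 1
    return count


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required field names absent from one parsed record."""
    # Look under "fields" if present, otherwise treat the record as flat.
    fields = record.get("fields", record)
    return [name for name in required if name not in fields]


def check_jsonl(path):
    """Return (line_number, missing_names) for every incomplete record."""
    problems = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            missing = missing_fields(json.loads(line))
            if missing:
                problems.append((line_number, missing))
    return problems
```

Calling `check_jsonl("clean_tmdb.jsonl")` returns an empty list when every record carries all three fields.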
- Pull and Run Vespa Container:
    docker pull vespaengine/vespa
    docker run --detach --name vespa-hybrid --hostname vespa-container --publish 19071:19071 --publish 8082:8080 vespaengine/vespa
- Verify the Container:
  - Run docker ps to confirm the container is running.
  - Access http://localhost:19071 to check the deployment API.
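The container can take a while to start, so the check above may fail on the first attempt. A minimal sketch of a readiness poll is shown below. It assumes the config server exposes the `/state/v1/health` endpoint on port 19071 and reports `{"status": {"code": "up"}}` once ready; the function names and the timeout values are illustrative choices.

```python
import json
import time
import urllib.error
import urllib.request

# Assumed health endpoint on the config server port published above.
HEALTH_URL = "http://localhost:19071/state/v1/health"


def is_up(payload):
    """Return True if a health response body reports status code "up"."""
    try:
        return json.loads(payload)["status"]["code"] == "up"
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not shaped like a health response.
        return False


def wait_for_config_server(url=HEALTH_URL, timeout_s=120, interval_s=2):
    """Poll the health endpoint until it reports "up" or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if is_up(response.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet
        time.sleep(interval_s)
    return False
```

`wait_for_config_server()` returns `True` once the server reports it is up, or `False` if the timeout passes first.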
- Install vespacli:
    pip install --ignore-installed vespacli
- Deploy the Application:
    vespa config set target local
    vespa deploy --wait 300 app
- Feed Data into Vespa:
    vespa feed -t http://localhost:8082 clean_tmdb.jsonl
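Since pyvespa is listed as an alternative to vespacli, feeding can also be done from Python. The sketch below assumes each line of clean_tmdb.jsonl is a put operation of the form `{"put": "id:<namespace>:<doctype>::<id>", "fields": {...}}` and converts it to the `{"id", "fields"}` shape that pyvespa's `feed_iterable` expects. The schema name `doc` and namespace `tutorial` are placeholders that must match the deployed application, and the helper names are hypothetical.

```python
import json


def to_feed_items(lines):
    """Yield {"id", "fields"} dicts from JSONL put operations."""
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        fields = record["fields"]
        # A put id looks like "id:<namespace>:<doctype>::<user id>";
        # keep only the user-specified part after "::".
        put_id = record.get("put", "")
        if "::" in put_id:
            doc_id = put_id.split("::", 1)[1]
        else:
            doc_id = str(fields["doc_id"])
        yield {"id": doc_id, "fields": fields}


def report(response, doc_id):
    """Callback for feed_iterable: print operations that failed."""
    if not response.is_successful():
        print(f"Failed to feed {doc_id}: {response.get_json()}")


def feed_file(app, path, schema="doc", namespace="tutorial"):
    """Feed a JSONL file through a connected pyvespa `Vespa` instance.

    `schema` and `namespace` are placeholder values.
    """
    with open(path, encoding="utf-8") as handle:
        app.feed_iterable(
            to_feed_items(handle),
            schema=schema,
            namespace=namespace,
            callback=report,
        )
```

Here `app` is a `vespa.application.Vespa` object pointing at port 8082, the host port mapped to the container's query and feed endpoint.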
- Connect to Vespa Using Python:
    from vespa.application import Vespa
    app = Vespa(url="http://localhost", port=8082)
- Run Keyword Search:
    df = keyword_search(app, "Harry Potter and the Half-Blood Prince")
    print(df)
- Run Semantic Search:
    df = semantic_search(app, "Harry Potter and the Half-Blood Prince")
    print(df)
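The helpers `keyword_search` and `semantic_search` are called above but not defined here. One possible shape for them is sketched below, under several assumptions about the deployed application: rank profiles named `bm25` and `semantic`, a tensor field named `embedding`, a query tensor `e`, and an embedder component configured so that `embed(@query)` works. Adjust these names to match the actual schema. pandas is imported lazily so the query-building helpers can be used without it.

```python
def keyword_body(text, hits=5):
    """Query body for lexical search (assumes a `bm25` rank profile)."""
    return {
        "yql": "select * from sources * where userQuery()",
        "query": text,
        "ranking": "bm25",
        "hits": hits,
    }


def semantic_body(text, hits=5):
    """Query body for nearest-neighbor search.

    Assumes a `semantic` rank profile, an `embedding` tensor field,
    and a configured embedder that resolves embed(@query).
    """
    return {
        "yql": "select * from sources * where "
               "({targetHits:100}nearestNeighbor(embedding, e))",
        "query": text,
        "input.query(e)": "embed(@query)",
        "ranking": "semantic",
        "hits": hits,
    }


def hits_to_records(hits, fields=("doc_id", "title")):
    """Flatten Vespa hits into plain dicts with the chosen fields."""
    records = []
    for hit in hits:
        record = {name: hit.get("fields", {}).get(name) for name in fields}
        record["relevance"] = hit.get("relevance")
        records.append(record)
    return records


def _search(app, body):
    import pandas as pd  # imported lazily; only needed for the DataFrame

    response = app.query(body=body)
    return pd.DataFrame(hits_to_records(response.hits))


def keyword_search(app, text):
    """Run a lexical query and return the top hits as a DataFrame."""
    return _search(app, keyword_body(text))


def semantic_search(app, text):
    """Run a nearest-neighbor query and return the top hits as a DataFrame."""
    return _search(app, semantic_body(text))
```

Separating the query bodies from the calls makes it easy to print a body and try it directly against the query endpoint when a search returns nothing.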