This project demonstrates building a scalable search application using Vespa, involving data processing, deployment, and query execution.

NiranjanRao07/Vespa-AI

Vespa Search Application

Overview

This project builds a search application on Vespa, a platform for scalable, low-latency data serving. It processes a movie dataset (tmdb_5000_movies.csv) into a Vespa-compatible format, deploys a Vespa instance in a Docker container, and runs keyword and semantic searches against the indexed data.

Table of Contents

  1. Prerequisites
  2. Steps to Complete the Assignment

Prerequisites

  • Python 3.x
  • Docker Desktop (Ensure it is installed and running)
  • vespacli (the Vespa CLI, used below to deploy and feed) and pyvespa (the Python client, used below to query)

Steps to Complete the Assignment

1. Data Processing

  1. Run the provided script to process tmdb_5000_movies.csv into a Vespa-compatible JSON format.
    from process_script import process_tmdb_csv
    process_tmdb_csv("tmdb_5000_movies.csv", "clean_tmdb.jsonl")
  2. Verify the output: Ensure that clean_tmdb.jsonl contains the required fields (doc_id, title, and text).
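The verification in step 2 can be scripted. What follows is a minimal sketch and not part of the repository: the function name find_incomplete_records is hypothetical, and the code assumes each line of the file is either a flat JSON object or a Vespa feed operation that nests the document under a "fields" key.

```python
import json

REQUIRED_FIELDS = ("doc_id", "title", "text")


def find_incomplete_records(path, required=REQUIRED_FIELDS):
    """Return (line_number, missing_fields) for each record lacking a required field."""
    problems = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            # Assumption: feed operations keep the document under "fields";
            # otherwise the fields are looked up on the top-level object.
            fields = record.get("fields", record)
            missing = [name for name in required if name not in fields]
            if missing:
                problems.append((line_number, missing))
    return problems
```

Calling find_incomplete_records("clean_tmdb.jsonl") returns an empty list when every record carries all three fields.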

2. Run Vespa as a Docker Container

  1. Pull and Run Vespa Container:
    docker pull vespaengine/vespa
    docker run --detach --name vespa-hybrid --hostname vespa-container --publish 19071:19071 --publish 8082:8080 vespaengine/vespa
  2. Verify the Container:
    • Run docker ps to confirm the container is running.
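    • Access http://localhost:19071/state/v1/health to check that the config server, which hosts the deployment API, is up. The query endpoint on host port 8082 only responds once an application has been deployed.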
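The container check can also be done from Python. This is a sketch under assumptions: it relies on the /state/v1/health endpoint and on a response body of the form {"status": {"code": "up"}}, and the helper names (health_url, is_up, config_server_is_up) are hypothetical.

```python
import json
from urllib.request import urlopen


def health_url(host="localhost", port=19071):
    """Build the health endpoint URL for a Vespa service."""
    return f"http://{host}:{port}/state/v1/health"


def is_up(payload):
    """Return True if a health response body reports the status code "up"."""
    status = json.loads(payload).get("status", {})
    return status.get("code") == "up"


def config_server_is_up(host="localhost", port=19071, timeout=5):
    """Query the health endpoint; connection or parse errors count as not up."""
    try:
        with urlopen(health_url(host, port), timeout=timeout) as response:
            return is_up(response.read().decode("utf-8"))
    except (OSError, ValueError):
        return False
```

The config server can take a short while to start after docker run, so config_server_is_up() may return False at first and True on a later call.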

3. Configure Vespa and Ingest Data

  1. Install vespacli and pyvespa (pyvespa provides the vespa.application module used in section 4):
    pip install --ignore-installed vespacli pyvespa
  2. Deploy the Application (app is the application package directory):
    vespa config set target local
    vespa deploy --wait 300 app
  3. Feed Data into Vespa (host port 8082 maps to the container's port 8080):
    vespa feed -t http://localhost:8082 clean_tmdb.jsonl

4. Run Search Queries

  1. Connect to Vespa Using Python:
    from vespa.application import Vespa
    
    app = Vespa(url="http://localhost", port=8082)
  2. Run Keyword Search:
    df = keyword_search(app, "Harry Potter and the Half-Blood Prince")
    print(df)
  3. Run Semantic Search:
    df = semantic_search(app, "Harry Potter and the Half-Blood Prince")
    print(df)
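The README does not show how keyword_search and semantic_search are defined. As a rough sketch of what such helpers might send to Vespa, the functions below build the two query bodies and flatten the returned hits. Everything named here is an assumption rather than the project's actual code: the rank profiles "bm25" and "semantic", the tensor field "embedding", the query tensor "e", and a configured embedder would all have to exist in the deployed schema.

```python
def keyword_query_body(text, hits=5):
    """Query body for a lexical search over the user's query text."""
    return {
        "yql": "select * from sources * where userQuery()",
        "query": text,
        "ranking": "bm25",  # assumed rank profile name
        "hits": hits,
    }


def semantic_query_body(text, hits=5, target_hits=100):
    """Query body for a nearest-neighbor search over an embedding field."""
    yql = (
        "select * from sources * where "
        f"({{targetHits:{target_hits}}}nearestNeighbor(embedding, e))"
    )
    return {
        "yql": yql,
        "query": text,
        "ranking": "semantic",  # assumed rank profile name
        "input.query(e)": "embed(@query)",  # assumes an embedder is configured
        "hits": hits,
    }


def hits_to_rows(hits, fields=("doc_id", "title")):
    """Flatten Vespa hits into plain dicts with the selected fields and relevance."""
    rows = []
    for hit in hits:
        row = {name: hit.get("fields", {}).get(name) for name in fields}
        row["relevance"] = hit.get("relevance")
        rows.append(row)
    return rows
```

With the app object from step 1, a body can be sent with app.query(body=keyword_query_body("Harry Potter and the Half-Blood Prince")), and the resulting response.hits passed to hits_to_rows.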
