Vector DB apps - Semantic search, RAG, Image similarity search, Anomaly detection, Recommendation system

mekhiya/vector-database-ai-apps

Vector Database AI Apps

AI Apps using Vector Database (Pinecone)

Vector databases are an essential part of the stack for developing LLM-based applications. RAG (retrieval-augmented generation) retrieves relevant data and uses it as augmented context for the LLM application.

Vector DBs can also be used for:

  • Text similarity search
  • RAG
  • Image similarity search
  • Anomaly detection
  • Recommendation systems

Vector DBs work with both sparse and dense vectors.
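
To make the sparse / dense distinction concrete, here is a minimal sketch. The helper name `to_sparse` and the sample numbers are hypothetical; the indices/values layout is one common way to store sparse vectors and is used here only for illustration.

```python
def to_sparse(dense_vector):
    """Keep only the non-zero entries as parallel index and value lists."""
    indices = [i for i, value in enumerate(dense_vector) if value != 0]
    values = [dense_vector[i] for i in indices]
    return {"indices": indices, "values": values}


# A dense vector stores a value for every dimension, e.g. a neural embedding.
dense = [0.12, -0.48, 0.33, 0.91]

# A sparse vector is mostly zeros, e.g. word counts over a vocabulary.
word_counts = [0, 0, 3, 0, 0, 0, 1, 0]
sparse = to_sparse(word_counts)
print(sparse)
```

Storing only the non-zero positions keeps sparse vectors compact even when the vocabulary is very large.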

The repo consists of the following 6 apps, which use vector DBs in various ways:

    1. Basic semantic search for text documents
    2. RAG
    3. Recommendation system
    4. Hybrid search app for product recommendation (uses dense vectors for images and sparse vectors for text)
    5. Child Parent similarity app
    6. Anomaly detection based on a database of server logs

1) SEMANTIC SEARCH

link to git code

Semantic search matches on the meaning of the content being searched, whereas lexical search looks for literal or pattern-matching strings.
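
As a rough illustration of the lexical side of that contrast, the sketch below does plain word matching. The function name `lexical_search` and the sample documents are made up for this example.

```python
def lexical_search(query, documents):
    """Return documents that contain every query word literally (case-insensitive)."""
    query_words = query.lower().split()
    return [
        doc for doc in documents
        if all(word in doc.lower().split() for word in query_words)
    ]


documents = [
    "How do I reset my password",
    "Steps to recover a forgotten login credential",
]

# Literal matching finds only the document that shares the exact word.
print(lexical_search("password", documents))

# A query with similar intent but different wording matches nothing,
# which is the gap semantic search is meant to close.
print(lexical_search("forgot passcode", documents))
```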

We will use a Sentence Transformer model for embedding.

  • SentenceTransformers is a Python framework for state-of-the-art sentence, text and image embeddings (initial work: the Sentence-BERT paper).
  • It computes sentence / text embeddings for more than 100 languages.
  • Embeddings can be compared, e.g. with cosine similarity, to find sentences with a similar meaning.
  • Useful for semantic textual similarity, semantic search, or paraphrase mining.
  • The framework is based on PyTorch and Transformers.
  • It offers a large collection of pre-trained models tuned for various tasks.
  • It is easy to fine-tune your own models.

pip install -U sentence-transformers

from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

# Load the pre-trained model and encode each sentence into a dense vector
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = model.encode(sentences)
print(embeddings)

We will use the sentence transformer model all-MiniLM-L6-v2 for embeddings. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.
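
Once sentences are embedded, semantic search reduces to ranking stored vectors by similarity to the query vector. The sketch below uses toy 3-dimensional vectors as stand-ins for real 384-dimensional embeddings; the function names and document ids are hypothetical, and a vector database would normally do this ranking for you.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, doc_vectors):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scores = {
        doc_id: cosine_similarity(query_vector, vector)
        for doc_id, vector in doc_vectors.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy vectors; real ones would come from model.encode(...).
doc_vectors = {
    "doc-cats": [0.9, 0.1, 0.0],
    "doc-finance": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]

ranking = rank_by_similarity(query_vector, doc_vectors)
print(ranking[0][0])  # id of the closest document
```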

2) RAG (Retrieval Augmented Generation)

link to git code

Instead of sending the query directly to the LLM, RAG improves the output by also referring to an authoritative knowledge base (one that was not part of the training data).

  • Dataset: Wikipedia articles
  • Add embeddings to the vector DB
  • Query: search the Pinecone vector database to retrieve documents
  • OpenAI: the augmented query is sent to OpenAI
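
The flow above can be sketched as a small pipeline. This is only an outline under stated assumptions: `build_augmented_prompt`, `answer_with_rag`, and the `embed` / `search` / `complete` callables are hypothetical names, and the stubs below stand in for the embedding model, the Pinecone query, and the OpenAI call.

```python
def build_augmented_prompt(query, passages):
    """Combine retrieved passages and the user question into one prompt."""
    context = "\n\n---\n\n".join(passages)
    return (
        "Answer the question based on the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


def answer_with_rag(query, embed, search, complete, top_k=3):
    """Run the RAG steps using caller-supplied functions (all hypothetical).

    embed(text) -> vector
    search(vector, top_k) -> list of passage strings
    complete(prompt) -> generated answer
    """
    query_vector = embed(query)
    passages = search(query_vector, top_k)
    prompt = build_augmented_prompt(query, passages)
    return complete(prompt)


# Stub functions stand in for the embedding model, the vector DB, and the LLM.
answer = answer_with_rag(
    "What is the Berlin Wall?",
    embed=lambda text: [0.0, 1.0],
    search=lambda vector, top_k: ["The Berlin Wall divided Berlin from 1961 to 1989."][:top_k],
    complete=lambda prompt: f"(LLM would answer using {len(prompt)} prompt characters)",
)
print(answer)
```

Passing the three steps in as functions keeps the sketch runnable offline; in a real app each one would wrap an API call.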

[Figure: RAG diagram (Image Source)]

Embedding model: 'text-embedding-ada-002' (OpenAI)

  • text-embedding-ada-002 is used for text search, text similarity, and code search
  • it outperforms the previous Davinci model

[Figure: embedding models]

The OpenAI embedding model can be called as shown below. It converts the input text into an embedding vector.

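import openai

# Legacy interface (openai<1.0); newer SDK versions expose client.embeddings.create(...) instead
response = openai.Embedding.create(
  input="I have a dream",
  model="text-embedding-ada-002"
)
embedding = response["data"][0]["embedding"]  # the embedding as a list of floats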

A Pinecone index stores each record in the following format:

(id, values, metadata)

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("pinecone-index")

index.upsert(
  vectors=[
    {
      "id": "A", 
      "values": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], 
      "metadata": {"genre": "comedy", "year": 2020}
    },
    {
      "id": "B", 
      "values": [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
      "metadata": {"genre": "documentary", "year": 2019}
    }
  ]
)

(Update + insert = upsert)
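
In practice the ids, embeddings, and metadata usually start out as separate lists. The sketch below shows one way to zip them into the record format from the upsert example and split them into batches. The helper names `to_records` and `batched` are hypothetical, and the batch size of 2 is arbitrary.

```python
def to_records(ids, embeddings, metadatas):
    """Zip parallel lists into the id / values / metadata record format."""
    if not (len(ids) == len(embeddings) == len(metadatas)):
        raise ValueError("ids, embeddings and metadatas must have the same length")
    return [
        {"id": str(record_id), "values": list(values), "metadata": metadata}
        for record_id, values, metadata in zip(ids, embeddings, metadatas)
    ]


def batched(records, batch_size):
    """Yield successive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


records = to_records(
    ids=[0, 1, 2],
    embeddings=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    metadatas=[{"title": "A"}, {"title": "B"}, {"title": "C"}],
)
batches = list(batched(records, batch_size=2))

# Each batch could then be passed to index.upsert(vectors=batch).
print([len(batch) for batch in batches])
```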

3) RECOMMENDER SYSTEM

  • New article
  • Embeddings from article titles
  • Recommender system which searches across all titles

Recommender system based on content rather than topic
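
One detail worth sketching: when recommendations come from article content, a single article may be stored as several embedded chunks, so search hits need to be collapsed back into unique titles. That chunking is an assumption here, not something this README spells out, and `recommend_titles` plus the sample hits are hypothetical. The hit shape (id, score, metadata) is modeled on typical vector DB query results.

```python
def recommend_titles(matches, top_n=3):
    """Collapse ranked search hits into a list of unique article titles."""
    seen = set()
    titles = []
    for match in matches:  # assumed to be sorted best-first
        title = match["metadata"]["title"]
        if title not in seen:
            seen.add(title)
            titles.append(title)
        if len(titles) == top_n:
            break
    return titles


# Made-up hits: two chunks of the same article, then a different article.
matches = [
    {"id": "12-0", "score": 0.91, "metadata": {"title": "Tennis final recap"}},
    {"id": "12-1", "score": 0.88, "metadata": {"title": "Tennis final recap"}},
    {"id": "40-0", "score": 0.80, "metadata": {"title": "Olympic trials preview"}},
]
print(recommend_titles(matches, top_n=2))
```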
