- RAG (Retrieval-Augmented Generation): Integrates retrieval (search) into LLM text generation, letting the model "look up" external information to improve its responses. cite [25 Aug 2023]
- In a 2020 paper, Meta (Facebook) introduced a framework called retrieval-augmented generation to give LLMs access to information beyond their training data. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks: [cnt] [22 May 2020]
  - RAG-sequence: Retrieve k documents and use them to generate all the output tokens that answer the user query.
  - RAG-token: Retrieve k documents, use them to generate the next token, then retrieve k more documents, generate the next token, and so on. A single answer may therefore draw on several different sets of retrieved documents.
- Of the two approaches proposed in the paper, the RAG-sequence implementation is pretty much always used in the industry. It’s cheaper and simpler to run than the alternative, and it produces great results. cite [30 Sep 2023]
- A Survey on Retrieval-Augmented Text Generation: [cnt]: This paper conducts a survey on retrieval-augmented text generation, highlighting its advantages and state-of-the-art performance in many NLP tasks. These tasks include Dialogue response generation, Machine translation, Summarization, Paraphrase generation, Text style transfer, and Data-to-text generation. [2 Feb 2022]
- HyDE: Hypothetical Document Embeddings. Zero-shot: generate a hypothetical document -> embed it -> average the vectors -> retrieve. [20 Dec 2022]
- Active Retrieval Augmented Generation: [cnt]: Forward-Looking Active REtrieval augmented generation (FLARE): FLARE iteratively generates a tentative next sentence and checks whether it contains low-probability tokens. If so, the system retrieves relevant documents and regenerates the sentence. Low-probability tokens are identified via `token_logprobs` in the OpenAI API response. git [11 May 2023]
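  A minimal sketch of the FLARE-style confidence check, assuming the OpenAI Python SDK (v1.x) with `logprobs` enabled on a chat completion; the model name and threshold are illustrative:

  ```python
  import math
  from openai import OpenAI

  client = OpenAI()

  def needs_retrieval(logprobs: list[float], threshold: float = 0.5) -> bool:
      # FLARE-style trigger: the draft sentence is "low confidence" if any
      # token's probability falls below the threshold.
      return any(math.exp(lp) < threshold for lp in logprobs)

  # Draft a tentative next sentence and inspect its per-token logprobs.
  resp = client.chat.completions.create(
      model="gpt-4o-mini",  # illustrative model name
      messages=[{"role": "user", "content": "Continue: The capital of Australia is"}],
      logprobs=True,
      max_tokens=30,
  )
  token_logprobs = [t.logprob for t in resp.choices[0].logprobs.content]
  if needs_retrieval(token_logprobs):
      pass  # retrieve relevant documents, then regenerate the sentence
  ```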
- Benchmarking Large Language Models in Retrieval-Augmented Generation: [cnt]: Retrieval-Augmented Generation Benchmark (RGB) is proposed to assess LLMs on four key abilities [4 Sep 2023]:
  - Noise robustness (external documents contain noise; models struggled once noise exceeded 80%)
  - Negative rejection (external documents are all noise; the highest rejection rate was only 45%)
  - Information integration (difficulty summarizing across multiple documents; the highest accuracy was 60-67%)
  - Counterfactual robustness (failed to detect factual errors in counterfactual external documents)
- Retrieval meets Long Context LLMs: [cnt]: Demonstrates that retrieval augmentation significantly improves the performance of 4K-context LLMs. Perhaps surprisingly, this simple retrieval-augmented baseline performs comparably to 16K long-context LLMs. [4 Oct 2023]
- FreshLLMs: [cnt]: FreshPrompt: run a Google search first, then use the results in the prompt. Experiments show that FreshPrompt outperforms competing search-engine-augmented prompting methods such as Self-Ask (Press et al., 2022) as well as commercial systems such as Perplexity.AI. git [5 Oct 2023]
- Self-RAG: [cnt]: git [17 Oct 2023]
  1. Critic model C: Generates reflection tokens (IsREL: relevant/irrelevant; IsSUP: fully supported/partially supported/no support; IsUse: usefulness, 5 to 1). It is pretrained on data labeled by GPT-4.
  2. Generator model M: The main language model that generates task outputs and reflection tokens. It leverages the data labeled by the critic model during training.
  3. Retriever model R: Retrieves relevant passages. The LM decides whether external passages (the retriever) are needed for text generation.
- RECOMP: Improving Retrieval-Augmented LMs with Compressors: [cnt]: 1. RECOMP (Retrieve, Compress, Prepend) is an intermediate step that compresses retrieved documents into a textual summary before prepending them, to improve retrieval-augmented language models (RALMs). 2. Two compressors are presented: an extractive compressor, which selects useful sentences from retrieved documents, and an abstractive compressor, which generates summaries by synthesizing information from multiple documents. 3. Both compressors are trained. [6 Oct 2023]
- Retrieval-Augmentation for Long-form Question Answering: [cnt]: 1. The order of evidence documents affects the order of generated answers. 2. The last sentence of an answer is more likely to be unsupported by evidence. 3. Automatic methods for detecting attribution achieve reasonable performance but still lag behind human agreement. Attribution in the paper assesses how well answers are grounded in the provided evidence and avoid fabricating information. [18 Oct 2023]
- RAG for LLMs: [cnt]: 🏆Retrieval-Augmented Generation for Large Language Models: A Survey: Three paradigms of RAG: Naive RAG > Advanced RAG > Modular RAG. [18 Dec 2023]
- INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning: INTERS covers 21 search tasks across three categories: query understanding, document understanding, and query-document relationship understanding. The dataset is designed for instruction tuning, a method that fine-tunes LLMs on natural language instructions. git [12 Jan 2024]
- RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture. [16 Jan 2024]
- The Power of Noise: Redefining Retrieval for RAG Systems: Supplying no more than 2-5 relevant documents, plus some amount of random noise, in the LLM context maximizes RAG accuracy. [26 Jan 2024]
- Corrective Retrieval Augmented Generation (CRAG): Retrieval Evaluator assesses the retrieved documents and categorizes them as Correct, Ambiguous, or Incorrect. For Ambiguous and Incorrect documents, the method uses Web Search to improve the quality of the information. The refined and distilled documents are then used to generate the final output. [29 Jan 2024] CRAG implementation by LangGraph git
- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity git [21 Mar 2024]
- RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval: Introduces a novel approach to retrieval-augmented language models by constructing a recursive tree structure from documents. git / git [31 Jan 2024]
  `pip install llama-index-packs-raptor`
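  A minimal usage sketch, assuming the `llama-index-packs-raptor` package with its default LLM/embedding configuration (which reads an OpenAI key from the environment); the data path and query are illustrative:

  ```python
  from llama_index.core import SimpleDirectoryReader
  from llama_index.packs.raptor import RaptorPack

  documents = SimpleDirectoryReader("./data").load_data()
  pack = RaptorPack(documents)       # builds the recursive summary tree
  nodes = pack.run("What is RAPTOR?")  # retrieves across the tree's levels
  ```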
- CRAG: Comprehensive RAG Benchmark: A factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. ref [7 Jun 2024]
- PlanRAG: Decision-making with RAG; introduces the Decision QA benchmark (DQA). Plan -> Retrieve -> Make a decision (PlanRAG). git [18 Jun 2024]
- Searching for Best Practices in Retrieval-Augmented Generation: [1 Jul 2024]
  - Best Performance Practice: Query Classification, Hybrid with HyDE (retrieval), monoT5 (reranking), Reverse (repacking), Recomp (summarization).
  - Balanced Efficiency Practice: Query Classification, Hybrid (retrieval), TILDEv2 (reranking), Reverse (repacking), Recomp (summarization).
- Retrieval Augmented Generation or Long-Context LLMs?: Long-context consistently outperforms RAG in average performance, but RAG's significantly lower cost remains a distinct advantage. [23 Jul 2024]
- Graph Retrieval-Augmented Generation: A Survey [15 Aug 2024]
- OP-RAG: Order-preserve RAG: Unlike traditional RAG, which sorts retrieved chunks by relevance, we keep them in their original order from the text. [3 Sep 2024]
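  A minimal sketch of order-preserving selection, assuming chunks have already been scored for relevance (names are illustrative):

  ```python
  def op_rag_select(chunks: list[str], scores: list[float], k: int = 8) -> list[str]:
      """Order-preserve RAG: keep the top-k most relevant chunks,
      but return them in their original document order, not score order."""
      top_positions = sorted(
          range(len(chunks)), key=lambda i: scores[i], reverse=True
      )[:k]
      return [chunks[i] for i in sorted(top_positions)]

  # Chunks 0 and 2 score highest; they come back in document order.
  print(op_rag_select(["a", "b", "c"], [0.7, 0.1, 0.9], k=2))  # ['a', 'c']
  ```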
- Retrieval Augmented Generation (RAG) and Beyond: 🏆The paper classifies user queries into four levels (explicit, implicit, interpretable rationale, and hidden rationale) and highlights the need for external data integration and fine-tuning LLMs for specialized tasks. [23 Sep 2024]
- Astute RAG: Adaptively extracts essential information from LLMs, consolidates internal and external knowledge with source awareness, and finalizes answers based on reliability. [9 Oct 2024]
- RAG Pipeline (a minimal end-to-end sketch follows this list)
- Indexing Stage: Preparing a knowledge base.
- Querying Stage: Querying the indexed data to retrieve relevant information.
- Responding Stage: Generating responses based on the retrieved information. ref
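  The three stages above map onto a tiny pipeline. A self-contained sketch with a toy bag-of-words "embedder" standing in for a real embedding model and vector database; the final LLM call is left as a stub:

  ```python
  import math
  from collections import Counter

  def embed(text: str) -> Counter:
      # Toy bag-of-words "embedding"; a real system would call an embedding model.
      return Counter(text.lower().split())

  def cosine(a: Counter, b: Counter) -> float:
      dot = sum(a[t] * b[t] for t in a)
      na = math.sqrt(sum(v * v for v in a.values()))
      nb = math.sqrt(sum(v * v for v in b.values()))
      return dot / (na * nb) if na and nb else 0.0

  # 1) Indexing stage: prepare the knowledge base.
  docs = ["RAG retrieves documents before generation.",
          "Paris is the capital of France."]
  index = [(doc, embed(doc)) for doc in docs]

  # 2) Querying stage: retrieve the most relevant document.
  query = "What is the capital of France?"
  retrieved = max(index, key=lambda item: cosine(embed(query), item[1]))[0]

  # 3) Responding stage: ground the LLM's answer in the retrieved context.
  prompt = f"Context: {retrieved}\n\nQuestion: {query}\nAnswer:"
  # response = llm.generate(prompt)  # stub: call your LLM of choice here
  ```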
- Evaluation with Ragas: UMAP (often used to reduce the dimensionality of embeddings) combined with Ragas metrics for visualizing RAG results. [Mar 2024]
- Ragas metrics: Context Precision, Context Relevancy, Context Recall, Faithfulness, Answer Relevance, Answer Semantic Similarity, Answer Correctness, Aspect Critique. git [May 2023]
- Advanced RAG Patterns: How to improve RAG performance. ref / ref [17 Oct 2023]
- Data quality: Clean, standardize, deduplicate, segment, annotate, augment, and update data to make it clear, consistent, and context-rich.
- Embeddings fine-tuning: Fine-tune embeddings to domain specifics, adjust them according to context, and refresh them periodically to capture evolving semantics.
- Retrieval optimization: Refine chunking, embed metadata, use query routing, multi-vector retrieval, re-ranking, hybrid search, recursive retrieval, query engine, HyDE [20 Dec 2022], and vector search algorithms to improve retrieval efficiency and relevance.
- Synthesis techniques: Query transformations, prompt templating, prompt conditioning, function calling, and fine-tuning the generator to refine the generation step.
- HyDE: Implemented in LangChain as HypotheticalDocumentEmbedder. The query generates hypothetical documents, which are then embedded and used to retrieve the most relevant results: query -> generate n hypothetical documents -> embed them -> average the embeddings -> retrieve -> final result. ref
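  A minimal sketch, assuming LangChain's `HypotheticalDocumentEmbedder` together with the `langchain-openai` package (import paths vary across LangChain versions):

  ```python
  from langchain.chains import HypotheticalDocumentEmbedder
  from langchain_openai import ChatOpenAI, OpenAIEmbeddings

  llm = ChatOpenAI()                  # generates the hypothetical document(s)
  base_embeddings = OpenAIEmbeddings()

  # "web_search" is one of the built-in prompt keys for HyDE.
  hyde = HypotheticalDocumentEmbedder.from_llm(llm, base_embeddings, "web_search")

  # Embeds the generated hypothetical answer rather than the raw query;
  # the resulting vector is then used for similarity search against the index.
  vector = hyde.embed_query("What did the president say about inflation?")
  ```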
- How to optimize RAG pipeline: Indexing optimization [24 Oct 2023]
- Demystifying Advanced RAG Pipelines: An LLM-powered advanced RAG pipeline built from scratch git [19 Oct 2023]
- cite [7 Nov 2023]: OpenAI has put together a pretty good roadmap for building a production RAG system: Naive RAG -> Tune Chunks -> Rerank & Classify -> Prompt Engineering. In llama_index ... 📺
- 9 Effective Techniques To Boost Retrieval Augmented Generation (RAG) Systems doc: ReRank, Prompt Compression, Hypothetical Document Embedding (HyDE), Query Rewrite and Expansion, Enhance Data Quality, Optimize Index Structure, Add Metadata, Align Query with Documents, Mixed Retrieval (Hybrid Search) [2 Jan 2024]
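  As an example of the first technique (ReRank), a minimal sketch using a cross-encoder from the `sentence-transformers` library; the model name is a commonly used checkpoint, not prescribed by the source:

  ```python
  from sentence_transformers import CrossEncoder

  # Cross-encoders score (query, document) pairs jointly: slower than
  # bi-encoder retrieval, but far more accurate for reranking a short list.
  reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

  query = "how to tune chunk size in RAG"
  candidates = [
      "Chunking strategies for RAG pipelines...",
      "Llamas are camelids native to South America...",
      "Tuning retrieval parameters for better recall...",
  ]

  scores = reranker.predict([(query, doc) for doc in candidates])
  reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
  ```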
- Contextual Retrieval: Contextual Retrieval enhances traditional RAG by using Contextual Embeddings and Contextual BM25 to maintain context during retrieval. [19 Sep 2024]
- From Simple to Advanced RAG (LlamaIndex) ref / doc /💡ref [10 Oct 2023]
- What is Agentic RAG: An article published by Weaviate. [5 Nov 2024]
- Azure RAG with Vision Application Framework [Mar 2024]
- localGPT-Vision: an end-to-end vision-based Retrieval-Augmented Generation (RAG) system. [Oct 2024]
- Enhancing Ultra High Resolution Remote Sensing Imagery Analysis with ImageRAG: Targets Ultra High Resolution (UHR) imagery, such as satellite imagery and medical imaging. [12 Nov 2024]
- Visual RAG over PDFs with Vespa: a demo showcasing Visual RAG over PDFs using ColPali embeddings in Vespa git [19 Nov 2024]
- Path-RAG: Knowledge-Guided Key Region Retrieval for Open-ended Pathology Visual Question Answering: Using HistoCartography to improve pathology image analysis and boost PathVQA-Open performance. [26 Nov 2024]
- Graph RAG (by NebulaGraph): NebulaGraph proposes the concept of Graph RAG, which is a retrieval enhancement technique based on knowledge graphs. demo [8 Sep 2023]
- GraphRAG (by Microsoft): 1. Global search: Original Documents -> Knowledge Graph (Community Summaries generated by LLM) -> Partial Responses -> Final Response. 2. Local Search: Utilizes vector-based search to find the nearest entities and relevant information.
ref / git [24 Apr 2024]
- GraphRAG Implementation with LlamaIndex [15 Jul 2024]
- "From Local to Global" GraphRAG with Neo4j and LangChain [09 Jul 2024]
- LightRAG: Utilizing graph structures for text indexing and retrieval processes. [8 Oct 2024]
- nano-graphrag: A simple, easy-to-hack GraphRAG implementation [Jul 2024]
- DRIFT Search: DRIFT search (Dynamic Reasoning and Inference with Flexible Traversal) combines global and local search methods to improve query relevance by generating sub-questions and refining the context using HyDE (Hypothetical Document Embeddings). [31 Oct 2024]
- Improving global search via dynamic community selection: Dynamic Community Selection narrows the scope by selecting the most relevant communities based on query relevance, utilizing Map-reduce search, reducing costs by 77% without sacrificing output quality [15 Nov 2024]
- LazyGraphRAG: Reduces costs to 0.1% of full GraphRAG through efficient use of best-first (vector-based) and breadth-first (global search) retrieval and deferred LLM calls [25 Nov 2024]
- The Problem with RAG
  - A question is not semantically similar to its answers; cosine similarity may favor semantically similar texts that do not contain the answer.
  - Semantic similarity gets diluted when a document is long; cosine similarity may favor short documents that contain only the relevant information.
  - The information needs to be contained in one or a few documents; questions that require aggregating information across the whole corpus cannot be answered by similarity retrieval alone.
- Seven Failure Points When Engineering a Retrieval Augmented Generation System: 1. Missing Content, 2. Missed the Top Ranked Documents, 3. Not in Context, 4. Not Extracted, 5. Wrong Format, 6. Incorrect Specificity, 7. Lack of Thorough Testing [11 Jan 2024]
- Solving the core challenges of Retrieval-Augmented Generation ref [Feb 2024]
- Papers with code: RAG
- Azure: Designing and developing a RAG solution
- Announcing cost-effective RAG at scale with Azure AI Search
- Advanced RAG with Azure AI Search and LlamaIndex
- GPT-RAG: Enterprise RAG Solution Accelerator [Jun 2023]
- Azure OpenAI chat baseline architecture in an Azure landing zone
- Azure Reference Architectures: x-ref
- RAG at scale: Building a distributed system for synchronizing and ingesting billions of text embeddings [28 Sep 2023]
- A Practical Approach to Retrieval Augmented Generation (RAG) Systems: Online book [Dec 2023]
- LangChain RAG from scratch [Jan 2024]
- LlamaIndex Building Performant RAG Applications for Production
- Advanced RAG on Hugging Face documentation using LangChain
- LLM Twin Course: Building Your Production-Ready AI Replica: Learn to Build a Production-Ready LLM & RAG System with LLMOps [Mar 2024]
- RAG-driven Generative AI: Retrieval Augmented Generation (RAG) code for Generative AI with LlamaIndex, Deep Lake, and Pinecone [Apr 2024]
- Learn RAG with LangChain: Online book [May 2024]
- RAG context relevancy metric: Ragas, TruLens, DeepEval ref [Jun 2024]
  Context Relevancy (in Ragas) = S / total number of sentences in the retrieved context, where S is the number of sentences relevant to the query.
  Contextual Relevancy (in DeepEval) = number of relevant statements / total number of statements.
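  A sketch of the arithmetic behind both metrics, assuming the relevance judgments (normally produced by an LLM judge) are already available:

  ```python
  def ragas_context_relevancy(relevant_sentences: int, total_sentences: int) -> float:
      # S / total number of sentences in the retrieved context
      return relevant_sentences / total_sentences

  def deepeval_contextual_relevancy(relevant_statements: int, total_statements: int) -> float:
      # relevant statements / total statements
      return relevant_statements / total_statements

  # e.g., 3 of 8 retrieved sentences actually support the query
  print(ragas_context_relevancy(3, 8))  # 0.375
  ```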
- What AI Engineers Should Know about Search [25 Jun 2024]
- Advanced RAG Techniques:🏆Showcases various advanced techniques for Retrieval-Augmented Generation (RAG) [Jul 2024]
- Galileo eBook: 200 pages content. Mastering RAG. doc [Sep 2024]
- Introduction to Large-Scale Similarity Search: HNSW, IVF, LSH [28 Sep 2024]
- 5 Chunking Strategies For RAG [19 Oct 2024]
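  The simplest of these strategies, fixed-size chunking with overlap, as a minimal sketch (token counts approximated by whitespace-separated words):

  ```python
  def chunk_fixed(text: str, size: int = 200, overlap: int = 40) -> list[str]:
      """Fixed-size chunking with overlap: the overlap keeps sentences that
      straddle a boundary present in both neighboring chunks."""
      words = text.split()
      step = size - overlap
      return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]
  ```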
- Genie: Uber’s Gen AI On-Call Copilot [10 Oct 2024]
- Haystack: LLM orchestration framework to build customizable, production-ready LLM applications. [5 May 2020]
- Cognita: RAG (Retrieval Augmented Generation) Framework for building modular, open-source applications [Jul 2023]
- Canopy: open-source RAG framework and context engine built on top of the Pinecone vector database. [Aug 2023]
- RAGflow: Streamlined RAG workflow. Focusing on Deep document understanding [Dec 2023]
- AutoRAG: A RAG AutoML tool that automatically finds an optimal RAG pipeline for your data. [Jan 2024]
- RAGApp: Agentic RAG. Custom GPTs, but deployable in your own cloud infrastructure using Docker. [Apr 2024]
- RAG Builder: Automatically create an optimal production-ready Retrieval-Augmented Generation (RAG) setup for your data. [Jun 2024]
- MindSearch: An open-source AI Search Engine Framework [Jul 2024]
- RAGFoundry: A library designed to improve LLMs ability to use external information by fine-tuning models on specially created RAG-augmented datasets. [5 Aug 2024]
- RAGChecker: A Fine-grained Framework For Diagnosing RAG git [15 Aug 2024]
- SWIRL AI Connect: SWIRL AI Connect enables you to perform Unified Search and bring in a secure AI Co-Pilot. [Apr 2022]
- PaperQA2: High accuracy RAG for answering questions from scientific documents with citations [Feb 2023]
- Danswer: Ask Questions in natural language and get Answers backed by private sources: Slack, GitHub, Confluence, etc. [Apr 2023]
- PrivateGPT: 100% privately, no data leaks. The API is built using FastAPI and follows OpenAI's API scheme. [May 2023]
- quivr: A personal productivity assistant (RAG). Chat with your docs (PDF, CSV, ...) [May 2023]
- Verba: Retrieval Augmented Generation (RAG) chatbot powered by Weaviate [Jul 2023]
- RAG capabilities of LlamaIndex to QA about SEC 10-K & 10-Q documents: A real world full-stack application using LlamaIndex [Sep 2023]
- RAGxplorer: Visualizing document chunks and the queries in the embedding space. [Jan 2024]
- Open Source AI Searches: Perplexica:💡Open source alternative to Perplexity AI [Apr 2024] / Marqo [Aug 2022] / txtai [Aug 2020] / Typesense [Jan 2017] / Morphic [Apr 2024]
- llm-answer-engine: Build a Perplexity-Inspired Answer Engine Using Next.js, Groq, Mixtral, LangChain, OpenAI, Brave & Serper [Mar 2024]
- turboseek: An AI search engine inspired by Perplexity [May 2024]
- R2R: R2R (RAG to Riches), the Elasticsearch for RAG. [Feb 2024]
- FlashRAG: A Python Toolkit for Efficient RAG Research [Mar 2024]
- kotaemon: Open-source clean & customizable RAG UI for chatting with your documents. [Mar 2024]
- MedGraphRAG: MedGraphRAG outperforms the previous SOTA model, Medprompt, by 1.1%. git [8 Aug 2024]
- HybridRAG: Integrating VectorRAG and GraphRAG with financial earnings call transcripts in Q&A format. [9 Aug 2024]
- MemFree: Hybrid AI Search Engine + AI Page Generator. [Jun 2024]
- RAGLite: a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite [Jun 2024]
- Applications, Frameworks, and User Interface (UI/UX): x-ref
- LlamaIndex (formerly GPT Index) is a data framework for LLM applications to ingest, structure, and access private or domain-specific data. The high-level API allows users to ingest and query their data in a few lines of code. High-Level Concept: ref / doc: ref / blog: ref / git [Nov 2022]
  Fun fact: this core idea was the initial inspiration for GPT Index (the former name of LlamaIndex) on 11/8/2022, almost a year ago! cite / Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
  - Build a data structure (memory tree)
  - Traverse it via LLM prompting
- LlamaIndex Toolkits:
- LlamaIndex integration with Azure AI: [19 Nov 2024]
- Core: Azure OpenAI Service, Azure AI Search
- Storage and memory: Azure Table Storage as a Docstore or Azure Cosmos DB.
- Workflow example: Azure Code Interpreter
- AI App Template Gallery
- Query engine vs Chat engine (a short code sketch follows)
  - The query engine wraps a retriever and a response synthesizer into a pipeline that uses the query string to fetch nodes (sentences or paragraphs) from the index and then sends them to the LLM (Large Language Model) to generate a response.
  - The chat engine is a quick and simple way to chat with the data in your index. It uses a context manager to keep track of the conversation history and generate relevant queries for the retriever. Conceptually, it is a stateful analogy of a Query Engine.
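  Both engines hang off the same index in LlamaIndex; a minimal sketch, assuming LLM and embedding credentials are already configured and `./data` exists:

  ```python
  from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

  documents = SimpleDirectoryReader("./data").load_data()
  index = VectorStoreIndex.from_documents(documents)

  # Query engine: stateless retrieve -> synthesize, one query at a time.
  query_engine = index.as_query_engine()
  print(query_engine.query("What is RAG?"))

  # Chat engine: stateful; keeps history and rewrites follow-up queries.
  chat_engine = index.as_chat_engine(chat_mode="context")
  print(chat_engine.chat("What is RAG?"))
  print(chat_engine.chat("How does it differ from fine-tuning?"))
  ```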
- Storage Context vs Settings (previously known as Service Context)
  - Both the Storage Context and the Service Context are data classes.
  - Introduced in v0.10.0, the Settings object replaces the ServiceContext.
  - The Storage Context is responsible for the storage and retrieval of data in LlamaIndex, while the Service Context helps incorporate external context to enhance the search experience.
  - The Service Context is not directly involved in the storage or retrieval of data; it helps provide a more context-aware and accurate search experience.
  - A migration sketch follows the class summaries below.
  ```python
  # The storage context container is a utility container for storing nodes, indices, and vectors.
  class StorageContext:
      docstore: BaseDocumentStore
      index_store: BaseIndexStore
      vector_store: VectorStore
      graph_store: GraphStore
  ```

  ```python
  # NOTE: Deprecated, use llama_index.settings.Settings.
  # The service context container is a utility container for LlamaIndex index and query classes.
  class ServiceContext:
      llm_predictor: BaseLLMPredictor
      prompt_helper: PromptHelper
      embed_model: BaseEmbedding
      node_parser: NodeParser
      llama_logger: LlamaLogger
      callback_manager: CallbackManager
  ```

  ```python
  @dataclass
  class _Settings:
      # lazy initialization
      _llm: Optional[LLM] = None
      _embed_model: Optional[BaseEmbedding] = None
      _callback_manager: Optional[CallbackManager] = None
      _tokenizer: Optional[Callable[[str], List[Any]]] = None
      _node_parser: Optional[NodeParser] = None
      _prompt_helper: Optional[PromptHelper] = None
      _transformations: Optional[List[TransformComponent]] = None
  ```
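  A migration sketch from ServiceContext to the global Settings object, assuming the post-v0.10 package split (`llama-index-llms-openai`, `llama-index-embeddings-openai`); model names are illustrative:

  ```python
  from llama_index.core import Settings
  from llama_index.embeddings.openai import OpenAIEmbedding
  from llama_index.llms.openai import OpenAI

  # Before v0.10: service_context = ServiceContext.from_defaults(llm=..., embed_model=...)
  # After v0.10: configure once, globally; components read from Settings lazily.
  Settings.llm = OpenAI(model="gpt-4o-mini")
  Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
  Settings.chunk_size = 512  # forwarded to the default node parser
  ```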
- LlamaIndex Overview (Japanese) [17 Jul 2023]
- Fine-Tuning a Linear Adapter for Any Embedding Model: Fine-tuning the embeddings model requires you to reindex your documents. With this approach, you do not need to re-embed your documents; simply transform the query instead. [7 Sep 2023]
- 4 RAG techniques implemented in llama_index / cite [20 Sep 2023] / git (a minimal sketch of the Sub Question Query Engine follows this list)
  - SQL Router Query Engine: A query router that can reference your vector database or SQL database.
  - Sub Question Query Engine: Breaks down a complex question into sub-questions.
  - Recursive Retriever + Query Engine: References node relationships, rather than only finding the node (chunk) that is most relevant.
  - Self Correcting Query Engines: Use an LLM to evaluate its own output.
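  The sketch referenced above, assuming recent `llama-index` import paths (they vary across versions) and configured credentials; names and descriptions are illustrative:

  ```python
  from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
  from llama_index.core.query_engine import SubQuestionQueryEngine
  from llama_index.core.tools import QueryEngineTool, ToolMetadata

  index = VectorStoreIndex.from_documents(SimpleDirectoryReader("./data").load_data())
  tools = [
      QueryEngineTool(
          query_engine=index.as_query_engine(),
          metadata=ToolMetadata(name="docs", description="Project documentation"),
      )
  ]

  # The engine decomposes a complex question into sub-questions, answers each
  # against the tools, then synthesizes a final answer from the partial results.
  engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
  print(engine.query("Compare the setup steps for feature A and feature B."))
  ```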
LlamaIndex Tutorial: A Complete LlamaIndex Guide [18 Oct 2023]
- Chat engine ReAct mode, FLARE Query engine
- Building and Productionizing RAG: doc: Optimizing RAG Systems 1. Table Stakes 2. Advanced Retrieval: Small-to-Big 3. Agents 4. Fine-Tuning 5. Evaluation [Nov 2023]
- Multimodal RAG Pipeline ref [Nov 2023]
- A Cheat Sheet and Some Recipes For Building Advanced RAG: a RAG cheat sheet inspired by the RAG survey paper. doc [Jan 2024]
- Faiss: Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It can serve as an alternative to a vector database, and as a library of algorithms for building one. Developed by Facebook AI Research. git [Feb 2017]
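  A minimal Faiss sketch, indexing random vectors with exact (brute-force) L2 search; the dimensionality and sizes are illustrative:

  ```python
  import faiss
  import numpy as np

  d = 128                                              # embedding dimensionality
  xb = np.random.random((1000, d)).astype("float32")   # database vectors
  xq = np.random.random((5, d)).astype("float32")      # query vectors

  index = faiss.IndexFlatL2(d)        # exact L2 index (no quantization)
  index.add(xb)                       # index the database vectors
  distances, ids = index.search(xq, 4)  # 4 nearest neighbors per query
  ```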
- Milvus (A cloud-native vector database) Embedded git [Sep 2019]: An open-source alternative to Pinecone and RediSearch. It offers support for multiple languages, addresses the limitations of RediSearch, and provides cloud scalability and high reliability with Kubernetes.
- Qdrant: Written in Rust. Qdrant (read: quadrant) [May 2020]
- Pinecone: A fully managed cloud Vector Database. Commercial Product [Jan 2021]
- Weaviate: Store both vectors and data objects. [Jan 2021]
- pgvector: Open-source vector similarity search for Postgres [Apr 2021] / pgvectorscale: 75% cheaper than pinecone [Jul 2023]
- Not All Vector Databases Are Made Equal: A printed version, due to Medium's access limits. doc [2 Oct 2021]
- Chroma: Open-source embedding database [Oct 2022]
- Redis extension for vector search, RedisVL: Redis Vector Library (RedisVL) [Nov 2022]
- A SQLite extension for efficient vector search, based on Faiss! [Jan 2023]
- lancedb: LanceDB's core is written in Rust and is built using Lance, an open-source columnar format. [Feb 2023]
- A Comprehensive Survey on Vector Database: Categorizes search algorithms by their approach, such as hash-based, tree-based, graph-based, and quantization-based. [18 Oct 2023]
- Vector Search in Azure Cosmos DB for MongoDB vCore [23 May 2023]
- Pgvector extension on Azure Cosmos DB for PostgreSQL: ref [13 Jun 2023]
- Vector search - Azure AI Search: ref Rebranded from Azure Cognitive Search [Oct 2019] to Azure AI Search [Nov 2023]
- Azure Cache for Redis Enterprise: Enterprise Redis Vector Search Demo [22 May 2023]
- Azure SQL's support for natively storing and querying vectors [21 May 2024]
- GraphRAG, available in preview in Azure Database for PostgreSQL [19 Nov 2024]
- DiskANN, a state-of-the-art suite of algorithms for low-latency, highly scalable vector search, is now generally available in Azure Cosmos DB and in preview for Azure Database for PostgreSQL. [19 Nov 2024]
Note: Azure Cache for Redis Enterprise: Enterprise Sku series are not able to deploy by a template such as Bicep and ARM.
- The Azure OpenAI Embedding API, text-embedding-ada-002, produces 1536-dimensional vectors. Elasticsearch, a Lucene-based engine, supports at most 1024 dimensions, while OpenSearch can store vectors with up to 16,000 dimensions, so OpenSearch can serve as a vector database with the Azure OpenAI Embedding API.
- OpenAI Embedding models: text-embedding-3 x-ref (see the sketch after these notes)
- text-embedding-ada-002: Smaller embedding size. The new embeddings have only 1536 dimensions, one-eighth the size of davinci-001 embeddings, making the new embeddings more cost effective in working with vector databases. [15 Dec 2022]
- However, one exception to this is that the maximum dimension count for the Lucene engine is 1,024, compared with 16,000 for the other engines. ref
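  When the engine caps dimensions (e.g., Lucene's 1,024), the text-embedding-3 models can natively shorten their output via the `dimensions` parameter. A sketch with the OpenAI Python SDK; the input text is illustrative:

  ```python
  from openai import OpenAI

  client = OpenAI()

  # text-embedding-3 models support shortened embeddings: request fewer
  # dimensions and the API returns a re-normalized shorter vector.
  resp = client.embeddings.create(
      model="text-embedding-3-large",
      input="RAG grounds LLM answers in retrieved documents.",
      dimensions=1024,  # fits within Lucene's per-vector limit
  )
  vector = resp.data[0].embedding
  assert len(vector) == 1024
  ```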
- Vector Search with OpenAI Embeddings: Lucene Is All You Need: Our experiments were based on Lucene 9.5.0, but indexing was a bit tricky because the HNSW implementation in Lucene restricts vectors to 1024 dimensions, which was not sufficient for OpenAI’s 1536-dimensional embeddings. Although the resolution of this issue, which is to make vector dimensions configurable on a per codec basis, has been merged to the Lucene source trunk git, this feature has not been folded into a Lucene release (yet) as of early August 2023. [29 Aug 2023]
- Is Cosine-Similarity of Embeddings Really About Similarity?: In linear matrix factorization, the use of regularization can impact, and in some cases, render cosine similarities meaningless. Regularization involves two objectives. The first objective applies L2-norm regularization to the product of matrices A and B, a process similar to dropout. The second objective applies L2-norm regularization to each individual matrix, similar to the weight decay technique used in deep learning. [8 Mar 2024]
- Contextual Document Embedding (CDE): Improve document retrieval by embedding both queries and documents within the context of the broader document corpus. ref [3 Oct 2024]
- Fine-tuning Embeddings for Specific Domains [1 Oct 2024]