Support for retriever-augmented models. #1125
Conversation
Pull request overview
This PR implements support for evaluating Retrieval-Augmented Generation (RAG) systems within LightEval, addressing issue #1109. It introduces a flexible adapter pattern that allows users to plug in any retriever and generator combination to evaluate RAG systems on existing LightEval benchmarks.
Changes:
- Added `RAGAdapterModel` base class implementing the `LightevalModel` interface, with protocols for retriever and generator components
- Extended `ModelResponse` dataclass with an optional `metadata` field for storing retrieval information (see the sketch after this list)
- Provided a working example implementation using sentence transformers for retrieval and T5 for generation, with a TriviaQA-focused document corpus
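For reference, the `metadata` extension presumably amounts to a single optional dataclass field along these lines (the field name comes from this review; the surrounding dataclass layout is elided and assumed):

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ModelResponse:
    # ... existing generation fields elided ...
    # Optional bag for retrieval information and other model-specific data.
    metadata: Optional[dict[str, Any]] = None
```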
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 17 comments.
| File | Description |
|---|---|
| src/lighteval/models/rag/rag_model.py | Core RAG adapter implementation with RetrieverProtocol, GeneratorProtocol, ContextFormatter utility class, and RAGAdapterModel base class |
| src/lighteval/models/model_output.py | Added optional metadata field to ModelResponse for storing retrieval information and other model-specific data |
| examples/custom_models/rag_model_example.py | Complete working example with SimpleVectorRetriever and SimpleGenerator demonstrating RAG evaluation on TriviaQA-style tasks |
| src/lighteval/models/custom/rag_adapters.py | Placeholder file marked "TO BE IMPLEMENTED" |
cc: @NathanHB
ISSUE SUMMARY
Resolves #1109
This PR implements evaluation of Retrieval-Augmented Generation (RAG) systems in LightEval. It provides a flexible adapter pattern that allows users to plug in any retriever and generator combination, enabling evaluation of RAG systems on existing LightEval benchmarks.
The implementation provides:
`RAGAdapterModel`: A base class that implements the `LightevalModel` interface. The RAG adapter works by:
- receiving evaluation requests as `Doc` objects,
- retrieving relevant documents for each query and formatting them into the generation prompt, and
- returning the generated answers as `ModelResponse` objects.

This allows RAG systems to be evaluated on benchmarks like TriviaQA, MMLU, etc. using the same metrics (exact_match, F1, ROUGE) as traditional language models.
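Concretely, the flow could look something like the following sketch (the `greedy_until` method name comes from the `LightevalModel` interface; the `retriever`/`formatter`/`generator` attribute names and the exact `ModelResponse` fields are assumptions, not the PR's actual code):

```python
from lighteval.models.model_output import ModelResponse
from lighteval.tasks.requests import Doc


class RAGAdapterModel:  # simplified stand-in for the PR's base class
    def __init__(self, retriever, generator, formatter, num_docs: int = 3):
        self.retriever = retriever
        self.generator = generator
        self.formatter = formatter
        self.num_docs = num_docs

    def greedy_until(self, docs: list[Doc]) -> list[ModelResponse]:
        responses = []
        for doc in docs:
            # 1. Retrieve supporting passages for the query.
            passages = self.retriever.retrieve(doc.query, k=self.num_docs)
            # 2. Format the retrieved context plus the query into a prompt.
            prompt = self.formatter.format(doc.query, passages)
            # 3. Generate an answer and attach retrieval metadata.
            answer = self.generator.generate(prompt)
            responses.append(ModelResponse(text=[answer], metadata={"passages": passages}))
        return responses
```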
You can take a look at the example provided in `examples/custom_models/rag_model_example.py`.

Quick Start (for the example)
```bash
lighteval custom \
    "rag-flan" \
    "examples/custom_models/rag_model_example.py" \
    "triviaqa" \
    --max-samples 10 \
    --save-details
```

To implement your own RAG Model
Step 1. Implement Retriever
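A minimal sketch of a retriever, modeled on the sentence-transformers `SimpleVectorRetriever` mentioned in the review above (the `retrieve` method name and signature are assumptions about `RetrieverProtocol`, not the actual interface):

```python
from sentence_transformers import SentenceTransformer, util


class SimpleVectorRetriever:
    """Dense retriever over an in-memory corpus (illustrative sketch)."""

    def __init__(self, corpus: list[str], model_name: str = "all-MiniLM-L6-v2"):
        self.corpus = corpus
        self.encoder = SentenceTransformer(model_name)
        # Embed the corpus once up front; each query is then one encode + search.
        self.corpus_embeddings = self.encoder.encode(corpus, convert_to_tensor=True)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        query_embedding = self.encoder.encode(query, convert_to_tensor=True)
        hits = util.semantic_search(query_embedding, self.corpus_embeddings, top_k=k)[0]
        return [self.corpus[hit["corpus_id"]] for hit in hits]
```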
Step 2. Implement Generator
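Similarly, a generator sketch using T5, as in the PR's example (the `generate` method name is an assumption about `GeneratorProtocol`, and the exact checkpoint is a guess; the "rag-flan" name in the Quick Start suggests a Flan-T5 variant):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer


class SimpleGenerator:
    """Seq2seq generator wrapping a T5-style model (illustrative sketch)."""

    def __init__(self, model_name: str = "google/flan-t5-base"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    def generate(self, prompt: str, max_new_tokens: int = 64) -> str:
        inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True)
        output_ids = self.model.generate(**inputs, max_new_tokens=max_new_tokens)
        return self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
```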
Step 3. Create RAG Model
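Wiring the two together might then look like this (the `RAGAdapterModel` constructor arguments are guesses; the import path follows the file table in the review, and the corpus is illustrative):

```python
from lighteval.models.rag.rag_model import RAGAdapterModel

corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "Python was created by Guido van Rossum and first released in 1991.",
]

model = RAGAdapterModel(
    retriever=SimpleVectorRetriever(corpus),
    generator=SimpleGenerator(),
)
```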
Step 4. Evaluate
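Evaluation then presumably reuses the `lighteval custom` command from the Quick Start above, pointing at the Python file that defines your retriever, generator, and model.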
Or, using the Python API:
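A sketch of what that could look like with lighteval's standard `Pipeline` entry point (`ParallelismManager.CUSTOM` and the parameter names follow lighteval's custom-model docs, and the task string mirrors the Quick Start; treat the exact wiring of a `RAGAdapterModel` instance into `Pipeline` as an assumption):

```python
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.pipeline import ParallelismManager, Pipeline, PipelineParameters

evaluation_tracker = EvaluationTracker(output_dir="./results")
pipeline_params = PipelineParameters(
    launcher_type=ParallelismManager.CUSTOM,
    max_samples=10,
)

pipeline = Pipeline(
    tasks="triviaqa",  # task string as used in the Quick Start above
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model=model,  # the RAGAdapterModel instance from Step 3
)
pipeline.evaluate()
pipeline.save_and_push_results()
pipeline.show_results()
```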
Limitations