
Provence #40

Open · wants to merge 7 commits into base: main
Conversation

@nadiinchi (Collaborator) commented Feb 20, 2025:

  • add Provence context_processor and configs
  • add baseline context-pruner context_processors and configs
  • add context compression calculation, saved to a separate metrics file in the experiment folder
  • small refactor of modules/rag.py to better accommodate context processing
  • add the possibility to provide oracle docs in the query dataset
  • add RGB dataset (with oracle docs in the query dataset)

@nadiinchi requested a review from sclincha on February 20, 2025, 12:47
@DRRV (Contributor) commented Feb 25, 2025:

Just to be sure: here Provence does not do reranking, right?

@nadiinchi (Collaborator, Author) commented:

> Just to be sure: here Provence does not do reranking, right?

It does. If reorder=True in the Provence config, Provence reorders documents during context processing. Since reranking is incorporated into context processing, you would set retriever=<smth> context_processor=provence/provence_reranking_0.1 and skip the reranker.
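Concretely, assuming the config names above, a run that lets Provence handle reordering could look like the following sketch (the dataset placeholder is hypothetical; retriever and generator names are taken from the README example later in this thread):

```shell
# sketch: Provence prunes AND reorders, so no reranker= override is passed
CONFIG=myconfig python bergen.py \
  retriever=splade-v3 \
  +context_processor=provence/provence_reranking_0.1 \
  dataset=<your_dataset> \
  generator=vllm_llama-2-7b-chat
```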

self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
# fp16 halves memory; low_cpu_mem_usage avoids materializing a full fp32 copy on CPU
self.model = AutoModelForSequenceClassification.from_pretrained(
    model_name, low_cpu_mem_usage=True, torch_dtype=torch.float16
).to(self.device)

Collaborator commented on the diff:

Two consecutive line returns: only one is allowed within functions

).logits
# Use sigmoid since it's BCEWithLogitsLoss
prob = torch.sigmoid(rank_score)
probs += prob
Collaborator commented on the diff:

I would use prob.item() to make it a float and not a torch tensor.
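The reviewer's point can be illustrated without a GPU. In PyTorch, `tensor.item()` converts a one-element tensor into a plain Python float, so the accumulated `probs` hold numbers rather than tensors that keep the autograd graph and device memory alive. The sketch below uses a pure-Python logistic function as a stand-in for `torch.sigmoid(...).item()`:

```python
import math

def sigmoid(x: float) -> float:
    # plain-float stand-in for torch.sigmoid(tensor).item()
    return 1.0 / (1.0 + math.exp(-x))

# accumulate plain floats, mirroring `probs.append(prob.item())`
scores = [0.0, 2.0, -2.0]   # example logits
probs = [sigmoid(s) for s in scores]
```

With tensors, the equivalent would be `probs.append(torch.sigmoid(rank_score).item())`.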

@maxime-louis (Collaborator) commented:

> Just to be sure: here Provence does not do reranking, right?

> It does. If reorder=True in the Provence config, Provence reorders documents during context processing. Since reranking is incorporated into context processing, you would set retriever=<smth> context_processor=provence/provence_reranking_0.1 and skip the reranker.

Can you put an assert for that?
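The requested guard could look like this minimal sketch. All names here (`check_config`, the `context_processor`/`reranker`/`reorder` keys) are assumptions for illustration, not the PR's actual config schema:

```python
def check_config(config: dict) -> None:
    """Fail fast if a standalone reranker is combined with Provence reordering.

    Hypothetical guard: key names are assumed, not taken from the PR.
    """
    processor = config.get("context_processor", {})
    if processor.get("reorder", False):
        assert config.get("reranker") is None, (
            "Provence already reorders documents during context processing; "
            "remove the separate reranker from the config"
        )
```

Calling it early in setup turns a silently redundant configuration into an explicit error.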

@sclincha (Contributor) commented:
  1. Do we want to update the requirements file for LLMLingua? Or simply try/except with a "please install" message?
  2. It could be good to have a simple README for context processors explaining the list of models and basic usage, such as:
    CONFIG=myconfig python bergen.py retriever=splade-v3 reranker=debertav3 +context_processor=provence/provence_standalone_0.1 dataset=$1 generator=vllm_llama-2-7b-chat
    What do you think? Where should we put that README? The run_exp file is a bit hard to read for a first usage...
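The try/except option in point 1 could be a small reusable helper. This is a sketch; the helper name `require` and its use for LLMLingua are assumptions, not code from the PR:

```python
import importlib

def require(module_name: str, pip_name: str):
    """Import an optional dependency or fail with an actionable install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is required for this context processor; "
            f"install it with `pip install {pip_name}`"
        ) from err

# in the LLMLingua processor (package name assumed):
# llmlingua = require("llmlingua", "llmlingua")
```

This keeps LLMLingua out of the hard requirements while giving users a clear error the first time the processor actually needs it.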

4 participants