
Provence #40

Open · wants to merge 7 commits into base: main
Conversation

@nadiinchi (Collaborator) commented Feb 20, 2025:

  • add Provence context_processor and configs
  • add baseline context-pruner context_processors and configs
  • add context compression calculation, saved to a separate metrics file in the experiment folder
  • small refactor of modules/rag.py to better accommodate context processing
  • add the possibility to provide oracle docs in the query dataset
  • add RGB dataset (with oracle docs in the query dataset)

@nadiinchi requested a review from sclincha on February 20, 2025, 12:47
@DRRV (Contributor) commented Feb 25, 2025:

Just to be sure: here Provence does not do reranking, right?

@nadiinchi (Collaborator, Author) commented:

> Just to be sure: here Provence does not do reranking, right?

It does. If reorder=True in the Provence config, Provence reorders documents during context processing. Since reranking is incorporated into context processing, you would set retriever=<smth> context_processor=provence/provence_reranking_0.1 and skip the reranker.
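Concretely, assuming the config names above, a run that lets Provence handle reordering could look like the following sketch (the dataset placeholder is hypothetical; retriever and generator names are taken from the README example later in this thread):

```shell
# sketch: Provence prunes AND reorders, so no reranker= override is passed
CONFIG=myconfig python bergen.py \
  retriever=splade-v3 \
  +context_processor=provence/provence_reranking_0.1 \
  dataset=<your_dataset> \
  generator=vllm_llama-2-7b-chat
```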

self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
# fp16 halves memory; low_cpu_mem_usage avoids materializing a full fp32 copy on CPU
self.model = AutoModelForSequenceClassification.from_pretrained(
    model_name, low_cpu_mem_usage=True, torch_dtype=torch.float16
).to(self.device)

Collaborator commented on the diff:

Two consecutive line returns: only one is allowed within functions

).logits
# Use sigmoid since it's BCEWithLogitsLoss
prob = torch.sigmoid(rank_score)
probs += prob
Collaborator commented on the diff:

I would use prob.item() to make it a float and not a torch tensor.
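The reviewer's point can be illustrated without a GPU. In PyTorch, `tensor.item()` converts a one-element tensor into a plain Python float, so the accumulated `probs` hold numbers rather than tensors that keep the autograd graph and device memory alive. The sketch below uses a pure-Python logistic function as a stand-in for `torch.sigmoid(...).item()`:

```python
import math

def sigmoid(x: float) -> float:
    # plain-float stand-in for torch.sigmoid(tensor).item()
    return 1.0 / (1.0 + math.exp(-x))

# accumulate plain floats, mirroring `probs.append(prob.item())`
scores = [0.0, 2.0, -2.0]   # example logits
probs = [sigmoid(s) for s in scores]
```

With tensors, the equivalent would be `probs.append(torch.sigmoid(rank_score).item())`.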

@maxime-louis (Collaborator) commented:

> Just to be sure: here Provence does not do reranking, right?

> It does. If reorder=True in the Provence config, Provence reorders documents during context processing. Since reranking is incorporated into context processing, you would set retriever=<smth> context_processor=provence/provence_reranking_0.1 and skip the reranker.

Can you put an assert for that?
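The requested guard could look like this minimal sketch. All names here (`check_config`, the `context_processor`/`reranker`/`reorder` keys) are assumptions for illustration, not the PR's actual config schema:

```python
def check_config(config: dict) -> None:
    """Fail fast if a standalone reranker is combined with Provence reordering.

    Hypothetical guard: key names are assumed, not taken from the PR.
    """
    processor = config.get("context_processor", {})
    if processor.get("reorder", False):
        assert config.get("reranker") is None, (
            "Provence already reorders documents during context processing; "
            "remove the separate reranker from the config"
        )
```

Calling it early in setup turns a silently redundant configuration into an explicit error.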

@sclincha (Contributor) commented:
  1. Do we want to update the requirements file for LLMLingua? Or simply try/except with a "please install" message?
  2. It could be good to have a simple README for context processors explaining the list of models and basic usage, such as:
    CONFIG=myconfig python bergen.py retriever=splade-v3 reranker=debertav3 +context_processor=provence/provence_standalone_0.1 dataset=$1 generator=vllm_llama-2-7b-chat
    What do you think? Where should we put that README? The run_exp file is a bit hard to read for a first usage...
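The try/except option in point 1 could be a small reusable helper. This is a sketch; the helper name `require` and its use for LLMLingua are assumptions, not code from the PR:

```python
import importlib

def require(module_name: str, pip_name: str):
    """Import an optional dependency or fail with an actionable install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is required for this context processor; "
            f"install it with `pip install {pip_name}`"
        ) from err

# in the LLMLingua processor (package name assumed):
# llmlingua = require("llmlingua", "llmlingua")
```

This keeps LLMLingua out of the hard requirements while giving users a clear error the first time the processor actually needs it.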

4 participants