-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CGPO] Mixture of judges #2159
base: main
Are you sure you want to change the base?
[CGPO] Mixture of judges #2159
Conversation
|
Thanks a lot @gaetanlop. Added some suggestion and open questions |
Also, please make sure to run the pre-commits ( |
Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>
Hey @qgallouedec, thanks for the review. I have added a For the naming of the judges, let's do |
Thanks a lot! Nice work!
LGTM. WDYT of having generic classes in |
Ok, I have removed the |
trl/trainer/judges.py
Outdated
|
||
@abstractmethod | ||
def judge( | ||
self, prompts: List[str], completions: List[str], gold_answers: List[str] = None, shuffle_order: bool = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe "gold_completion"
or even "ref_completion"
would be more suited, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"gold_completions"
seems good to me. I have done the update.
Co-authored-by: Quentin Gallouédec <[email protected]>
trl/trainer/judges.py
Outdated
completions: List[str], | ||
gold_completions: Optional[List[str]] = None, | ||
shuffle_order: bool = True, | ||
) -> List[bool]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should return a list of int to be consistent with the super class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed!
Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>
What does this PR do?
This PR adds the Mixture of judges part of the CGPO paper (https://arxiv.org/pdf/2409.20370). The judges are described in section 4.1.4 and the mixture of judges simply labels a generation as “violated” (0) if it fails any one of the constraint judgments and “satisfied” (1) otherwise.
Related to #2156
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines.
Who can review?
@kashif @lewtun