eric-mitchell / direct-preference-optimization Public

Fork 190
Star 2.3k

Code
Issues 41
Pull requests 2
Actions
Projects
Security
Insights


Issues: eric-mitchell/direct-preference-optimization


41 Open 43 Closed


Issues list

May I change the reward function by adding the chosen score and the rejection score inside?

#93 opened Dec 18, 2024 by harrysyz99

When trying to reproduce the complete example, "NotImplementedError: offload_to_cpu=True and NO_SHARD is not supported yet" is thrown

#91 opened Nov 12, 2024 by ZSvedic

ValueError when using peft on FSDPTrainer

#90 opened Nov 5, 2024 by AragornHorse

In DPO training, I got this ‘train stats after 160768 examples: {'rewards_train/chosen': 'nan', 'rewards_train/rejected': 'nan', 'rewards_train/accuracies': '0', 'rewards_train/margins': 'nan', 'logps_train/rejected': 'nan', 'logps_train/chosen': 'nan', 'loss/train': 'nan', 'examples_per_second': '5.4876', 'grad_norm': 'nan', 'counters/examples': 160768, 'counters/updates': 5024}’

#89 opened Sep 23, 2024 by Alan-D-Chen

GPT4 prompt when evaluating DPO

#88 opened Sep 5, 2024 by kygguo

How to guarantee output.logits.shape[:-1] == labels.shape

#87 opened Aug 13, 2024 by foreverhell

How are evals done on trained models?

#83 opened May 22, 2024 by lesnikow

Where is the config document for IPO?

#81 opened May 7, 2024 by 3244we

Using Mistral 7B with transformers v4.38.1 on MATH dataset, and facing memory leaks

#80 opened May 4, 2024 by Jayant1234

Weird logits and model starts degeneration while training DPO

#77 opened Apr 9, 2024 by DungNasSa10

Was it your intention to recreate wandb tables in iterator?

#76 opened Apr 4, 2024 by huskydoge

Can DPO work on BERT-style Model?

#75 opened Mar 24, 2024 by Leo-T-Zang

The number of training steps in the SHP dataset

#73 opened Mar 16, 2024 by bonin147

Computing faster logps

#72 opened Mar 9, 2024 by alexvishnevskiy

Implementation for Plackett-Luce rank model

#71 opened Mar 4, 2024 by rohan598

What's the reference policy of Preferred-FT in Figure 2?

#70 opened Mar 4, 2024 by zetian1025

My Code to Reproduce IMDB

#69 opened Feb 26, 2024 by QiyaoWei

Why does SFT sum the cross-entropy loss within each sequence?

#68 opened Feb 17, 2024 by YJWon99

Using cross entropy loss to calculate DPO?

#67 opened Feb 14, 2024 by zachares

Unable to Run SFT

#66 opened Feb 13, 2024 by Rui-Yuan91

Question about IPO loss vs DPO loss

#64 opened Jan 30, 2024 by MoonBlvd

Reproducing Win Rate inference for TL;DR

#62 opened Jan 9, 2024 by jdchang1

DPO did not achieve the expected experimental effect

#56 opened Dec 7, 2023 by Vance0124

How to reimplement the IMDB sentiment generation result?

#54 opened Nov 14, 2023 by junkangwu

Llama-2-13b-chat Valid reward accuracy remains ~50%

#53 opened Nov 6, 2023 by nxphi47

