Difference between SFTTrainer and Seq2seqTrainer #2339

Open
Hyfred opened this issue Nov 9, 2024 · 0 comments
Labels
❓ question Seeking clarification or more information 🏋 SFT Related to SFT

Comments


Hyfred commented Nov 9, 2024

The traditional approach separates the input (e.g., a document) from the label (e.g., a summary), and the loss is computed on the generated output against the label. I believe this corresponds to Seq2SeqTrainer.

SFTTrainer, by contrast, concatenates the input and label into a single sequence and trains on it as a next-token prediction task (so the input and label sequences are the same). While these approaches seem similar, I wonder whether there is a performance difference between the two. Does anyone have a sense of which mechanism is better suited to which scenarios?
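To make the distinction concrete, here is a minimal sketch of how the two trainers construct their labels. The token ids are made up for illustration, and the prompt-masking variant is an assumption about the completion-only setup (in TRL this is handled by collators such as `DataCollatorForCompletionOnlyLM`), not the default:

```python
# Hypothetical token ids for illustration only (not real tokenizer output).
prompt = [101, 5, 6, 7]   # e.g. "Summarize: <document>"
target = [8, 9, 102]      # e.g. "<summary>"

# Seq2seq-style (encoder-decoder, Seq2SeqTrainer):
# input and label are separate sequences; the loss covers every target token.
seq2seq_inputs = prompt
seq2seq_labels = target

# SFT-style (decoder-only, SFTTrainer):
# prompt and target are concatenated, and the model is trained with
# next-token prediction over the whole sequence, so labels == inputs.
sft_inputs = prompt + target
sft_labels = list(sft_inputs)

# Optional completion-only variant: prompt positions are masked with -100
# (the ignore index of the cross-entropy loss) so only the target
# contributes to the loss, which makes the objective closer to seq2seq.
IGNORE_INDEX = -100
sft_labels_masked = [IGNORE_INDEX] * len(prompt) + target

print(sft_inputs)         # [101, 5, 6, 7, 8, 9, 102]
print(sft_labels_masked)  # [-100, -100, -100, -100, 8, 9, 102]
```

The practical difference is therefore which tokens the loss is taken over: with plain SFT the model also learns to reproduce the prompt, while the masked variant (and the seq2seq setup) only penalize errors on the target.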

@qgallouedec qgallouedec added ❓ question Seeking clarification or more information 🏋 SFT Related to SFT labels Nov 10, 2024