The traditional approach separates the input (e.g., a document) from the label (e.g., a summary), and the loss is computed by comparing the generated output against the label. I believe this corresponds to the `Seq2SeqTrainer`.
The `SFTTrainer`, on the other hand, concatenates the input and label into a single instruction sequence (so the input and label form one sequence) and trains on it as a next-token prediction task. While these approaches seem similar, I wonder whether there is a performance difference between the two. Does anyone have a sense of which mechanism is better suited to which scenarios?
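To make the contrast concrete, here is a minimal sketch of which token positions contribute to the loss under each setup. The `-100` ignore-index convention matches what Hugging Face trainers use, but the helper functions and token ids below are purely illustrative, not the actual library implementation:

```python
IGNORE = -100  # label positions set to -100 are excluded from the loss

def seq2seq_labels(document, summary):
    """Seq2SeqTrainer-style: the encoder consumes the document, and the
    decoder is supervised only on the summary tokens."""
    return list(summary)  # every summary token is scored

def sft_labels(document, summary, mask_prompt=False):
    """SFTTrainer-style: document and summary are concatenated into one
    sequence trained with next-token prediction. Optionally the prompt
    (document) part is masked so only the response tokens are scored."""
    labels = list(document) + list(summary)
    if mask_prompt:
        labels[:len(document)] = [IGNORE] * len(document)
    return labels

doc, summ = [101, 102, 103], [201, 202]
print(seq2seq_labels(doc, summ))                  # [201, 202]
print(sft_labels(doc, summ))                      # [101, 102, 103, 201, 202]
print(sft_labels(doc, summ, mask_prompt=True))    # [-100, -100, -100, 201, 202]
```

The key practical difference this sketch highlights: by default the SFT-style setup also takes gradient signal from predicting the prompt tokens themselves, whereas the seq2seq setup (or SFT with prompt masking, e.g., via a completion-only collator) scores only the target tokens.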