The traditional approach separates the input (e.g., a document) from the label (e.g., a summary), and the loss is computed by comparing the generated output against the label. I believe this corresponds to the `Seq2SeqTrainer`.
The `SFTTrainer`, on the other hand, concatenates the input and label into a single instruction sequence (so the input and label form one sequence) and trains on it as a next-token prediction task. While these approaches seem similar, I wonder whether there is a performance difference between the two. Does anyone have a sense of which mechanism is better suited to which scenarios?
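To make the contrast concrete, here is a minimal sketch of which token positions contribute to the loss under each setup. The `-100` ignore-index convention matches what Hugging Face trainers use, but the helper functions and token ids below are purely illustrative, not the actual library implementation:

```python
IGNORE = -100  # label positions set to -100 are excluded from the loss

def seq2seq_labels(document, summary):
    """Seq2SeqTrainer-style: the encoder consumes the document, and the
    decoder is supervised only on the summary tokens."""
    return list(summary)  # every summary token is scored

def sft_labels(document, summary, mask_prompt=False):
    """SFTTrainer-style: document and summary are concatenated into one
    sequence trained with next-token prediction. Optionally the prompt
    (document) part is masked so only the response tokens are scored."""
    labels = list(document) + list(summary)
    if mask_prompt:
        labels[:len(document)] = [IGNORE] * len(document)
    return labels

doc, summ = [101, 102, 103], [201, 202]
print(seq2seq_labels(doc, summ))                  # [201, 202]
print(sft_labels(doc, summ))                      # [101, 102, 103, 201, 202]
print(sft_labels(doc, summ, mask_prompt=True))    # [-100, -100, -100, 201, 202]
```

The key practical difference this sketch highlights: by default the SFT-style setup also takes gradient signal from predicting the prompt tokens themselves, whereas the seq2seq setup (or SFT with prompt masking, e.g., via a completion-only collator) scores only the target tokens.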