Right now, data loading and loss computation assume one is only doing LM pretraining, but it'd be useful to support packed SFT-style datasets (i.e. datasets with cleanly delineated prompt/completion pairs, perhaps even a system prompt) and their corresponding masking.
That is, the masks would still let the attention module attend to the prompt/prefix tokens, but gradients would be propagated only through the completion/target tokens.
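A minimal sketch of the label-masking half of this: pack each prompt/completion pair into one token stream and set the label for every prompt position to an ignore index, so the loss (and hence gradients) touches only completion tokens. The function name `pack_sft_pairs` is hypothetical, and `-100` is assumed here only because it is the default `ignore_index` of PyTorch's `CrossEntropyLoss`; attention masking within the packed sequence would be handled separately.

```python
def pack_sft_pairs(pairs, ignore_index=-100):
    """Pack (prompt_ids, completion_ids) pairs into a single sequence.

    Returns (input_ids, labels) where every prompt position in `labels`
    is set to `ignore_index`, so a cross-entropy loss configured with
    that ignore index computes gradients only for completion tokens.
    """
    input_ids, labels = [], []
    for prompt_ids, completion_ids in pairs:
        # The model still sees the prompt tokens as input...
        input_ids.extend(prompt_ids)
        input_ids.extend(completion_ids)
        # ...but only completion positions contribute to the loss.
        labels.extend([ignore_index] * len(prompt_ids))
        labels.extend(completion_ids)
    return input_ids, labels


# Example: two prompt/completion pairs packed into one sequence.
pairs = [([1, 2, 3], [4, 5]), ([6], [7, 8])]
input_ids, labels = pack_sft_pairs(pairs)
# input_ids -> [1, 2, 3, 4, 5, 6, 7, 8]
# labels    -> [-100, -100, -100, 4, 5, -100, 7, 8]
```

Per-sequence masks (or a block-diagonal attention mask) would additionally keep packed examples from attending across pair boundaries.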