
Support target masking (aka loss masking or label masking) for SFT datasets #736

Open
jmschndev opened this issue Jun 28, 2024 · 0 comments
Right now, data loading and loss computation assume one is only doing LM pretraining, but it'd be useful to support packed SFT-style datasets (i.e., datasets with cleanly delineated prompt/completion pairs, and perhaps a system prompt) and their corresponding masking.

That is, the mask lets the attention module attend to the prompt/prefix, but gradients are propagated only through the completion/target tokens.
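
For illustration, a minimal sketch of what this loss masking could look like, written in generic JAX/optax rather than this repo's actual API; the `prompt_len` parameter and function names are hypothetical, and a packed dataset would carry a per-token mask rather than a single prompt length:

```python
# Generic sketch of target/loss masking for an SFT example (not this
# repo's API). Assumes a single prompt/completion pair per sequence.
import jax.numpy as jnp
import optax

def build_loss_mask(seq_len: int, prompt_len: int) -> jnp.ndarray:
    """1.0 on completion/target positions, 0.0 on the prompt/prefix.
    Attention still sees the full sequence; only the loss is masked."""
    positions = jnp.arange(seq_len)
    return (positions >= prompt_len).astype(jnp.float32)

def masked_next_token_loss(logits, token_ids, loss_mask):
    """Cross-entropy averaged over unmasked positions only, so gradients
    flow solely from completion/target tokens."""
    # Position t predicts token t+1: drop the last logit and first label.
    per_token = optax.softmax_cross_entropy_with_integer_labels(
        logits[:-1], token_ids[1:]
    )
    mask = loss_mask[1:]  # align the mask with the shifted labels
    return (per_token * mask).sum() / jnp.maximum(mask.sum(), 1.0)
```

Note the loss mask is orthogonal to the attention mask: for packed sequences you'd presumably also want a block-diagonal (or segment-id-based) attention mask so examples in the same pack can't attend to each other, while the loss mask above controls only which positions contribute gradients.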
