When doing low-level fine-tuning (without the aid of HF's SFTTrainer, for example), you may need to tokenize a string in the model's prompting format but without the BOS and EOS special tokens. Mistral is the only model format in model_style.py with a distinct BOS token separate from its other delimiters (compare <|im_start|> in ChatML, which is not really a BOS but a structural delimiter). So I added a Mistral template without the BOS token. In any case, the BOS token is usually added by the tokenizer before the prompt is fed to the model, so that step would happen downstream of OgbujiPT.
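To illustrate the idea, here is a minimal sketch of a Mistral-style template with and without the leading BOS marker. The names `MISTRAL_WITH_BOS`, `MISTRAL_NO_BOS`, and `format_prompt` are hypothetical and chosen for this example only; they are not the identifiers used in model_style.py, and the template strings assume the commonly documented Mistral instruct format.

```python
# Hypothetical names for illustration; not OgbujiPT's actual API.
BOS = '<s>'  # Assumed textual form of Mistral's beginning-of-sequence token

# Template that spells out BOS in the prompt text itself
MISTRAL_WITH_BOS = BOS + '[INST] {prompt} [/INST]'
# Template that leaves BOS for the tokenizer to add downstream
MISTRAL_NO_BOS = '[INST] {prompt} [/INST]'


def format_prompt(prompt, with_bos=True):
    '''Wrap a user prompt in the Mistral instruction delimiters.'''
    template = MISTRAL_WITH_BOS if with_bos else MISTRAL_NO_BOS
    return template.format(prompt=prompt)


if __name__ == '__main__':
    print(format_prompt('Summarize this text.', with_bos=False))
```

Using the BOS-free variant avoids a doubled BOS when the tokenizer is configured to add its own, which is the usual case for fine-tuning pipelines.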