
Support multiple chat templates per model #134

Merged

Conversation

DePasqualeOrg
Contributor

Some models, such as Mistral 7B Instruct 0.3, specify an array of chat templates in tokenizer_config.json. This change allows for parsing chat templates in this format.
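For reference, a tokenizer_config.json in this format lists named templates as an array of objects rather than a single template string. This is a hypothetical excerpt modeled on Mistral-style configs; the template bodies are elided and illustrative:

```json
{
  "chat_template": [
    {
      "name": "default",
      "template": "{{ bos_token }}{% for message in messages %}...{% endfor %}"
    },
    {
      "name": "tool_use",
      "template": "{{ bos_token }}{% for message in messages %}...{% endfor %}"
    }
  ]
}
```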

@DePasqualeOrg DePasqualeOrg marked this pull request as draft October 2, 2024 09:49
@DePasqualeOrg DePasqualeOrg force-pushed the improve-chat-template-parsing branch from 9c8cd5e to cc7b2bc on October 2, 2024 10:22
@DePasqualeOrg DePasqualeOrg force-pushed the improve-chat-template-parsing branch from 29cf6ff to 48f9167 on October 2, 2024 10:39
@DePasqualeOrg
Contributor Author

I've tried to bring this more in line with the Python implementation. I'm unsure about whether tools should be included in the context dictionary.

@DePasqualeOrg DePasqualeOrg marked this pull request as ready for review October 2, 2024 10:45
@pcuenca pcuenca changed the title from "Improve chat template parsing" to "Support multiple chat templates per model" on Oct 2, 2024
Member

@pcuenca pcuenca left a comment


Thanks a lot for the PR! I made a couple of comments, but it looks in good shape. Do you think it would be possible to add a simple test case that exercises this feature?

@pcuenca
Member

pcuenca commented Oct 2, 2024

I'm unsure about whether tools should be included in the context dictionary.

What do you mean? As I understand it, you are adding support to the Jinja Swift package here, right?

@DePasqualeOrg
Contributor Author

I'm unsure about whether tools should be included in the context dictionary.

What do you mean? As I understand it, you are adding support to the Jinja Swift package here, right?

I think that PR won't include the tools functionality, for the sake of simplicity. Later on I may try to add it.

@pcuenca
Member

pcuenca commented Oct 2, 2024

I think that PR won't include the tools functionality

In that case, maybe we should get started with just a simple way to select the desired template (by name) and add the tool stuff later. Would that work for the purposes of ml-explore/mlx-swift-examples#135?

@DePasqualeOrg
Contributor Author

I think this now mirrors the Python implementation more closely. It throws an error if no template is selected, which I think is better than silently falling back to a default template that might not be compatible with the model. That silent fallback caused some confusion for me when I was debugging the problems with Mistral 7B 0.3.

@DePasqualeOrg
Contributor Author

I've added some tests with a note to add tests for tool use templates once these are implemented. Do you think this can be merged without implementing tool use in Jinja? It would be an improvement on the current situation, in which Mistral 7B doesn't work properly.

let tokenizer = try await AutoTokenizer.from(pretrained: "microsoft/Phi-3-mini-128k-instruct")
// Purposely not using the correct template for this model to verify that the template from the config is not being used
let mistral7BDefaultTemplate = "{{bos_token}}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ ' [INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + message['content'] + ' ' + eos_token}}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}"
let encoded = try tokenizer.applyChatTemplate(messages: messages, chatTemplate: mistral7BDefaultTemplate, addGenerationPrompt: false, truncation: false, maxLength: nil, tools: nil)
Member


It's a bit unfortunate that we have to pass all the arguments in this version of the call. Perhaps we could add a new overload that takes just messages and chatTemplate, if you think it's worth doing.

}
return (name, template)
})
if let chatTemplateArgument = chatTemplate, let matchingDictEntry = templateDict[chatTemplateArgument] {
Member


So if you pass an actual template as an argument to a tokenizer that has multiple ones (mlx-community/Mistral-7B-Instruct-v0.3-4bit, for example), it will be silently ignored and replaced with the default one, right? Perhaps we could fail if the tokenizer has several templates and the argument provided does not match any of them. What do you think?
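The stricter behavior suggested here could look something like the sketch below. The function and error names are hypothetical, not the ones in the PR; the point is only that an unknown template name should throw rather than silently fall back to the default:

```swift
// Sketch: resolve a chat template by name from a tokenizer that may
// define several templates. Names and types here are illustrative.
enum TokenizerError: Error {
    case chatTemplateNotFound(String)
}

func resolveTemplate(named name: String?, from templates: [String: String]) throws -> String {
    if let name {
        guard let template = templates[name] else {
            // Fail loudly instead of silently using the default template.
            throw TokenizerError.chatTemplateNotFound(name)
        }
        return template
    }
    // No name given: fall back to the template named "default", if any.
    if let defaultTemplate = templates["default"] {
        return defaultTemplate
    }
    throw TokenizerError.chatTemplateNotFound("default")
}
```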

@pcuenca
Member

pcuenca commented Oct 2, 2024

I've added some tests with a note to add tests for tool use templates once these are implemented.

Thank you!

Do you think this can be merged without implementing tool use in Jinja? It would be an improvement on the current situation, in which Mistral 7B doesn't work properly.

Yes, let's merge this and we can improve later. I have a couple of final comments / questions and that's it.

/// A Jinja template or the name of a template to use for this conversion.
/// It is usually not necessary to pass anything to this argument,
/// as the model's template will be used by default.
chatTemplate: String? = nil,
addGenerationPrompt: Bool = false,
Contributor Author

@DePasqualeOrg DePasqualeOrg Oct 2, 2024


addGenerationPrompt is false by default in this method (which mirrors the Python implementation), but it is currently set to true in the overload method. Should we make the behavior consistent in all applyChatTemplate methods? Should it be true or false by default?

Member


I think it should be true for the overloads as I expect that to be the most common use of the API, but I'm not fully sure.

@DePasqualeOrg
Contributor Author

DePasqualeOrg commented Oct 2, 2024

I added an extra argument chatTemplateName to make things more explicit. Otherwise it becomes difficult to cover edge cases. Do you think it's okay to diverge slightly from the Python implementation here?

@DePasqualeOrg
Contributor Author

We can maintain consistency with the Python API while being more explicit using an enum for the chatTemplate argument, with options for literal and name.
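Sketched in Swift, that enum might look like the following. The case names are taken from the discussion; the helper function and its signature are illustrative, not necessarily what was merged:

```swift
/// Selects a chat template either by literal Jinja source or by name.
enum ChatTemplateArgument {
    /// A literal Jinja template string to apply directly.
    case literal(String)
    /// The name of a template defined in tokenizer_config.json.
    case name(String)
}

/// Illustrative helper: look up the template source for an argument.
/// Returns nil when a named template is not defined for the model.
func template(for argument: ChatTemplateArgument, in templates: [String: String]) -> String? {
    switch argument {
    case .literal(let source): return source
    case .name(let name): return templates[name]
    }
}
```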

@pcuenca
Member

pcuenca commented Oct 3, 2024

using an enum

That's a great idea!

In any case, it is okay to deviate a bit from the Python API if it's useful for the project; I see it more as a reference. The more we follow the same conventions, the less friction people who come from Python will experience. But I think some developers don't come from that background anyway.

Member

@pcuenca pcuenca left a comment


Looking great, thanks a lot for your patience!


func applyChatTemplate(messages: [[String: String]], chatTemplateName: String) throws -> [Int]
func applyChatTemplate(messages: [[String: String]], chatTemplate: ChatTemplateArgument) throws -> [Int]
Member


I like this solution, but it's a breaking change. Do you know if the current method is used by mlx or others?

Perhaps we could add an overload applyChatTemplate(messages: [[String: String]], chatTemplateName: String) that just builds a ChatTemplateArgument (literal) and forwards the call.

Contributor Author


I searched the mlx-swift and mlx-swift-examples repos and didn't find any instances of applyChatTemplate there. I have a draft PR to use the chat templates once this PR and the Jinja PR land.

I think you mean the additional overload should have the argument chatTemplate: String. Sure, I can add that.
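The forwarding overload being discussed could be as small as the sketch below. The stub tokenizer type exists only to make the example self-contained; names and return types are hypothetical stand-ins for the real API:

```swift
// Hypothetical sketch of a convenience overload that wraps a literal
// template string and forwards to the enum-based method.
enum ChatTemplateArg {
    case literal(String)
    case name(String)
}

struct StubTokenizer {
    // Stand-in for the real method: a real tokenizer would render the
    // Jinja template against the messages and encode the result.
    func applyChatTemplate(messages: [[String: String]], chatTemplate: ChatTemplateArg) -> String {
        switch chatTemplate {
        case .literal(let source): return source
        case .name(let name): return "template named \(name)"
        }
    }

    /// Convenience overload: accept a literal template string and forward.
    func applyChatTemplate(messages: [[String: String]], chatTemplate: String) -> String {
        applyChatTemplate(messages: messages, chatTemplate: .literal(chatTemplate))
    }
}
```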

Member

@pcuenca pcuenca left a comment


Thanks a lot for iterating here @DePasqualeOrg 🙌

Great contribution, merging this now!

@pcuenca pcuenca merged commit 4d25d20 into huggingface:main Oct 3, 2024
1 check passed
@DePasqualeOrg
Contributor Author

Thanks for your help! Do you want to make a new release so that this can get picked up in mlx-swift-examples for ml-explore/mlx-swift-examples#135?

@pcuenca
Member

pcuenca commented Oct 3, 2024

Just pushed https://github.com/huggingface/swift-transformers/releases/tag/0.1.13
