Add `with_sequence` for decode stream #1725
base: main
Thank you for this @ArthurZucker! What do you think about having a version of

I'm also thinking through how this would be used in practice. For very long prompts, we ideally don't want to decode the whole thing, since we would typically have just tokenized the text prompt. But we need the last couple of prompt tokens to ensure we can continue the prompt text cleanly, such that the concatenation of the original prompt with the first streamed string is exactly equal to all of the tokens being decoded together. Perhaps that's up to the user of the API to sort out, but it might be nice for the prefilled tokens to be excluded from the subsequent step output (or at least to have the option for that).
For sure! I am actually a lot less familiar than you with the actual use cases! Super thankful for the feedback!
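To make the requirement discussed above concrete, here is a minimal sketch in plain Python. It does not use the `tokenizers` API: `VOCAB`, `decode`, and `ToyDecodeStream` (including its `prefill` argument) are hypothetical names invented for illustration, and the byte-chunk vocabulary is an assumption. It shows a stream primed with only the last prompt token, whose output excludes the prefilled text, so that prompt text plus streamed text equals the decoding of all tokens together.

```python
# Toy vocabulary (an assumption for illustration): each token is a chunk of
# UTF-8 bytes. The character "é" (0xC3 0xA9) is split across tokens 2 and 3.
VOCAB = {0: b"Hello", 1: b" caf", 2: b"\xc3", 3: b"\xa9", 4: b" au", 5: b" lait"}


def decode(ids):
    """Decode token ids to text; incomplete byte sequences become U+FFFD."""
    return b"".join(VOCAB[i] for i in ids).decode("utf-8", errors="replace")


class ToyDecodeStream:
    """Hypothetical stand-in for a decode stream that can be prefilled."""

    def __init__(self, prefill=()):
        # Prime the state with trailing prompt tokens, and mark their text as
        # already emitted so it is excluded from later step() output.
        self.ids = list(prefill)
        self.emitted = len(decode(self.ids))

    def step(self, token_id):
        self.ids.append(token_id)
        text = decode(self.ids)
        if text.endswith("\ufffd"):
            # The last character is still incomplete: wait for more tokens.
            return None
        chunk = text[self.emitted:]
        self.emitted = len(text)
        return chunk


prompt_ids = [0, 1]            # "Hello caf"
generated_ids = [2, 3, 4, 5]   # "é au lait", with "é" split over two tokens
prompt_text = decode(prompt_ids)

# Prefill with only the last prompt token instead of the whole prompt.
stream = ToyDecodeStream(prefill=prompt_ids[-1:])
streamed = "".join(c for c in map(stream.step, generated_ids) if c is not None)

print(prompt_text + streamed)
```

In this sketch the stream never re-emits the prompt, and holding back output until the byte sequence is complete keeps the split character intact.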
otherwise we consider is as a string pattern. For example `pattern="|"`
means you want to split on `|` (imagine a csv file for example), while
`pattern=tokenizers.Regex("1|2")` means you split on either '1' or '2'.
`patter=tokenizer.Regex("1|2")` means you split on either '1' or '2'.
`patter=tokenizer.Regex("1|2")` means you split on either '1' or '2'.
`pattern=tokenizer.Regex("1|2")` means you split on either '1' or '2'.
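The distinction the docstring draws between a plain string pattern and a regex pattern can be illustrated with Python's standard `re` module as a stand-in. This is only a sketch of the idea, not the `tokenizers` implementation; `split_on` and its `is_regex` flag are hypothetical names.

```python
import re


def split_on(pattern, text, is_regex=False):
    """Split text on a literal string, or on a regex when is_regex is True."""
    # A literal pattern is escaped so that "|" means the pipe character
    # itself rather than regex alternation.
    expr = pattern if is_regex else re.escape(pattern)
    return re.split(expr, text)


# Literal pattern: split a CSV-like row on the pipe character.
literal_parts = split_on("|", "a|b|c")

# Regex pattern: split on either '1' or '2'.
regex_parts = split_on("1|2", "a1b2c", is_regex=True)

# The same characters treated literally only match the exact text "1|2".
literal_alt_parts = split_on("1|2", "a1|2b")

print(literal_parts, regex_parts, literal_alt_parts)
```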
No description provided.