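from chat_tokenizer import ChatTokenizer
from transformers import AutoTokenizer

# Wrap a Hugging Face tokenizer with ChatTokenizer.
tokenizer = ChatTokenizer(AutoTokenizer.from_pretrained("qwen/Qwen2.5-0.5B-Instruct"))

# audio_lens and labels share the same nesting, with one entry per sample.
audio_lens = [[4, 2], 3]
# Labels in English: [["The weather is nice today", "Haha"], "Hello there"]
labels = [["今天天气不错", "哈哈"], "你好啊"]

input_ids, input_lens, label_ids, label_lens = tokenizer.batch_tokenize(audio_lens, labels)
input_ids = tokenizer.fill_labels(label_ids, input_ids)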
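In the example above, `audio_lens` and `labels` have the same nested shape: the first sample has two audio lengths and two labels, the second has one of each. The sketch below shows one way to check that shape before calling `batch_tokenize`. The helper `pair_segments` is hypothetical and not part of chat-tokenizer. It assumes, based only on the example, that each audio length corresponds to exactly one label.

```python
from typing import List, Tuple, Union

AudioLens = Union[int, List[int]]
Labels = Union[str, List[str]]


def pair_segments(
    audio_lens: List[AudioLens], labels: List[Labels]
) -> List[List[Tuple[int, str]]]:
    """Pair each audio length with its label, one inner list per sample.

    Hypothetical helper, not part of chat-tokenizer. It only verifies that
    the two nested lists have the same shape.
    """
    if len(audio_lens) != len(labels):
        raise ValueError("audio_lens and labels must have the same number of samples")
    paired = []
    for i, (lens, texts) in enumerate(zip(audio_lens, labels)):
        # Normalize a bare int or str into a one-element list.
        lens = [lens] if isinstance(lens, int) else list(lens)
        texts = [texts] if isinstance(texts, str) else list(texts)
        if len(lens) != len(texts):
            raise ValueError(
                f"sample {i}: {len(lens)} audio lengths but {len(texts)} labels"
            )
        paired.append(list(zip(lens, texts)))
    return paired


audio_lens = [[4, 2], 3]
labels = [["今天天气不错", "哈哈"], "你好啊"]
pairs = pair_segments(audio_lens, labels)
```

A shape mismatch, such as two audio lengths paired with a single label, raises a `ValueError` instead of reaching the tokenizer.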
pengzhendong/chat-tokenizer