
pengzhendong/chat-tokenizer


chat-tokenizer

Usage

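from chat_tokenizer import ChatTokenizer
from transformers import AutoTokenizer


# Wrap a Hugging Face tokenizer (here Qwen2.5-0.5B-Instruct) in a ChatTokenizer.
tokenizer = ChatTokenizer(AutoTokenizer.from_pretrained("qwen/Qwen2.5-0.5B-Instruct"))

# A batch of two samples. A nested list holds several audio segments of one sample
# (lengths 4 and 2); a plain int is a single segment (length 3).
audio_lens = [[4, 2], 3]
# One label per segment, in the same layout as audio_lens.
# English: ["The weather is nice today", "Haha"], "Hello"
labels = [["今天天气不错", "哈哈"], "你好啊"]
# Tokenize the batch; returns the input ids and label ids together with their lengths.
input_ids, input_lens, label_ids, label_lens = tokenizer.batch_tokenize(audio_lens, labels)
# Fill the label token ids into input_ids.
input_ids = tokenizer.fill_labels(label_ids, input_ids)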
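In the example, each entry of `audio_lens` lines up with the entry at the same position in `labels`: a plain int or string is a one-segment sample, and a list holds several segments of one sample. That reading is inferred from the example rather than documented. The sketch below makes the pairing explicit using a hypothetical helper, `normalize_batch`, which is not part of `chat_tokenizer`:

```python
from typing import List, Tuple, Union

# Hypothetical input types: an entry is either one segment or a list of segments.
LenEntry = Union[int, List[int]]
LabelEntry = Union[str, List[str]]


def normalize_batch(
    audio_lens: List[LenEntry], labels: List[LabelEntry]
) -> List[List[Tuple[int, str]]]:
    """Pair each audio segment length with its label, one list per sample.

    Hypothetical helper (not part of chat_tokenizer): a bare int/str is
    treated as a one-segment sample, a list as a multi-segment sample.
    """
    if len(audio_lens) != len(labels):
        raise ValueError("audio_lens and labels must have the same batch size")
    batch = []
    for lens, texts in zip(audio_lens, labels):
        # Wrap single-segment entries so every sample is a list of segments.
        lens = lens if isinstance(lens, list) else [lens]
        texts = texts if isinstance(texts, list) else [texts]
        if len(lens) != len(texts):
            raise ValueError("each sample needs exactly one label per audio segment")
        batch.append(list(zip(lens, texts)))
    return batch


samples = normalize_batch([[4, 2], 3], [["今天天气不错", "哈哈"], "你好啊"])
for i, segments in enumerate(samples):
    print(i, segments)
```

Running it prints one line per sample: `0 [(4, '今天天气不错'), (2, '哈哈')]` and `1 [(3, '你好啊')]`. Checking the segment counts up front makes a mismatched batch fail immediately instead of silently pairing audio with the wrong label.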
