Skip to content

finetune 数据准备相关问题 #35

@liweiyong-leo

Description

@liweiyong-leo

@pengzhendong finetune代码中增加了train.jsonl,能否提供一个真实的中英文样例以及上下文和语种配置问题,因为我看这个和funasr之前提供信息有点差异。我试着去理解,尝试写了两个例子。东哥帮忙看看
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "语音转写:<|startofspeech|>!/path/to/wav<|endofspeech|>"}, {"role": "assistant", "content": "content of /path/to/wav"}], "speech_length": 42, "text_length": 42}
中文例子
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "语音转写:<|startofspeech|>!/home/zh.wav<|endofspeech|>"}, {"role": "assistant", "content": "我是中国人"}], "speech_length": 12, "text_length": 5}
英文例子
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "语音转写:<|startofspeech|>!/home/en.wav<|endofspeech|>"}, {"role": "assistant", "content": "hello world"}], "speech_length": 12, "text_length": 2}
我不知道我是否理解的对与否。另外如果我们需要添加语种和上下文信息,这个需要如何修改呢。

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions