Some minor issues found in LLM-ASR inference #2407

Open
NiniAndy opened this issue Mar 5, 2025 · 1 comment

NiniAndy commented Mar 5, 2025

❓ Questions and Help

What is your question?

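In the LLM-ASR task, I trained on AISHELL with the default whisper_qwen_linear.yaml for 10 epochs and ran inference with best_model.pt.
First run: the default whisper_qwen_linear.yaml includes SpecAugLFR, and as a result inference very often produces nonsensical repetitions.
e.g.
BAC009S0768W0178 撇油加加撇油加加撇油加加撇油加加撇油加加撇油。。。

Second run: after removing all dropout and SpecAugLFR from the default whisper_qwen_linear.yaml and retraining, the nonsensical repetitions became less frequent at inference but still occur occasionally. The problem has shifted: the decoded result now often starts with one or two extra characters. I checked the masks and they seem fine. I also tried disabling the prompt at inference, and the result did not seem to change.
e.g.
BAC009S0766W0399 幢经过近两个星期的漫长等待 (expected: 经过近两个星期的漫长等待)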

Here are links to the config and shell scripts I used:
conf:
https://github.com/NiniAndy/FunASR/blob/mymerge/examples/industrial_data_pretraining/llm_asr/conf/whisper_qwen_linear.yaml
train.sh:
https://github.com/NiniAndy/FunASR/blob/mymerge/examples/industrial_data_pretraining/llm_asr/demo_train_or_finetune.sh
inference.sh:
https://github.com/NiniAndy/FunASR/blob/mymerge/examples/industrial_data_pretraining/llm_asr/infer_speech2text.sh

What's your environment?

  • Linux:
  • FunASR Version : 1.1.12
  • PyTorch Version : 2.4.1
  • CUDA/cuDNN version : cu12.4
@NiniAndy NiniAndy added the question Further information is requested label Mar 5, 2025
@LauraGPT LauraGPT self-assigned this Mar 5, 2025

NiniAndy commented Mar 6, 2025

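Found the problem:
When training with whisper_qwen_linear.yaml, the prompt is "Transcribe speech to text." See:
funasr/datasets/llm_datasets_qwenaudio/datasets.py, lines 46-47
At inference, the prompt defaults to the following, so the model sees a different prompt format than it did during training:
prompt_pre = "USER: \nINSTRUCTION: {}\nINPUT: ".format(prompt)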
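
The mismatch can be made concrete with a minimal sketch. It assumes the training side passes the prompt string with no wrapper, which is how the description above reads. `build_inference_prompt` and `prompts_match` are hypothetical helper names, not FunASR functions; only the format string is taken from the line quoted above.

```python
# Prompt used during training, as reported above.
TRAIN_PROMPT = "Transcribe speech to text."


def build_inference_prompt(prompt: str) -> str:
    """Hypothetical wrapper reproducing the default inference template quoted above."""
    return "USER: \nINSTRUCTION: {}\nINPUT: ".format(prompt)


def prompts_match(train_prompt: str, infer_prompt: str) -> bool:
    """Hypothetical check: True only if both stages feed the LLM identical text."""
    return train_prompt == infer_prompt


infer_prompt = build_inference_prompt(TRAIN_PROMPT)
print(repr(TRAIN_PROMPT))
print(repr(infer_prompt))
print("match:", prompts_match(TRAIN_PROMPT, infer_prompt))
```

Under that assumption the two strings differ, so the LLM is conditioned on text at inference that it never saw in training, which would be consistent with the stray leading characters described earlier.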

There is also a bug:
funasr/models/llm_asr/model.py, lines 301-303, looks wrong and should be changed to:
inputs_embeds = torch.cat((encoder_out, inputs_embeds[None, :, :]), dim=1) # [audio, prompt]
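
A small shape check of the proposed line, as a sketch only. The tensor sizes are made up for illustration and are not taken from the FunASR config, and the shapes of `encoder_out` and `inputs_embeds` are assumptions inferred from the indexing in that line (a batched audio tensor and an unbatched prompt embedding).

```python
import torch

# Illustrative sizes only (assumed, not from the FunASR config).
T_AUDIO, T_PROMPT, D = 5, 3, 4

encoder_out = torch.zeros(1, T_AUDIO, D)  # assumed shape: [batch=1, audio frames, hidden]
inputs_embeds = torch.ones(T_PROMPT, D)   # assumed shape: [prompt tokens, hidden], no batch dim

# Proposed line: add a batch dim to the prompt and put the audio first.
audio_then_prompt = torch.cat((encoder_out, inputs_embeds[None, :, :]), dim=1)

# Reverse order, shown only for comparison.
prompt_then_audio = torch.cat((inputs_embeds[None, :, :], encoder_out), dim=1)

print(tuple(audio_then_prompt.shape))
print(torch.equal(audio_then_prompt, prompt_then_audio))
```

Both orders give the same output shape, so a wrong order would not raise an error. It only changes which positions the LLM sees as audio and which as prompt, which is why it has to match the order used in training.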

@NiniAndy NiniAndy changed the title from "LLM-ASR inference has some minor issues" to "Some minor issues found in LLM-ASR inference" Mar 6, 2025