Describe the bug
When running inference with the pretrained nvidia/nemotron-3.5-asr-streaming-0.6b model loaded via ASRModel.from_pretrained, a ValueError is raised stating "Unknown prompt key: 'None'". The model expects a specific language prompt key to be passed or configured, but standard usage of .transcribe() leaves it as None, so transcription fails out of the box. Moreover, the model card at https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b does not explain how to pass target_lang (including how to use the auto option).
Steps/Code to reproduce bug
Steps:
1. Install NeMo and its ASR dependencies.
2. Load the model nvidia/nemotron-3.5-asr-streaming-0.6b using ASRModel.from_pretrained().
3. Call transcribe() on a valid WAV file.
4. Observe the exception:
ValueError: Unknown prompt key: 'None'. Available: ['en-US', 'en', 'en-GB', 'enGB', 'es-ES', 'esES', 'es-US', 'es', 'zh-CN', 'zh-ZH']...
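For reference, a minimal script along the lines of the steps above. The WAV path is a placeholder supplied on the command line, and the positional transcribe([path]) call is assumed from how other NeMo ASR models are typically used; it is not taken from the model card.

```python
import sys

MODEL_NAME = "nvidia/nemotron-3.5-asr-streaming-0.6b"


def reproduce(wav_path: str):
    """Load the pretrained model and transcribe one file with default arguments."""
    # Imported lazily so the script can be read or imported without NeMo installed.
    from nemo.collections.asr.models import ASRModel

    asr_model = ASRModel.from_pretrained(model_name=MODEL_NAME)
    # No language or prompt argument is passed here, which is the
    # "standard usage" that ends in: ValueError: Unknown prompt key: 'None'
    return asr_model.transcribe([wav_path])


# Usage (path is a placeholder): python repro.py /path/to/audio.wav
if __name__ == "__main__" and len(sys.argv) == 2:
    print(reproduce(sys.argv[1]))
```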
Expected behavior
The model should either default to a reasonable fallback language prompt (e.g., 'en') when no prompt is provided, or the .transcribe() method/documentation should clearly indicate how to supply the mandatory prompt key argument for this specific streaming model architecture so it doesn't fail with None.
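To illustrate the first option, here is a rough sketch of the fallback I have in mind. This is not NeMo code: resolve_prompt_key and AVAILABLE_KEYS are hypothetical names, and the key list contains only the entries visible in the truncated error message above.

```python
# Hypothetical illustration only; the real key list is longer (the error message is truncated).
AVAILABLE_KEYS = [
    "en-US", "en", "en-GB", "enGB",
    "es-ES", "esES", "es-US", "es",
    "zh-CN", "zh-ZH",
]


def resolve_prompt_key(key, available=AVAILABLE_KEYS, default="en"):
    """Return a usable prompt key, falling back to `default` when none is given."""
    if key is None:
        # Proposed behavior: use a documented default instead of failing.
        return default
    if key not in available:
        # Explicit but unsupported keys should still fail loudly.
        raise ValueError(f"Unknown prompt key: {key!r}. Available: {list(available)}")
    return key
```

With this behavior a plain transcribe() call would be treated as 'en', while an explicit but unsupported key would still raise the existing error.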
Environment details
- OS version: Ubuntu 22.04
- PyTorch version: 2.12.1
- Python version: 3.12.0
Environment overview
apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython packaging
pip install git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[asr]
(installed as described in https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b)
Additional context
The nvidia/nemotron-3.5-asr-streaming-0.6b model is a prompt-conditioned multilingual streaming ASR model. The underlying error seems to stem from the model expecting a language/prompt token during the forward pass, which the standard asr_model.transcribe() pipeline neither handles implicitly nor exposes cleanly. Would it be possible to share code examples for this model where target_lang is specified or the auto option is used?