Question about the prompt format in `nemotron-3.5-asr-streaming-0.6b` #15784

Hello,
I see that the prompt format used for language IDs in `nvidia/nemotron-3.5-asr-streaming-0.6b` works as follows: `num_prompt` features representing a one-hot encoded language ID are concatenated to each time step of the encoder output, and the result then goes through a projection module.
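To make sure I am describing it correctly, here is a minimal sketch of my understanding in plain Python. The toy sizes, the helper names (`one_hot`, `concat_prompt`, `project`), and the use of a single linear layer as the projection are my assumptions for illustration, not the actual NeMo code.

```python
import random

def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def concat_prompt(encoder_out, lang_index, num_prompt):
    """Append the same one-hot language vector to every encoder frame."""
    prompt = one_hot(lang_index, num_prompt)
    return [frame + prompt for frame in encoder_out]

def project(frames, weight, bias):
    """Apply one linear layer (assumed form of the projection) per frame."""
    return [
        [sum(w * x for w, x in zip(row, frame)) + b for row, b in zip(weight, bias)]
        for frame in frames
    ]

# Toy sizes (hypothetical): 4 time steps, 8 encoder features, 3 languages.
T, d_model, num_prompt = 4, 8, 3
rng = random.Random(0)
encoder_out = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(T)]

# The projection maps (d_model + num_prompt) features back to d_model.
weight = [[rng.gauss(0, 0.1) for _ in range(d_model + num_prompt)] for _ in range(d_model)]
bias = [0.0] * d_model

fused = concat_prompt(encoder_out, lang_index=1, num_prompt=num_prompt)
projected = project(fused, weight, bias)
```

In this sketch every frame carries the language information, so the conditioning is applied on the encoder side before the decoder or joint network sees anything.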

I wonder why that approach was used instead of the standard approach for transformer decoders, which is to pass the language ID token along with the `BOS` token to the decoder.

This is the approach used in Whisper and Canary, and I don't see why it couldn't also be used with an RNN-T decoder.
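Roughly what I have in mind for the RNN-T case, as a sketch only: the prediction network would be primed with `BOS` followed by a language token before decoding starts. The token spelling, the `build_decoder_prefix` and `prime_prediction_network` helpers, and the `toy_step` stand-in for a real prediction-network step are all hypothetical.

```python
BOS = "<bos>"

def build_decoder_prefix(lang, supported=("en", "de", "es")):
    """Return the token prefix [BOS, language token] for the decoder."""
    if lang not in supported:
        raise ValueError(f"unsupported language: {lang}")
    return [BOS, f"<|{lang}|>"]

def prime_prediction_network(prefix, step, state=None):
    """Feed each prefix token through the prediction network step function."""
    for token in prefix:
        state = step(token, state)
    return state

def toy_step(token, state):
    """Stand-in for a real RNN step: the state just records tokens seen."""
    return (state or ()) + (token,)

prefix = build_decoder_prefix("de")
state = prime_prediction_network(prefix, toy_step)
```

Here the language conditioning would live in the prediction network state rather than in the encoder output, which is the contrast I am asking about.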

Thanks
