Question about the prompt format in nemotron-3.5-asr-streaming-0.6b #15784

@MahmoudAshraf97

Hello,
I see that the prompt format used for language IDs in nvidia/nemotron-3.5-asr-streaming-0.6b works as follows: num_prompt features representing a one-hot encoded language ID are concatenated to each time step of the encoder output, and the result then goes through a projection module.
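
To make sure I am describing the mechanism correctly, here is a minimal framework-free sketch of what I understand it to be. All sizes and names (`T`, `D_ENC`, `NUM_PROMPTS`, `concat_prompt`, `project`) are made up for illustration, and I am assuming the projection is a single bias-free linear layer, which may not match the actual module.

```python
import random

def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def concat_prompt(encoder_out, lang_index, num_prompts):
    """Append the same one-hot language vector to every encoder time step."""
    prompt = one_hot(lang_index, num_prompts)
    return [frame + prompt for frame in encoder_out]

def project(frames, weight):
    """Bias-free linear projection: each frame (d_in) times weight (d_in x d_out)."""
    return [
        [sum(x * w for x, w in zip(frame, column)) for column in zip(*weight)]
        for frame in frames
    ]

# Hypothetical sizes, not taken from the real model config.
T, D_ENC, NUM_PROMPTS = 4, 6, 3

rng = random.Random(0)
# Stand-in for the encoder output: T time steps of D_ENC features each.
encoder_out = [[rng.uniform(-1, 1) for _ in range(D_ENC)] for _ in range(T)]
# Assumed projection weight mapping (D_ENC + NUM_PROMPTS) back to D_ENC.
weight = [
    [rng.uniform(-1, 1) for _ in range(D_ENC)]
    for _ in range(D_ENC + NUM_PROMPTS)
]

conditioned = concat_prompt(encoder_out, lang_index=1, num_prompts=NUM_PROMPTS)
projected = project(conditioned, weight)

print(len(conditioned), len(conditioned[0]))  # T, D_ENC + NUM_PROMPTS
print(len(projected), len(projected[0]))      # T, D_ENC
```

In this reading, the language ID reaches the model through every encoder frame rather than through the decoder's input sequence.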

I wonder why this approach was used instead of the standard approach in transformer decoders, which is to pass the language ID token along with the BOS token to the decoder.

This is the approach used in Whisper and Canary, and I don't see why it can't be used with an RNN-T decoder
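
For comparison, this is roughly the alternative I have in mind, again as a toy sketch. The token IDs, the `LANG_TOKEN_IDS` table, and `toy_step` are hypothetical stand-ins, not the real vocabularies of Whisper or Canary and not an actual RNN-T prediction network.

```python
# Hypothetical token IDs, chosen only for illustration.
BOS_ID = 0
LANG_TOKEN_IDS = {"en": 1, "de": 2, "ar": 3}

def build_decoder_prefix(lang):
    """Return the token IDs fed to the decoder before any text token is emitted."""
    return [BOS_ID, LANG_TOKEN_IDS[lang]]

def prime_state(prefix, step_fn, init_state):
    """Run the decoder over the prefix so its state depends on the language token."""
    state = init_state
    for token in prefix:
        state = step_fn(token, state)
    return state

def toy_step(token, state):
    """Toy recurrent update standing in for one prediction-network step."""
    return state * 31 + token + 1

state_de = prime_state(build_decoder_prefix("de"), toy_step, init_state=0)
state_en = prime_state(build_decoder_prefix("en"), toy_step, init_state=0)

print(build_decoder_prefix("de"))  # [0, 2]
print(state_de != state_en)        # True
```

Here the language conditioning enters once, through the decoder's initial state, instead of being repeated at every encoder time step.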

Thanks
