Transcription cut randomly when using a translated model. #1363

bansal-sid · 2025-09-17T05:23:14Z


Hello,

I used a fine-tuned Whisper model for the Telugu language: "vasista22/whisper-telugu-base".

Since the model first had to be converted to the CTranslate2 format, I converted it with ctranslate2, following the documented guidelines.

For reference, this is the code I used to convert the model:

```python
import ctranslate2

# saved_model_path is the Telugu model path stored locally
saved_model_path = "path/to/whisper-telugu-base"  # placeholder; replace with the actual local path

converter = ctranslate2.converters.TransformersConverter(saved_model_path)
converter.convert("translated_model_telugu", force=True)  # output directory for the CTranslate2 model
```
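If the converted model is run with faster-whisper (an assumption; the runtime is not named above), note that the faster-whisper README's conversion example also passes `--copy_files tokenizer.json preprocessor_config.json`, while the conversion code above does not use the equivalent `copy_files` argument. A minimal sketch for checking which of those files ended up in the output directory; `missing_model_files` is a hypothetical helper, and the file list is taken from that README example rather than from a confirmed requirement of this model:

```python
from pathlib import Path

# model.bin and config.json are written by the CTranslate2 converter itself;
# tokenizer.json and preprocessor_config.json are the extra files that the
# faster-whisper README example copies with --copy_files (assumed relevant here).
EXPECTED_FILES = ("model.bin", "config.json", "tokenizer.json", "preprocessor_config.json")


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # "translated_model_telugu" is the output directory used in the conversion code.
    missing = missing_model_files("translated_model_telugu")
    print("Missing files:", missing if missing else "none")
```

If the last two files are missing, converting again with `TransformersConverter(saved_model_path, copy_files=["tokenizer.json", "preprocessor_config.json"])` would copy them over, provided they exist in the local Hugging Face model directory.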

When I transcribed my audio files (each roughly 30 seconds long) with the converted model, every file produced only a partial transcription: text was generated for roughly the first 10-15 seconds and nothing for the rest of the audio. The text that was generated was highly accurate.
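To put a number on how much of each clip is lost, the timed segments can be compared against the clip length. A minimal sketch, assuming segments expose `start`/`end` in seconds as faster-whisper's `Segment` objects do; `Seg` and `transcription_coverage` are hypothetical names so the snippet runs without a model, and the 2-second `tolerance` is an arbitrary choice:

```python
from dataclasses import dataclass


@dataclass
class Seg:
    """Hypothetical stand-in for a timed segment (faster-whisper's Segment also has start/end/text)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def transcription_coverage(segments, audio_duration, tolerance=2.0):
    """Report where transcription stops relative to the clip length (all values in seconds)."""
    last_end = max((seg.end for seg in segments), default=0.0)
    tail = max(audio_duration - last_end, 0.0)
    return {
        "last_end": last_end,
        "untranscribed_tail": tail,
        "truncated": tail > tolerance,  # flag clips that lose more than `tolerance` seconds
    }


# Example: segments ending at 12 s in a 30 s clip leave 18 s untranscribed.
print(transcription_coverage([Seg(0.0, 6.5, "..."), Seg(6.5, 12.0, "...")], 30.0))

# With faster-whisper (assumed runtime), the same check could look like:
#   from faster_whisper import WhisperModel
#   model = WhisperModel("translated_model_telugu", device="cuda")
#   segments, info = model.transcribe("clip.wav", language="te", task="transcribe")
#   print(transcription_coverage(list(segments), info.duration))
```

Running this over all files would show whether the cut-off point is consistent (for example, always near the same timestamp) or varies per clip.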

What concerns me most is that the same fine-tuned model, when used in its original form (i.e. without converting it), produces full transcriptions. This is the code that gives accurate, complete transcriptions:

```python
from transformers import pipeline

# Placeholder: paths to the ~30-second Telugu audio files
audio_files = ["clip_01.wav", "clip_02.wav"]  # hypothetical file names

transcribe = pipeline(
    task="automatic-speech-recognition",
    model="vasista22/whisper-telugu-base",
    chunk_length_s=30,  # split long audio into 30 s chunks
    device="cuda",
)

# Set language + task properly
transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(
    language="te", task="transcribe"
)
# Run batch transcription
results = transcribe(audio_files, batch_size=8)  # 🔑 batch inference
```
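Because the pipeline output gives a full transcript for each file, it can serve as a reference for measuring how much of it the converted model reproduces. A rough sketch using only the standard library; `reference_coverage` is a hypothetical helper, and word-level matching with `difflib` is an approximation, not a proper WER metric:

```python
import difflib


def reference_coverage(reference: str, hypothesis: str) -> float:
    """Approximate fraction of reference words that also appear, in order, in the hypothesis."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        return 1.0
    matcher = difflib.SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    # Sum the sizes of the in-order matching word runs (the final dummy block has size 0).
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


# A hypothesis that stops halfway through the reference covers about half of it.
print(reference_coverage("one two three four five six", "one two three"))
```

For example, `results[0]["text"]` from the pipeline run could be passed as `reference` and the converted model's text for the same file as `hypothesis`; values well below 1.0 mark files that were cut short.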

This behaviour seems very strange to me, because with English I did not face any such issue when using a model converted with the same steps.
