retry on leftover audio false mechanism #1395

rjames-0 · 2025-11-12T14:34:11Z

Hallucination fix from @ArneNx that prevents faster-whisper from regenerating text on leftover audio when word-level timestamps are enabled. The mechanism introduces a retry_on_leftover_audio option which, when set to false, skips processing of the leftover audio segment.
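To make the proposed option concrete, here is a minimal sketch of how a decode loop could choose where the next window starts, with and without the retry. The function next_seek and its arguments are hypothetical, and the no-retry path is assumed to simply keep the seek derived from the timestamp tokens; this is not the PR's actual code. Positions are in log-mel frames, assuming Whisper's rate of 100 frames per second.

```python
from typing import Optional

# Whisper's log-mel frame rate: 16000 Hz audio / 160-sample hop = 100 frames/s.
FRAMES_PER_SECOND = 100


def next_seek(
    segment_boundary_seek: int,
    last_word_end: Optional[float],
    time_offset: float,
    retry_on_leftover_audio: bool = True,
) -> int:
    """Hypothetical helper: pick the frame where the next 30 s window starts.

    segment_boundary_seek: seek (in frames) derived from the timestamp tokens.
    last_word_end: end time (seconds) of the last aligned word, if any.
    time_offset: start time (seconds) of the current window.
    """
    if not retry_on_leftover_audio:
        # Assumed behaviour of the proposed option: keep the segment boundary,
        # so audio after the last word is not decoded a second time.
        return segment_boundary_seek
    if last_word_end is not None and last_word_end > time_offset:
        # Move the seek to the end of the last word, so the leftover audio
        # after it is decoded again at the start of the next window.
        return round(last_word_end * FRAMES_PER_SECOND)
    return segment_boundary_seek
```

For example, for a window starting at 0 s whose timestamp tokens end at frame 3000 (30 s) and whose last word ends at 27.5 s, the retry path restarts at frame 2750, so 2.5 s of audio is decoded again; with retry_on_leftover_audio=False the next window starts at frame 3000.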

MahmoudAshraf97 · 2025-11-19T14:14:31Z

Hi, how does this solve hallucinations? Can you give examples?

ArneNx · 2025-11-28T13:54:52Z

We would generally see lots of hallucinations when we enabled word-level timestamps. The reason is that Whisper would be presented with the leftover audio (which it had already decided not to transcribe) again, without context. This frequently caused hallucinations, since Whisper tends to always output something when presented with audio, even if that audio doesn't contain any speech. I guess there may be use cases where this retry mechanism makes sense, but at least in our experience it seems to hurt more than it helps.

Purfview · 2025-11-29T02:36:51Z

> We would generally see lots of hallucinations when we enabled word-level timestamps.

Do you use VAD?

> The reason is that Whisper would be presented with the leftover audio (which it had already decided not to transcribe) again, without context.

This is the original Whisper behaviour. I don't remember now why it doesn't seek to the full segment boundary there; I think there must be a reason for that.

@MahmoudAshraf97 Do you know why?

Purfview · 2025-11-29T04:32:30Z

Just tested it. The PR has the opposite effect. It creates "hallucinations" at the boundaries: it transcribes the last sentences from previous segments, and the timestamps at the start of segments leak into the previous segments too.

The PR looks invalid to me.

@rjames-0 @ArneNx
If you have problems with hallucinations, use VAD and/or hallucination_silence_threshold.

MahmoudAshraf97 · 2025-11-29T07:40:23Z

> > We would generally see lots of hallucinations when we enabled word-level timestamps.
>
> Do you use VAD?
>
> > The reason is that Whisper would be presented with the leftover audio (which it had already decided not to transcribe) again, without context.
>
> This is the original Whisper behaviour. I don't remember now why it doesn't seek to the full segment boundary there; I think there must be a reason for that.
>
> @MahmoudAshraf97 Do you know why?

Sequential Whisper starts the next segment from the end of the previous one, so there can be some overlap between segments. The thing is, if word timestamps are used, the segment boundaries are updated using the start of the first word and the end of the last word. I have not verified this behavior with the reference implementation.
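The boundary update described above can be pictured with a small sketch: once words are aligned, a segment's start and end are snapped to its first word's start and its last word's end. The Word and Segment classes and the snap_to_words helper are hypothetical simplifications for illustration, not faster-whisper's own types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float  # seconds


@dataclass
class Segment:
    start: float  # seconds, from timestamp tokens
    end: float  # seconds, from timestamp tokens
    words: List[Word] = field(default_factory=list)


def snap_to_words(segment: Segment) -> Segment:
    """Return a copy whose boundaries follow the aligned words (hypothetical helper)."""
    if not segment.words:
        # Nothing aligned: keep the token-based boundaries.
        return Segment(segment.start, segment.end, list(segment.words))
    # Start of the first word, end of the last word.
    return Segment(segment.words[0].start, segment.words[-1].end, list(segment.words))
```

For a segment spanning 10.0 to 16.0 s by its timestamp tokens, with words ending at 14.2 s, the snapped segment ends at 14.2 s. If it is the last segment in the window and the seek follows the last word, the 14.2 to 16.0 s tail is the leftover audio that the next window sees again.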

The problem mentioned in the PR is valid, but the solution is wrong IMHO; VAD is a much cleaner solution.

Purfview · 2025-11-29T13:33:25Z

> I have not verified this behavior with the reference implementation.

Here it is: https://github.com/openai/whisper/blob/c0d2f624c09dc18e709e37c2ad90c039a4eb72a2/whisper/transcribe.py#L413-L416

Disabling that block creates quirks like this (a segment ends after the first line in the screenshot) [no VAD]:

Commit e6293c1: retry on leftover audio false mechanism
