Replies: 1 comment
Just a heads up for anyone looking into this issue:
Hi there,
I'm currently experiencing hallucination issues similar to what others have reported. I'm trying to generate subtitles for a one-hour video. I tried a few configuration changes, but they only help up to a point.
The settings are based on:
Issue #896
Pull Request #291
Whisper Thread
Context of the video:
At the start of the video, there is music playing for about 10 minutes, followed by a speech.
Custom Settings:
Beam size: 5 (-bs 5)
Entropy threshold: 2.4 (-et 2.4)
Maximum context: 64 (-mc 64)
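For reference, the settings above could be assembled into a command line roughly as follows. This is a minimal sketch: the binary name `./main` and the model and audio paths are placeholders, not taken from the original setup, and the helper `build_whisper_args` is purely illustrative.

```python
def build_whisper_args(audio_path, model_path, binary="./main"):
    """Build a whisper.cpp argument list with the settings from this post.

    `binary`, `audio_path` and `model_path` are placeholders (assumptions).
    """
    return [
        binary,
        "-m", model_path,   # model file
        "-f", audio_path,   # input audio
        "-bs", "5",         # beam size
        "-et", "2.4",       # entropy threshold
        "-mc", "64",        # maximum text context carried between segments
    ]


if __name__ == "__main__":
    # Print the command instead of running it, since the paths are placeholders.
    print(" ".join(build_whisper_args("video.wav", "models/ggml-base.bin")))
```

The list form can be passed to `subprocess.run` once the placeholder paths point at real files.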
With this configuration, the hallucination is limited: the model now "only" needs about 2 minutes to recover. Before these adjustments, I got about 60 minutes of the word "[Music]".
After approximately 64 spoken words, the context changes and the model starts working fine again, but that still leaves around 2 minutes of hallucination at the start of the speech. Is there a way to set a time threshold (in seconds) so that a new context is established after 10-15 seconds? Or to reset the context if the temperature stays high for x seconds?
Also, could someone explain the following options? Understanding them might help reduce hallucinations.
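To make the request concrete, here is a rough sketch of the reset logic being asked about. It is not an existing whisper.cpp feature: the class `ContextResetPolicy`, its field names, and the default values are all hypothetical, and it assumes the caller knows each segment's duration and the temperature the decoder ended up using.

```python
from dataclasses import dataclass, field


@dataclass
class ContextResetPolicy:
    """Hypothetical policy: decide when to drop the accumulated text context."""
    max_context_seconds: float = 15.0    # reset after this much audio at the latest
    high_temperature: float = 0.6        # what counts as "high" (assumed value)
    max_high_temp_seconds: float = 10.0  # tolerated run of high-temperature audio
    _context_age: float = field(default=0.0, init=False)
    _high_temp_run: float = field(default=0.0, init=False)

    def update(self, segment_seconds, temperature):
        """Record one decoded segment; return True if the context should be reset."""
        self._context_age += segment_seconds
        if temperature >= self.high_temperature:
            self._high_temp_run += segment_seconds
        else:
            self._high_temp_run = 0.0  # a confident segment ends the run

        if (self._context_age >= self.max_context_seconds
                or self._high_temp_run >= self.max_high_temp_seconds):
            self._context_age = 0.0
            self._high_temp_run = 0.0
            return True
        return False


if __name__ == "__main__":
    policy = ContextResetPolicy()
    for seconds, temp in [(5, 0.0), (5, 0.0), (5, 0.0)]:
        print(policy.update(seconds, temp))
```

A caller would clear the prompt tokens whenever `update` returns True, so a bad context could survive for at most the configured number of seconds.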
--word-thold N [0.01 ] word timestamp probability threshold
--entropy-thold N [2.40 ] entropy threshold for decoder fail
--logprob-thold N [-1.00 ] log probability threshold for decoder fail
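As far as I understand it, the two "decoder fail" thresholds work roughly like the sketch below: a decode is rejected (and retried at a higher temperature) when the recent tokens are too repetitive, meaning low entropy, or when the average token log probability is too low. This is a simplified illustration and not whisper.cpp's actual code. The function names, the 32-token window, and the use of the natural logarithm are assumptions.

```python
import math
from collections import Counter


def token_entropy(tokens, window=32):
    """Shannon entropy (in nats) of token frequencies over the last `window` tokens."""
    recent = tokens[-window:]
    total = len(recent)
    counts = Counter(recent)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def decode_failed(tokens, token_logprobs, entropy_thold=2.4, logprob_thold=-1.0):
    """Return True if this decode would be rejected under the assumed rules.

    Expects non-empty `tokens` and `token_logprobs`.
    """
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    too_repetitive = token_entropy(tokens) < entropy_thold  # e.g. "[Music]" on repeat
    too_unlikely = avg_logprob < logprob_thold              # model is unsure overall
    return too_repetitive or too_unlikely


if __name__ == "__main__":
    looping = [7] * 40          # one token repeated: entropy is 0
    varied = list(range(32))    # all tokens distinct: entropy is ln(32)
    print(decode_failed(looping, [-0.1] * 40))
    print(decode_failed(varied, [-0.1] * 32))
```

Under this reading, raising the entropy threshold makes the decoder reject repetitive output sooner, and raising the log probability threshold (closer to 0) makes it reject low-confidence output sooner.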
Thank you!