Testing short transcriptions speed issue #956
Unanswered · timwillhack asked this question in Q&A
Hello, I was trying to put faster-whisper in place of openai-whisper in my project to get speed gains. I am mostly transcribing small wav files (1-5 seconds), and for some reason the openai version of whisper is running faster than faster-whisper.
I'm running on GPU (an RTX 3080).
I didn't want to open an issue about this because I am most likely missing something (like how the models compare between openai whisper and faster-whisper).
Here is my sample code, which transcribes the same 2-second audio file nine times (`range(1, 10)`) for each version of whisper:

```python
from faster_whisper import WhisperModel  # , BatchedInferencePipeline

model_size = "base"
print("loading: " + model_size)
audio_model = WhisperModel(model_size, device="cuda", compute_type="float16")

for i in range(1, 10):
    start_timer()
    segments, info = audio_model.transcribe(
        "trans_cleanup_20240810_215604_822844.mp3",
        beam_size=5,
        language="en",
        temperature=0,
        compression_ratio_threshold=None,
        log_prob_threshold=None,
        no_speech_threshold=None,
    )
    print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
    for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    end_timer()
```

```python
import whisper  # using original openai whisper (base)
from whisper import DecodingOptions
import torch

model = "base"
if args.model != "large" and not args.non_english:  # args comes from my argparse setup (not shown)
    model = model + ".en"
print("loading: " + model)
audio_model = whisper.load_model(model)

decoding_options = DecodingOptions(temperature=0, language="en", fp16=torch.cuda.is_available())
transcribe_params = {
    "no_speech_threshold": None,
    "compression_ratio_threshold": None,
    "logprob_threshold": None,
}
all_params = {**vars(decoding_options), **transcribe_params}

for i in range(1, 10):
    start_timer()
    result = audio_model.transcribe("trans_cleanup_20240810_215604_822844.mp3", **all_params)
    end_timer()
```
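The snippet above calls `start_timer()` and `end_timer()`, which aren't defined in the post. A minimal sketch of what these helpers might look like (hypothetical implementations, assumed to use `time.perf_counter` and to match the `Elapsed time for 'default': ... ms` format in the output):

```python
import time

_t0 = None  # module-level start timestamp recorded by start_timer()


def start_timer():
    """Record the start time for the next measurement."""
    global _t0
    _t0 = time.perf_counter()


def end_timer(label="default"):
    """Print and return the elapsed time in milliseconds since start_timer()."""
    elapsed_ms = (time.perf_counter() - _t0) * 1000
    print("Elapsed time for '%s': %.2f ms" % (label, elapsed_ms))
    return elapsed_ms
```

Note that `perf_counter` measures wall-clock time, so CUDA warm-up (the first, much slower iteration in both runs) is included unless the first measurement is discarded.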
This is the output (the top block is faster-whisper):
```
loading: base.en
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 477.35 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 236.15 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 230.88 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 228.61 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 232.33 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 235.58 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 233.37 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 231.14 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 231.20 ms
loading: base.en
Elapsed time for 'default': 702.27 ms
Elapsed time for 'default': 180.99 ms
Elapsed time for 'default': 206.93 ms
Elapsed time for 'default': 184.45 ms
Elapsed time for 'default': 182.56 ms
Elapsed time for 'default': 181.67 ms
Elapsed time for 'default': 191.69 ms
Elapsed time for 'default': 182.79 ms
Elapsed time for 'default': 179.49 ms
```