Testing short transcriptions speed issue #956
Unanswered · timwillhack asked this question in Q&A
Hello, I was trying to put faster-whisper in place of openai-whisper in my project to get speed gains. I am mostly transcribing small wav files (1-5 seconds), and for some reason the openai version of whisper is running faster than faster-whisper.
I'm running on GPU (an RTX 3080).
I didn't want to open an issue about this because I am most likely missing something (like how the models compare between openai whisper and faster-whisper).
Here is my sample code, which transcribes the same 2-second audio file nine times (`range(1, 10)`) for each version of whisper:

```python
from faster_whisper import WhisperModel  # , BatchedInferencePipeline

model_size = "base"
print("loading: " + model_size)
audio_model = WhisperModel(model_size, device="cuda", compute_type="float16")

for i in range(1, 10):
    start_timer()
    segments, info = audio_model.transcribe(
        "trans_cleanup_20240810_215604_822844.mp3",
        beam_size=5,
        language="en",
        temperature=0,
        compression_ratio_threshold=None,
        log_prob_threshold=None,
        no_speech_threshold=None,
    )
    print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
    for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    end_timer()
```

```python
import whisper  # using original openai whisper (base)
from whisper import DecodingOptions
import torch

model = "base"
if args.model != "large" and not args.non_english:  # args comes from my argparse setup (not shown)
    model = model + ".en"
print("loading: " + model)
audio_model = whisper.load_model(model)

decoding_options = DecodingOptions(temperature=0, language="en", fp16=torch.cuda.is_available())
transcribe_params = {
    "no_speech_threshold": None,
    "compression_ratio_threshold": None,
    "logprob_threshold": None,
}
all_params = {**vars(decoding_options), **transcribe_params}

for i in range(1, 10):
    start_timer()
    result = audio_model.transcribe("trans_cleanup_20240810_215604_822844.mp3", **all_params)
    end_timer()
```
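The snippet above calls `start_timer()` and `end_timer()`, which aren't defined in the post. A minimal sketch of what these helpers might look like (hypothetical implementations, assumed to use `time.perf_counter` and to match the `Elapsed time for 'default': ... ms` format in the output):

```python
import time

_t0 = None  # module-level start timestamp recorded by start_timer()


def start_timer():
    """Record the start time for the next measurement."""
    global _t0
    _t0 = time.perf_counter()


def end_timer(label="default"):
    """Print and return the elapsed time in milliseconds since start_timer()."""
    elapsed_ms = (time.perf_counter() - _t0) * 1000
    print("Elapsed time for '%s': %.2f ms" % (label, elapsed_ms))
    return elapsed_ms
```

Note that `perf_counter` measures wall-clock time, so CUDA warm-up (the first, much slower iteration in both runs) is included unless the first measurement is discarded.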
This is the output (the top block is faster-whisper):
```
loading: base.en
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 477.35 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 236.15 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 230.88 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 228.61 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 232.33 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 235.58 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 233.37 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 231.14 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 231.20 ms
loading: base.en
Elapsed time for 'default': 702.27 ms
Elapsed time for 'default': 180.99 ms
Elapsed time for 'default': 206.93 ms
Elapsed time for 'default': 184.45 ms
Elapsed time for 'default': 182.56 ms
Elapsed time for 'default': 181.67 ms
Elapsed time for 'default': 191.69 ms
Elapsed time for 'default': 182.79 ms
Elapsed time for 'default': 179.49 ms
```