Empty sequence when using faster-whisper's transcribe on fine-tuned model #1212
Comments
```python
segments, info = model.transcribe('foo.wav', condition_on_previous_text=False)
```
Thank you for your answer @Purfview, tried your code
Still returning an empty sequence. I tried different audio files, which transcribe correctly with the transformers library. I also tried the original ctranslate2 library with the following code:

```python
import ctranslate2
import librosa
import transformers

# Load the audio at Whisper's expected 16 kHz mono
audio, _ = librosa.load('foo.wav', sr=16000, mono=True)

# Compute input features with the fine-tuned model's processor
processor = transformers.WhisperProcessor.from_pretrained("mlouala/whisper-diin-v3")
inputs = processor(audio, return_tensors='np', sampling_rate=16000)
features = ctranslate2.StorageView.from_array(inputs.input_features)

# Run the converted model directly through CTranslate2
model = ctranslate2.models.Whisper("./whisper-din-v3", compute_type='int8')
prompt = processor.tokenizer.convert_tokens_to_ids(
    [
        "<|startoftranscript|>",
        "<|fr|>",
        "<|transcribe|>",
        "<|notimestamps|>",
    ]
)
results = model.generate(features, [prompt])
transcription = processor.decode(results[0].sequences_ids[0])
print(transcription)
```

And it returns the transcription correctly, but inference is really slow.
Try this:

```python
segments, info = model.transcribe('foo.wav', condition_on_previous_text=False, without_timestamps=True)
```
Hi @Purfview, thank you, but it's still returning an empty sequence.

I also tried rebuilding the mel filters with 128 bins:

```python
model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(
    model.feature_extractor.sampling_rate,
    model.feature_extractor.n_fft,
    n_mels=128,
)
```

But I was getting the following error when trying to run inference:

```
Traceback (most recent call last):
  File "/home/dev/sandbox_stt.py", line 44, in <module>
    segments, info = model.transcribe(AUDIO, condition_on_previous_text=False, without_timestamps=True)
  File "/home/dev/miniconda3/lib/python3.12/site-packages/faster_whisper/transcribe.py", line 887, in transcribe
    ) = self.detect_language(
  File "/home/dev/miniconda3/lib/python3.12/site-packages/faster_whisper/transcribe.py", line 1764, in detect_language
    encoder_output = self.encode(
  File "/home/dev/miniconda3/lib/python3.12/site-packages/faster_whisper/transcribe.py", line 1343, in encode
    features = get_ctranslate2_storage(features)
  File "/home/dev/miniconda3/lib/python3.12/site-packages/faster_whisper/transcribe.py", line 1820, in get_ctranslate2_storage
    segment = ctranslate2.StorageView.from_array(segment)
ValueError: Unsupported type: <f8
```

And I finally solved this error by adding this parameter when calling `get_mel_filters`. I don't know if this additional information may help you 🤷‍♂️...
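The comment above cuts off before naming the parameter. A minimal sketch of the combined workaround, assuming the fix was passing `dtype=np.float32` to `get_mel_filters` (which would explain the `<f8`, i.e. float64, error from `StorageView.from_array`):

```python
import numpy as np
from faster_whisper import WhisperModel

model = WhisperModel("./whisper-din-v3", compute_type="int8")

# large-v3-style checkpoints use 128 mel bins instead of 80;
# dtype=np.float32 is an assumption here: CTranslate2's StorageView
# rejects float64 ("<f8") arrays, so the filters must stay in float32.
fe = model.feature_extractor
fe.mel_filters = fe.get_mel_filters(
    fe.sampling_rate, fe.n_fft, n_mels=128, dtype=np.float32
)

segments, info = model.transcribe(
    "foo.wav", condition_on_previous_text=False, without_timestamps=True
)
```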
Original issue

Hi,
I'm trying to use faster-whisper with a fine-tuned model of the new Whisper turbo model: openai/whisper-large-v3-turbo.

Faster-whisper library

When I try to run inference with my fine-tuned model in faster-whisper, after converting the model using this command line and then running this script:
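The exact command isn't shown above; a plausible sketch, assuming the stock ct2-transformers-converter CLI and the model names used elsewhere in this thread:

```bash
ct2-transformers-converter --model mlouala/whisper-diin-v3 \
    --output_dir whisper-din-v3 \
    --copy_files tokenizer.json preprocessor_config.json \
    --quantization int8
```

The script isn't shown either; a minimal faster-whisper call matching the options discussed in the comments would look like this:

```python
from faster_whisper import WhisperModel

model = WhisperModel("./whisper-din-v3", compute_type="int8")
segments, info = model.transcribe("foo.wav")

print(info)  # language and duration come back correct...
for segment in segments:
    print(segment.text)  # ...but nothing is ever yielded here
```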
I tested multiple quantizations (int8, int8_float32, int16) as well as no quantization at all, but it always returns an empty list of segments.
Nonetheless, it correctly detects the language and the audio's duration, as you can see in the TranscriptionInfo:
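The TranscriptionInfo output itself wasn't preserved; for reference, the fields in question can be inspected like this (faster-whisper returns a NamedTuple):

```python
print(info.language, info.language_probability, info.duration)
print(list(segments))  # -> [] even though the info fields look right
```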
Also, when I run the 'base' turbo model converted with ct2-transformers-converter, it works fine.
Genuine Transformers library

But my model works fine when using this simple code with the genuine transformers library:
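That snippet wasn't captured either; a typical transformers pipeline call for a fine-tuned Whisper checkpoint, reusing the repo name from earlier in the thread, would look roughly like this:

```python
import torch
from transformers import pipeline

# Standard ASR pipeline for a fine-tuned Whisper checkpoint
pipe = pipeline(
    "automatic-speech-recognition",
    model="mlouala/whisper-diin-v3",
    torch_dtype=torch.float16,
    device="cuda:0",
)

result = pipe("foo.wav", generate_kwargs={"language": "fr", "task": "transcribe"})
print(result["text"])
```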
Any clues?