
Empty sequence when using faster-whisper's transcribe on fine-tuned model #1212

mlouala-dev opened this issue Dec 22, 2024 · 4 comments


@mlouala-dev

Hi,
I'm trying to use faster-whisper with a fine-tune of the new Whisper turbo model, openai/whisper-large-v3-turbo.

Faster-whisper library

When I try to run inference on my fine-tuned model with faster-whisper, after converting the model with this command line:

ct2-transformers-converter --model "mlouala/whisper-diin-v3" --output_dir "whisper-din-v3" --force  --copy_files tokenizer_config.json preprocessor_config.json --quantization int8

Then running this script :

from faster_whisper import WhisperModel
model_size = "/home/dev/whisper-din-v3"
model = WhisperModel(model_size, device="cuda")

segments, info = model.transcribe('foo.wav', beam_size=5)
for segment in segments:
    print(dict(start=segment.start, end=segment.end, text=segment.text))

I tested multiple quantizations (int8, int8_float32, int16) as well as no quantization at all, but it always returns an empty list of segments.
Nonetheless, it correctly detects the language and the audio's duration, as you can see in the TranscriptionInfo:

TranscriptionInfo(language='fr', language_probability=0.8290529251098633, duration=8.2, duration_after_vad=8.2, all_language_probs=[....], transcription_options=TranscriptionOptions(beam_size=5, best_of=5, patience=1, length_penalty=1, repetition_penalty=1, no_repeat_ngram_size=0, log_prob_threshold=-1.0, no_speech_threshold=0.6, compression_ratio_threshold=2.4, condition_on_previous_text=True, prompt_reset_on_temperature=0.5, temperatures=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0], initial_prompt=None, prefix=None, suppress_blank=True, suppress_tokens=(1, 2, 7, 8, 9, 10, 14, 25, 26, 27, 28, 29, 31, 58, 59, 60, 61, 62, 63, 90, 91, 92, 93, 359, 503, 522, 542, 873, 893, 902, 918, 922, 931, 1350, 1853, 1982, 2460, 2627, 3246, 3253, 3268, 3536, 3846, 3961, 4183, 4667, 6585, 6647, 7273, 9061, 9383, 10428, 10929, 11938, 12033, 12331, 12562, 13793, 14157, 14635, 15265, 15618, 16553, 16604, 18362, 18956, 20075, 21675, 22520, 26130, 26161, 26435, 28279, 29464, 31650, 32302, 32470, 36865, 42863, 47425, 49870, 50254, 50258, 50358, 50359, 50360, 50361), without_timestamps=False, max_initial_timestamp=1.0, word_timestamps=False, prepend_punctuations='"\'“¿([{-', append_punctuations='"\'.。,,!!??::”)]}、', multilingual=False, max_new_tokens=None, clip_timestamps=[0.0], hallucination_silence_threshold=None, hotwords=None), vad_options=None)

Also, when I run the base turbo model converted with ct2-transformers-converter, it works fine.
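For reference, one thing worth ruling out (a debugging sketch on my side, not verified against this model) is faster-whisper's post-decoding filters, which can silently drop segments whose confidence is too low even though decoding itself succeeds:

```python
# Debugging sketch (assumption, untested on this fine-tuned model):
# transcribe() drops segments that fail these thresholds, which could explain
# an empty segment list even though language detection works fine.
# Setting them to None disables the filtering entirely.
debug_options = dict(
    log_prob_threshold=None,           # keep segments with low avg log-prob
    no_speech_threshold=None,          # never classify a segment as silence
    compression_ratio_threshold=None,  # keep highly repetitive segments
)
# segments, info = model.transcribe("foo.wav", **debug_options)
print(sorted(debug_options))
```

If segments appear with these settings, the decoder is producing output that the default thresholds filter out.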

Genuine Transformers library

But my model works fine with this simple code using the genuine Transformers library:

from transformers import pipeline
pipe = pipeline(model="mlouala/whisper-diin-v3")

def transcribe(audio):
    text = pipe(audio)["text"]
    return text
print(transcribe('foo.wav'))

Any clues?

@Purfview (Contributor)

segments, info = model.transcribe('foo.wav', condition_on_previous_text=False)

@mlouala-dev (Author) commented Dec 22, 2024

Thank you for your answer @Purfview, I tried your code:

segments, info = model.transcribe('foo.wav', condition_on_previous_text=False)

It still returns an empty sequence. I tried with different audio files, which all transcribe correctly with the Transformers library. I also tried the original CTranslate2 library with the following code:

import ctranslate2
import librosa
import transformers

audio, _ = librosa.load('foo.wav', sr=16000, mono=True)
processor = transformers.WhisperProcessor.from_pretrained("mlouala/whisper-diin-v3")
inputs = processor(audio, return_tensors='np', sampling_rate=16000)

features = ctranslate2.StorageView.from_array(inputs.input_features)
model = ctranslate2.models.Whisper("./whisper-din-v3", compute_type='int8')

prompt = processor.tokenizer.convert_tokens_to_ids(
    [
        "<|startoftranscript|>",
        '<|fr|>',
        '<|transcribe|>',
        '<|notimestamps|>',
    ]
)

results = model.generate(features, [prompt])
transcription = processor.decode(results[0].sequences_ids[0])
print(transcription)

And it returns the transcription correctly, but inference is really slow.
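Regarding the slow inference: ctranslate2.models.Whisper loads on CPU by default, so the long runtime above may simply be CPU decoding. A sketch of the GPU variant (assuming a CUDA device is available on the machine):

```python
# ctranslate2.models.Whisper defaults to device="cpu"; pointing it at the GPU
# (assumption: CUDA is available) should bring inference time in line with
# the faster-whisper runs, which were loaded with device="cuda".
gpu_kwargs = dict(device="cuda", compute_type="int8")
# model = ctranslate2.models.Whisper("./whisper-din-v3", **gpu_kwargs)
print(gpu_kwargs)
```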

@Purfview (Contributor)

Try this:

segments, info = model.transcribe('foo.wav', condition_on_previous_text=False, without_timestamps=True)

@mlouala-dev (Author)
Hi @Purfview, thank you, but it still returns an empty sequence.
By the way, I originally had an issue loading the model, like #582, and first tried to solve it with this line:

model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(model.feature_extractor.sampling_rate, model.feature_extractor.n_fft, n_mels=128)

But then I got the following error when trying to run inference:

Traceback (most recent call last):
  File "/home/dev/sandbox_stt.py", line 44, in <module>
    segments, info = model.transcribe(AUDIO, condition_on_previous_text=False, without_timestamps=True)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dev/miniconda3/lib/python3.12/site-packages/faster_whisper/transcribe.py", line 887, in transcribe
    ) = self.detect_language(
        ^^^^^^^^^^^^^^^^^^^^^
  File "/home/dev/miniconda3/lib/python3.12/site-packages/faster_whisper/transcribe.py", line 1764, in detect_language
    encoder_output = self.encode(
                     ^^^^^^^^^^^^
  File "/home/dev/miniconda3/lib/python3.12/site-packages/faster_whisper/transcribe.py", line 1343, in encode
    features = get_ctranslate2_storage(features)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dev/miniconda3/lib/python3.12/site-packages/faster_whisper/transcribe.py", line 1820, in get_ctranslate2_storage
    segment = ctranslate2.StorageView.from_array(segment)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Unsupported type: <f8

I finally solved this error by adding this option to the ct2-transformers-converter command line: --copy_files tokenizer_config.json preprocessor_config.json.

I don't know if this additional information helps 🤷‍♂...
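For reference, the ValueError above comes from the dtype of the feature array: "<f8" is NumPy's code for little-endian float64, which CTranslate2's StorageView.from_array does not accept. Mel filters built via get_mel_filters are float64 by default, so the resulting features are too. A minimal sketch of the issue and the cast that avoids it (using a random array as a stand-in for real mel features):

```python
import numpy as np

# Stand-in for the mel features produced after overriding mel_filters:
# arrays built with plain NumPy default to float64 ("<f8"), which triggers
# "ValueError: Unsupported type: <f8" in ctranslate2.StorageView.from_array.
features = np.random.rand(1, 128, 3000)
print(features.dtype.str)  # "<f8"

# Casting to float32 (and making the array contiguous) avoids the error:
features = np.ascontiguousarray(features.astype(np.float32))
print(features.dtype.str)  # "<f4"
```

Copying the original preprocessor_config.json with --copy_files presumably sidesteps this by letting the feature extractor build its filters with the correct dtype in the first place.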
