Greater error when converted via ctranslate2 #1168

hforghani · 2024-11-23T05:29:49Z

I fine-tuned a Whisper large-v3 model with the SpeechBrain framework. I want to convert it to a faster-whisper model and run inference on it with faster-whisper==1.0.3. To do this, I first saved the model and weights:

from speechbrain.inference.ASR import WhisperASR
from transformers import WhisperProcessor

# Load the fine-tuned SpeechBrain checkpoint.
model = WhisperASR.from_hparams(
    source="path/to/speechbrain/model",
    hparams_file="hyperparams.yaml",
    savedir="tmp_whisper",
    run_opts={"device": "cuda"},
)
# Save the fine-tuned weights in Hugging Face format, together with the stock large-v3 processor.
model.mods.whisper.model.save_pretrained("tmp_whisper_finetuned")
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")
processor.save_pretrained("tmp_whisper_finetuned")

Then I converted the model to the faster-whisper format with ctranslate2==4.5.0, following this instruction and using float16 quantization:

python -m ctranslate2.converters.transformers --model tmp_whisper_finetuned --output_dir tmp_whisper_ft_ctranslate2 --copy_files tokenizer_config.json preprocessor_config.json --quantization float16
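
Before running inference, it can help to confirm that the output directory holds the files the conversion was supposed to produce. The following is only a minimal sketch: the helper `missing_files` and the `EXPECTED_FILES` list are illustrative names, and the list itself is an assumption (the converter's `model.bin` and `config.json`, plus the two names passed to `--copy_files` above), not a statement of what faster-whisper requires.

```python
from pathlib import Path

# Assumed file list: model.bin and config.json are written by the converter,
# the other two are the names passed to --copy_files in the command above.
EXPECTED_FILES = (
    "model.bin",
    "config.json",
    "tokenizer_config.json",
    "preprocessor_config.json",
)


def missing_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Lists every expected file if the directory does not exist.
    print(missing_files("tmp_whisper_ft_ctranslate2"))
```

An empty list means every expected file is present; any other result names the files to look into before comparing error rates.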

After that I ran inference on it:

from faster_whisper import WhisperModel

model = WhisperModel("tmp_whisper_ft_ctranslate2", device='cuda')
segments, info = model.transcribe(voice_file, language="fa")
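
Note that `model.transcribe` returns `segments` as a generator, so decoding only runs once the segments are iterated. A minimal sketch of collecting them into one transcript string for scoring follows; `segments_to_text` is a hypothetical helper, and the stand-in objects below only mimic the `.text` attribute of real segments.

```python
from types import SimpleNamespace


def segments_to_text(segments):
    """Concatenate the .text of each segment into one transcript string."""
    return " ".join(seg.text.strip() for seg in segments).strip()


# Stand-in objects with a .text attribute, used here instead of a real model.
fake_segments = iter([SimpleNamespace(text=" hello"), SimpleNamespace(text=" world ")])
print(segments_to_text(fake_segments))  # hello world
```

With a real model, the same helper would be called as `segments_to_text(segments)` right after `model.transcribe(...)`.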

I ran this inference on a dataset of 400 samples and averaged the WER and CER, but the error rates were higher than with SpeechBrain:

Model	Platform	Quantization	Time (s)	WER	CER	Max memory (MB)
My fine-tuned model	Speechbrain	-	890	0.1495	0.0309	8182
My fine-tuned model	Faster-whisper	fp16	491	0.2436	0.1022	~5300
Whisper-large-v3	Openai-whisper	-	1185	0.2570	0.0705	9948
Whisper-large-v3	Faster-whisper	fp16	536	0.2491	0.0647	~4300

Why does the converted model in faster-whisper format have far higher error rates than the SpeechBrain model? You might attribute this to fp16 quantization, but the base Whisper-large-v3 model with the same quantization on faster-whisper reaches almost the same error rates as openai-whisper.
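
For reference, the kind of averaged WER and CER reported in the table can be sketched as below. This is an assumption about the scoring, not the exact script used: it computes a per-sample Levenshtein-based error rate and takes the plain mean over samples, applies no text normalization, and counts spaces as characters for CER. All function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate of one sample: word-level edits / reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference, hypothesis):
    """Character error rate of one sample: character edits / reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def mean_error(pairs, metric):
    """Average a per-sample metric over (reference, hypothesis) pairs."""
    scores = [metric(ref, hyp) for ref, hyp in pairs]
    return sum(scores) / len(scores)
```

Because a hallucinated run of repeated tokens adds many insertions to a single sample, a per-sample average like this can be dominated by a few long failures.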

hforghani · 2024-11-23T05:54:45Z

I am adding some sample transcriptions. My fine-tuned model on faster-whisper hallucinates considerably; the bold text marks the hallucinations.

Reference Sentence	Speechbrain Output	Faster-whisper Output
خالقی که با جلوه ی سراسر نورانی خود عوالم غیب و شهادت و سر و علن را به نعمت وجود اراسته و به برکت برگزیدگانش به ما رسانده که الله نور السموات و الارض و با ظهور جمیلش پرده از جمالش برافکنده که هو الاول و الاخر و الظاهر و الباطن و به کتب مقدس اسمانی اش که از حضرت غیب بر انبیایش از صفی الله تا خلیل الله و از خلیل الله تا حبیب الله صلوات الله و سلامه علیهم و سلم نازل فرموده راه وصول به کمالات و فنای در کمال مطلق را تعلیم فرموده و سلوک الی الله را گوشزد کرده چون کریمه ی و من یخرج من بیته مهاجرا الی الله و طریق برخورد با مومنین و دوستان خود و ملحدین و مستکبرین و دشمنان خویش را اموخته محمد رسول الله و الذین معه اشداء علی الکفار رحماء بینهم و هزاران شکر که ما را از امت خاتم النبین محمد مصطفی صلی الله علیه و اله و سلم قرار داد افضل و اشرف موجودات و از پیروان قران مجید اعظم و اشرف کتب مقدسه و صورت کتیبه ی حضرت غیب مستجمع جمیع کمالات به صورت وحدت جمیعه و ضمانت حفظ و صیانت ان را از دستبرد شیاطین انس و جن فرموده انا نحن نزلنا الذکر و انا له لحافظون قرانی که نه یک حرف بر ان افزوده شده و نه یک حرف کاسته	خالقی که با جلوه سراسر نورانی خود عوالم غیب و شهادت و سر و علن را به نعمت وجود اراسته و به برکت برگزیدگانش به ما رسانده که الله و نور و سماوات والارض و با ظهور جمیلش پرده از جمالش برافکنده که هو والاول والاخر و ظاهر و باطن و به کتب مقدس اسمانی اش که از حضرت قیر وی بر انبیاش از صفی الله تا خلیل الله و از خلیل الله تا حبیب الله ص نازل فرموده راه وصول به کمالات و فنای در کمال مطلق را تعلیم فرموده و سلوک الی الله را گوشزد کرده چون کریمه و منیخروج من بیته ای مهاجرند الی الله و طریق برخورد برد با مومنین و دوستان خود و ملحدین و مستکبرین و دشمنان خود را اموخته محمد رسول الله و لزینم اهو اشدع علی الکفار رحم اع بین هم و هزاران شکر که ما را از امت خاتم النبیین محمد مصطفی ص قرار داد افضل و اشرف موجودات و از پیروان قدرت قران مجید اعظم و اشرف کتب مقدسه و صورت کتیبه حضرت غیب مستجمع جمیع کمالات به صورت وحدت جمیعه و ضمانت حفظ و صیانت ان را از دستبرد شیاطین انس و جن فرموده انا نه نو نه ظل نه ذکر و انا لهو لحافظون قرانی که نه یک حرف بر ان افزوده شده و نه یک حرف رفع کار سته	قی که با جلوه سراسرنورانی 
خود عوالم غیب و شهادت و صر و علن را به نعمت وجود اراسته و به برکت برگزیدگانش به ما رسانده که الله و نور و سماوات والعرض و با ظهور جمیله اش پرده از جمالش برفکنده که هو الاول والاخر و ظاهر ولباطن و به کتاب مقدس اسمانی اش یب سعیب ان وح ورزن رایبیبیبیبی برک ویبیبیبیبیبی بر ویبیبیبیبی بر ویبیبی بیبی بی بیبی بیبی بی بی بی بیبی بیبی بی بی بی بیبی بی بی بی بی بیبی بی بی بی بیبی بی بی بی بی بی بیبی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی بی برد با مومنین و دوستان خود و ملحدین و مستکبرین و دشمنان خود را اموخته محمد رسول الله و لزی نعمه اهو الشداع علی الکفار رهماع بین هم و هزاران شوکش که ما را از امت خاتم النبیین محمد مصطفی ص قرار داد افضل و اشرف موجودات و پیروان قدرت وی و زرح مق و که پسپت و ازنه بر و س را و یک وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی وی ن دی وی وی وی وی وی وی وی وی وی وی وی وی نی در کار سته
عبدالله نصیری در گفت ؤگو با خبرگزاری دانشجویان ایران ایسنا با بیان اینکه این تاخیر جزیی است افزود تمام قراردادهای عمره ی سال اینده با وزیر حج سابق عربستان منعقد و جداول پروازی نیز مشخص شده بود که با وجود تغییرات اخیر باید تمام قراردادها از سوی کمیته ی ملی عمره تایید شود	عبدالله نصیری در گفتگو با خبرگزاری دانشجویان ایران ایسنا با بیان این که این تاخیر جزیی است افزود تمام قراردادهای عمری سال اینده با وزیر حج سابق عربستان منعقد و جداول پروازی نیز مشخص شده بود که با وجود تغییرات اخیر باید تمام قراردادها از سوی کمیته ملی ملی عمره تایید شود	نصری در گفت وگو با خبرگزاری دانشجویان ایران ایسنا با بیان این که این تاخیر جزیی است افزود تمام قراردادهای عمری سال اینده با وزیر حج سابق عربستان منعقد و جداول پروازی نیز مشخص شده بود که با وجود تغییرات اخیر باید تمام قراردادها تایر و زن ر نلی و مردال دوز لیترانر و کندگر این م ع به مین م ولایبنتی کلی وی ویر و ی ام روی وی و پی وی سپی وی ویه هی و طی وی وی ویدی ویدیت شی وی وی وی وی وی وی کند است متن

MahmoudAshraf97 · 2024-11-23T12:26:03Z

In the CT2 conversion, remove the quantization and try again.

nullscc · 2024-11-26T06:27:48Z

I have the same issue. It remained the same after removing the quantization.

nullscc · 2024-11-26T07:29:35Z

After comparing the encoder_output, I found a large difference between whisper and faster-whisper inference; the outputs are totally different. I don't know why. Could you help with this? @MahmoudAshraf97 @hforghani
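
One way to put a number on such a comparison is sketched below, assuming both encoder outputs have been flattened into plain sequences of floats of the same length. `compare_vectors` is an illustrative helper, not part of either library. Small fp16 rounding differences would be expected to leave the cosine similarity close to 1, whereas a much lower value would suggest that the weights or the input features differ.

```python
import math


def compare_vectors(a, b):
    """Return (max absolute difference, cosine similarity) of two float sequences."""
    a, b = list(a), list(b)
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    max_abs_diff = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Cosine similarity is undefined when either vector is all zeros.
    cosine = dot / norm if norm else float("nan")
    return max_abs_diff, cosine
```

Both outputs should come from the same audio file and the same 30-second window, otherwise the comparison is meaningless.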

hforghani · 2024-11-26T07:31:09Z

I have removed the quantization, but there is still too much hallucination.
However, I happened to try BatchedInferencePipeline, and it helped remove the hallucination, although the error rates are still higher than with SpeechBrain:

Model	Platform	Quantization	Time (s)	WER	CER	Max memory (MB)
My fine-tuned model	Speechbrain	-	890	0.1495	0.0309	8182
My fine-tuned model	Faster-whisper	-	751	0.2037	0.0704	~9000
My fine-tuned model	Faster-whisper	fp16	452	0.2034	0.0700	~5300

@MahmoudAshraf97 @nullscc

Purfview · 2024-11-26T07:39:00Z

Compare to Openai-whisper

hforghani · 2024-11-26T08:01:54Z

The third row of the table in this comment refers to openai-whisper. It is the pretrained large-v3, not my fine-tuned one. @Purfview

Purfview · 2024-11-26T08:14:13Z

In that table, openai-whisper is similar to faster-whisper.
Better to ask SpeechBrain which settings cause the different result.

hforghani · 2024-11-27T06:43:15Z

All SpeechBrain settings:

# Related to Whisper
task = "transcribe"
initial_prompt = None
logprob_threshold = -1.0
no_speech_threshold = 0.6
condition_on_previous_text = False
chunk_size = 30

# Related to decoder: S2SWhisperGreedySearcher
temperature=0.0
use_kv_cache=True
suppress_blank=True
suppress_tokens="-1"
sample_len=None
prefix=None
prompt=None
beam_size = 8
min_decode_ratio = 0.0
max_decode_ratio = 1.0
using_eos_threshold=True
eos_threshold=1.5
length_normalization=True
using_max_attn_shift=False
max_attn_shift=60
minus_inf=-1e20

My inference using speechbrain:

res = asr_model.transcribe_file(voice_file, task="transcribe", use_torchaudio_streaming=True)

Code of the method transcribe_file:
https://github.com/speechbrain/speechbrain/blob/develop/speechbrain/inference/ASR.py , line 774

All faster-whisper settings:

language = "fa"
task = "transcribe"
beam_size = 8
best_of = 8
patience = 1
length_penalty = 1
repetition_penalty = 1
no_repeat_ngram_size = 0
temperature = [ 0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
compression_ratio_threshold = 2.4
log_prob_threshold = -1.0
no_speech_threshold = 0.6
initial_prompt = None
prefix = None
suppress_blank = True
suppress_tokens = [-1]
without_timestamps = True
word_timestamps = False
prepend_punctuations = "\"'“¿([{-"
append_punctuations = "\"'.。,，!！?？:：”)]}、"
multilingual = False
vad_filter = True
vad_parameters = None
max_new_tokens = None
chunk_length = None
clip_timestamps = None
batch_size = 8
hotwords = None
language_detection_threshold = 0.5
language_detection_segments = 1

@Purfview @MahmoudAshraf97

Update:
I set language = None and temperature = 0 for faster-whisper and observed the same results.
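
To make the two settings lists easier to compare, the sketch below diffs a subset of them as plain dictionaries. The values are copied from the lists above; the `ALIASES` mapping (treating `logprob_threshold` and `log_prob_threshold` as the same setting) and the helper `diff_settings` are assumptions made for illustration.

```python
# Values copied from the two settings lists above (only a subset is shown).
speechbrain_cfg = {
    "task": "transcribe",
    "beam_size": 8,
    "temperature": 0.0,
    "logprob_threshold": -1.0,
    "no_speech_threshold": 0.6,
    "condition_on_previous_text": False,
    "suppress_blank": True,
}
faster_whisper_cfg = {
    "task": "transcribe",
    "beam_size": 8,
    "temperature": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
    "log_prob_threshold": -1.0,
    "no_speech_threshold": 0.6,
    "suppress_blank": True,
    "vad_filter": True,
    "compression_ratio_threshold": 2.4,
}

# Assumption: these two names refer to the same setting.
ALIASES = {"logprob_threshold": "log_prob_threshold"}


def diff_settings(left, right, aliases=ALIASES):
    """Return (differing values, keys only in left, keys only in right)."""
    left = {aliases.get(k, k): v for k, v in left.items()}
    shared = left.keys() & right.keys()
    differing = {k: (left[k], right[k]) for k in shared if left[k] != right[k]}
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())
    return differing, only_left, only_right


differing, only_sb, only_fw = diff_settings(speechbrain_cfg, faster_whisper_cfg)
print(differing)  # settings present on both sides with different values
print(only_sb)    # settings listed only for SpeechBrain
print(only_fw)    # settings listed only for faster-whisper
```

On this subset, the only shared setting with a different value is the temperature, which the update above already rules out; what remains are the settings that appear on one side only, such as `condition_on_previous_text`, `vad_filter`, and `compression_ratio_threshold`.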

Purfview · 2024-11-27T08:20:15Z

Better to ask SpeechBrain; they will know more about it.

The aim here is more or less to replicate OpenAI's implementation, not some SpeechBrain function.

hforghani · 2024-11-27T09:39:26Z

@nullscc wrote: "After comparing the encoder_output, I found a large difference between whisper and faster-whisper inference; the outputs are totally different. I don't know why. Could you help with this?"

@nullscc
Did you figure out how to fix the problem? I simply used BatchedInferencePipeline and that solved it for me:

from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("tmp_whisper_ft_ctranslate2", device='cuda')
batched_model = BatchedInferencePipeline(model=model)
segments, info = batched_model.transcribe(voice_file, language="your_lang")

Some settings may differ between BatchedInferencePipeline.transcribe and WhisperModel.transcribe; I don't know which. Using BatchedInferencePipeline was simply sufficient for me.

nullscc · 2024-11-30T02:27:06Z

@hforghani No, I tried both WhisperModel and BatchedInferencePipeline and still get worse results.

@MahmoudAshraf97 In my case, I fine-tuned the model and always decoded with openai-whisper. Before fine-tuning, the CTranslate2-converted model gave nearly the same results as the pretrained openai-whisper model. After fine-tuning, however, the CTranslate2-converted model gives worse results than inference with the openai-whisper code. I have no idea why so far.
