Can I serve speechbrain trained model whisper with faster whisper? #1139

cod3r0k · 2024-11-14T03:48:51Z

Can I serve a Whisper model trained with SpeechBrain using faster-whisper?

MahmoudAshraf97 · 2024-11-14T11:02:29Z

You have to convert it to CT2 first. Several converters are available; see the CT2 documentation for more information.

cod3r0k · 2024-11-14T12:20:06Z

Great, can you help me more? What is CT2? @MahmoudAshraf97

MahmoudAshraf97 · 2024-11-14T14:59:43Z

CT2 is CTranslate2, the backend of Faster Whisper:
https://github.com/OpenNMT/CTranslate2/
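
To make the relationship concrete: faster-whisper loads a model directory written by a CTranslate2 converter. The following is a minimal sketch, not an official recipe. The helper names `looks_like_ct2_model` and `transcribe` are hypothetical and belong to neither library, and the `device` and `compute_type` values are only example choices.

```python
from pathlib import Path


def looks_like_ct2_model(model_dir):
    """Cheap sanity check: CTranslate2 converters write the weights to model.bin."""
    return (Path(model_dir) / "model.bin").is_file()


def transcribe(model_dir, audio_path):
    """Transcribe one file with faster-whisper from a local CT2 model directory."""
    # Imported lazily so the check above works without faster-whisper installed.
    from faster_whisper import WhisperModel

    if not looks_like_ct2_model(model_dir):
        raise FileNotFoundError(f"{model_dir} does not contain model.bin")

    # Example settings only; pick device and compute_type for your hardware.
    model = WhisperModel(str(model_dir), device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio_path)
    # segments is a generator; iterating over it is what runs the decoding.
    return "".join(segment.text for segment in segments)
```

With a converted directory in place, a call such as `transcribe("my_ct2_model", "path_to_audio.wav")` would return the text, where both paths are placeholders.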

cod3r0k · 2024-11-15T04:17:55Z

Great. Do you mean something like the steps below?

For reference, with transformers:

# Load the stock Hugging Face Whisper model and processor.
from transformers import WhisperProcessor, WhisperForConditionalGeneration
processor = WhisperProcessor.from_pretrained("openai/whisper-large")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large")

# Save the model in Hugging Face format
model.save_pretrained("whisper_huggingface")
processor.save_pretrained("whisper_huggingface")

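Then the equivalent in SpeechBrain:

import torch
# SpeechBrain 1.0+ exposes this under speechbrain.inference; older releases used speechbrain.pretrained
from speechbrain.inference.ASR import WhisperASR
# Replace the source with the location of your own trained model
whisper = WhisperASR.from_hparams(source="speechbrain/whisper-large", savedir="tmp_whisper")
# Save model weights
# Assumption: the Whisper wrapper is registered as "whisper" in the hparams; inspect whisper.mods to confirm
model = whisper.mods.whisper
torch.save(model.state_dict(), "speechbrain_whisper_weights.pth")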

# Map the weights
import torch
from transformers import WhisperForConditionalGeneration

# Load the Hugging Face Whisper model that will receive the weights
hf_whisper = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large")

# Load SpeechBrain weights
speechbrain_weights = torch.load("speechbrain_whisper_weights.pth", map_location="cpu")

# Load Hugging Face model weights
hf_model_state_dict = hf_whisper.state_dict()

# Map weights from SpeechBrain to Hugging Face
mapped_weights = {}
matched = 0
for name, param in hf_model_state_dict.items():
    # Placeholder: this only matches identical names; replace it with the exact alignment of layers
    if name in speechbrain_weights:
        mapped_weights[name] = speechbrain_weights[name]
        matched += 1
    else:
        mapped_weights[name] = param  # Falls back to the original HF weights if no match

# If few names match, the result is essentially the original model
print(f"Matched {matched} of {len(hf_model_state_dict)} tensors")

# Update Hugging Face model with the mapped weights
hf_whisper.load_state_dict(mapped_weights)

# Save the updated model
hf_whisper.save_pretrained("hf_whisper_converted")


# Verify
import librosa
from transformers import WhisperProcessor
processor = WhisperProcessor.from_pretrained("openai/whisper-large")
audio_path = "path_to_audio.wav"
# The processor expects a waveform array sampled at 16 kHz, not a file path
audio, _ = librosa.load(audio_path, sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
generated_ids = hf_whisper.generate(inputs.input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)

print(f"Transcription: {transcription}")

Then convert to CT2 (the quantization type, here float16, is an example value):

ct2-transformers-converter --model hf_whisper_converted --output_dir whisper_ctranslate2 --quantization float16

Is this the right approach?

MahmoudAshraf97 · 2024-11-15T12:04:43Z

Exactly. If your model is not in Hugging Face format, you need to convert it to that format first, and then to CT2.
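
Before running the CT2 converter, it may be worth checking that the intermediate Hugging Face checkpoint actually received the fine-tuned weights, since a name mismatch between the two state dicts would otherwise go unnoticed. Below is a minimal sketch that uses plain dictionaries in place of real state dicts. The function `align_state_dict` and its `prefix` argument are hypothetical, and the key names in the toy example are illustrative; the actual names in a SpeechBrain checkpoint need to be inspected rather than assumed.

```python
def align_state_dict(source, target, prefix=""):
    """Copy values from source into a copy of target where names line up.

    prefix is prepended to each source key before lookup, for the case where
    the source names lack a wrapper prefix that the target uses.
    Returns (merged, matched_keys, unmatched_target_keys).
    """
    merged = dict(target)
    matched = []
    for name, value in source.items():
        candidate = prefix + name
        if candidate in target:
            merged[candidate] = value
            matched.append(candidate)
    unmatched = sorted(set(target) - set(matched))
    return merged, matched, unmatched


# Toy example: strings stand in for tensors.
source = {"encoder.conv1.weight": "fine-tuned"}
target = {"model.encoder.conv1.weight": "original", "proj_out.weight": "original"}

merged, matched, unmatched = align_state_dict(source, target, prefix="model.")
print(matched)    # ['model.encoder.conv1.weight']
print(unmatched)  # ['proj_out.weight']
```

If `unmatched` is long on a real checkpoint, the merged model is mostly the original weights, and the converted CT2 model would not reflect the fine-tuning. With real tensors, comparing shapes for each matched key is a sensible extra check.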
