Can I serve speechbrain trained model whisper with faster whisper? #1139

Open
cod3r0k opened this issue Nov 14, 2024 · 5 comments

Comments

@cod3r0k

cod3r0k commented Nov 14, 2024

Can I serve a Whisper model trained with SpeechBrain using faster-whisper?

@MahmoudAshraf97
Collaborator

You have to convert it to CT2 first. There are several converters available; check the CT2 documentation for more information.

@cod3r0k
Author

cod3r0k commented Nov 14, 2024

Great, can you help me more? What is CT2? @MahmoudAshraf97

@MahmoudAshraf97
Collaborator

CT2 is CTranslate2, the backend of Faster Whisper:
https://github.com/OpenNMT/CTranslate2/

@cod3r0k
Author

cod3r0k commented Nov 15, 2024

Great. Do you mean I should do the following?

With transformers:

# First, load the stock Hugging Face Whisper model and processor.
from transformers import WhisperProcessor, WhisperForConditionalGeneration
processor = WhisperProcessor.from_pretrained("openai/whisper-large")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large")

# Save the model in Hugging Face format
model.save_pretrained("whisper_huggingface")
processor.save_pretrained("whisper_huggingface")

In SpeechBrain:

import torch
from speechbrain.pretrained import WhisperASR
whisper = WhisperASR.from_hparams(source="speechbrain/whisper-large", savedir="tmp_whisper")
# Save model weights
model = whisper.modules.model
torch.save(model.state_dict(), "speechbrain_whisper_weights.pth")

from transformers import WhisperForConditionalGeneration
# Load the Hugging Face Whisper model
hf_whisper = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large")

# Map the weights
import torch

# Load SpeechBrain weights (on CPU, so no GPU is needed for the conversion)
speechbrain_weights = torch.load("speechbrain_whisper_weights.pth", map_location="cpu")

# Load Hugging Face model weights
hf_model_state_dict = hf_whisper.state_dict()

# Map weights from SpeechBrain to Hugging Face
mapped_weights = {}
for name, param in hf_model_state_dict.items():
    # Replace this mapping logic with the exact alignment of layers
    if name in speechbrain_weights:
        mapped_weights[name] = speechbrain_weights[name]
    else:
        mapped_weights[name] = param  # Use original HF weights if no match

# Update Hugging Face model with the mapped weights
hf_whisper.load_state_dict(mapped_weights)

# Save the updated model
hf_whisper.save_pretrained("hf_whisper_converted")


# Verify the converted model on a sample file
import librosa
from transformers import WhisperProcessor
processor = WhisperProcessor.from_pretrained("openai/whisper-large")
audio_path = "path_to_audio.wav"
# The processor expects a waveform array, not a file path
audio, _ = librosa.load(audio_path, sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
generated_ids = hf_whisper.generate(**inputs)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)

print(f"Transcription: {transcription}")

Then

ct2-transformers-converter --model hf_whisper_converted --output_dir whisper_ctranslate2 --quantization float16

Is this correct?
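
One thing to watch in the mapping loop above: any key that is not found in the SpeechBrain checkpoint silently keeps the original Hugging Face weight, so if the key names differ the "converted" model may just be the stock model. Below is a minimal sketch of a mapping helper that reports which keys were not matched. The function name `map_state_dict` and the `strip_prefixes` parameter are hypothetical, and the actual key layout of a SpeechBrain checkpoint is an assumption that has to be checked by printing its keys.

```python
def map_state_dict(source, target, strip_prefixes=()):
    """Map tensors from `source` onto the key names used by `target`.

    `strip_prefixes` is a hypothetical knob: prefixes that may appear on
    the target key names but not on the source key names (for example
    "model."). Returns (mapped, unmatched), where `unmatched` lists the
    target keys that kept their original value.
    """
    mapped, unmatched = {}, []
    for name, param in target.items():
        # Try the exact name first, then the name with each prefix removed.
        candidates = [name] + [
            name[len(p):] for p in strip_prefixes if name.startswith(p)
        ]
        for candidate in candidates:
            same_shape = (
                candidate in source
                and getattr(source[candidate], "shape", None)
                == getattr(param, "shape", None)
            )
            if same_shape:
                mapped[name] = source[candidate]
                break
        else:
            # No match: keep the target's own weight and record the key.
            mapped[name] = param
            unmatched.append(name)
    return mapped, unmatched
```

With real checkpoints this could be called as `mapped, unmatched = map_state_dict(speechbrain_weights, hf_model_state_dict, strip_prefixes=("model.",))`. If `unmatched` covers most of the model, the key names need a different alignment before the result is worth converting.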

@MahmoudAshraf97
Collaborator

Exactly. If the model you have is not in Hugging Face format, you need to convert it to that format first and then to CT2.
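
For reference, the second step (Hugging Face format to CT2, then loading in faster-whisper) can also be done from Python. This is a rough sketch, assuming the `ctranslate2`, `transformers` and `faster-whisper` packages are installed and that the directory names from the script above are used. The helper names (`existing_aux_files`, `convert_to_ct2`, `transcribe`) are made up for illustration, and `float16` is only one possible quantization choice.

```python
from pathlib import Path

# Tokenizer/feature-extractor files that are copied next to the converted
# model when they are present in the Hugging Face directory.
AUX_FILES = ("tokenizer.json", "preprocessor_config.json")


def existing_aux_files(model_dir):
    """Return the entries of AUX_FILES that actually exist in model_dir."""
    return [name for name in AUX_FILES if (Path(model_dir) / name).is_file()]


def convert_to_ct2(hf_dir, out_dir, quantization="float16"):
    """Convert a Hugging Face Whisper checkpoint directory to CT2 format."""
    # Imported lazily because ctranslate2 is a heavy optional dependency.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(
        hf_dir, copy_files=existing_aux_files(hf_dir)
    )
    return converter.convert(out_dir, quantization=quantization)


def transcribe(ct2_dir, audio_path):
    """Load the converted model with faster-whisper and return the text."""
    from faster_whisper import WhisperModel

    model = WhisperModel(ct2_dir, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    # `segments` is a generator; joining it runs the transcription.
    return " ".join(segment.text.strip() for segment in segments)
```

Usage would look like `convert_to_ct2("hf_whisper_converted", "whisper_ctranslate2")` followed by `transcribe("whisper_ctranslate2", "path_to_audio.wav")`. Comparing that output with the transcription from the SpeechBrain model on the same file is a quick way to confirm the fine-tuned weights survived both conversions.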
