Support for other audio formats #1399
-
OpenAI's Whisper accomplishes this by invoking ffmpeg from the command line. I believe we could do something similar in whisper.cpp.
-
Thanks for the suggestion, @bobqianic. If anyone needs a script to do the conversion and store it as a temporary WAV file:
import os
import subprocess

# some_dir is any writable directory of your choice
audio_path = os.path.join(some_dir, "temp.wav")
# Run ffmpeg command to convert audio to WAV format
cmd = [
"ffmpeg",
"-nostdin",
"-y",  # overwrite temp.wav if it already exists; otherwise ffmpeg exits with an error
"-threads", "0",
"-i", "your_file.flac",
"-acodec", "pcm_s16le",
"-ar", "16000",
"-ac", "1",
"-f", "wav",
audio_path
]
subprocess.run(cmd, stderr=subprocess.DEVNULL, check=True)
# Use the converted file for inference
# Pass the arguments as a list so paths containing spaces work without a shell
command = ["./main", "-m", "models/ggml-base.bin", "-f", audio_path]
result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True)
prediction = result.stdout.strip()
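For anyone who wants to reuse this, here is a minimal sketch of the same two steps wrapped in functions, with the temporary WAV cleaned up automatically. The names `build_ffmpeg_cmd` and `transcribe` are made up for illustration, and the `./main` binary and model path are simply the ones used above; adjust them to your setup.

```python
import os
import subprocess
import tempfile


def build_ffmpeg_cmd(src, dst, sample_rate=16000):
    """Return the ffmpeg argument list that converts src to mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-nostdin", "-y",   # -y: overwrite dst if it already exists
        "-i", src,
        "-acodec", "pcm_s16le",       # 16-bit little-endian PCM
        "-ar", str(sample_rate),      # resample to the target rate
        "-ac", "1",                   # downmix to mono
        "-f", "wav",
        dst,
    ]


def transcribe(src, main_bin="./main", model="models/ggml-base.bin"):
    """Convert src in a temporary directory, run whisper.cpp on it, return its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        wav_path = os.path.join(tmp, "temp.wav")
        subprocess.run(build_ffmpeg_cmd(src, wav_path),
                       stderr=subprocess.DEVNULL, check=True)
        result = subprocess.run([main_bin, "-m", model, "-f", wav_path],
                                stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
                                text=True, check=True)
    return result.stdout.strip()
```

Calling `transcribe("episode.mp3")` should work for any input format your ffmpeg build can decode, since only the `-i` argument changes.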
-
I'm finding @fingertrouble's suggestion really useful in a totally different use case. If you are recording audio on an iOS device, you can't change the sample rate from the standard 48 kHz to 16 kHz, so you have to convert on the fly. I'm not deep enough into the process to know whether that carries a performance hit, but I suspect it does, and it could add latency. I wonder if there could be models trained on different sample rates? That would be really cool: for example, if there were a whisper tiny.en that dealt with 48 kHz, you could pipe the default audio into whisper on the device more easily.
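To give a feel for what the on-the-fly conversion involves: 48 kHz to 16 kHz is an exact factor of 3, so the crudest possible version is to average every three consecutive samples. The sketch below is pure Python and only illustrates the arithmetic; the function name is hypothetical, and the averaging is a very rough low-pass filter, so a real app would more likely use a proper resampler from its audio framework.

```python
def downsample_48k_to_16k(samples):
    """Average each group of 3 consecutive samples (crude low-pass, then decimate by 3).

    `samples` is a sequence of mono sample values captured at 48 kHz.
    Any trailing samples that do not fill a complete group of 3 are dropped.
    """
    factor = 3  # 48000 / 16000
    usable = len(samples) - len(samples) % factor
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]
```

The work is a single pass over the buffer, so the cost grows linearly with the amount of audio; whether that matters for latency on a given device is something you would have to measure.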
-
I've actually gone back to standard whisper, because for whisper-cpp I not only need to convert my podcast files to 16-bit WAV, it also requires an unusual sample rate (it has to be 16 kHz).
It's just another step, and it cancels out any speed increase because I have to faff around with re-exporting or converting the files. Also, my podcast is 2+ hours long, so the WAV files can be massive!
Support for standard 16-bit 44.1 or 48 kHz WAV, M4A/AAC, FLAC and MP3 formats would be really useful.
At the very least MP3, because that's a standard among podcasters.
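As a rough check on how big those WAV files get, uncompressed PCM size is just duration × sample rate × channels × bytes per sample. The sketch below assumes a minimal 44-byte WAV header and uses a hypothetical helper name; a 2-hour episode is used as the example.

```python
def wav_size_bytes(duration_s, sample_rate=16000, channels=1,
                   bits_per_sample=16, header_bytes=44):
    """Estimate the size of an uncompressed PCM WAV file in bytes.

    header_bytes=44 assumes the minimal canonical PCM header; files with
    extra metadata chunks will be slightly larger.
    """
    bytes_per_sample = bits_per_sample // 8
    return header_bytes + duration_s * sample_rate * channels * bytes_per_sample


two_hours = 2 * 60 * 60  # seconds

# 16 kHz mono, 16-bit: the format whisper.cpp expects
mono_16k = wav_size_bytes(two_hours)
# 48 kHz stereo, 16-bit: a typical export from an audio editor
stereo_48k = wav_size_bytes(two_hours, sample_rate=48000, channels=2)

print(f"16 kHz mono:   {mono_16k / 1e6:.1f} MB")
print(f"48 kHz stereo: {stereo_48k / 1e6:.1f} MB")
```

So the 16 kHz mono file for a 2-hour episode comes to roughly 230 MB, about a sixth of a 48 kHz stereo export of the same length.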