WhisperKit Android brings Foundation Models On Device for Automatic Speech Recognition. It extends the performance and feature set of WhisperKit from Apple platforms to Android and (soon) Linux.
[Example App (Coming with Beta)] [Blog Post] [Python Tools Repo]
The following setup was tested on macOS 15.1.
- Ensure you have the required build tools:

  ```bash
  make setup
  ```
- Download the Whisper models (<1.5GB) and auxiliary files:

  ```bash
  make download-models
  ```
- Build the development environment in Docker with all development tools (~12GB):

  ```bash
  make env
  ```
The first run of `make env` will take several minutes while the Docker image builds. Once the image is built, subsequent runs of `make env` will drop you into the Docker container right away.
If needed, you can rebuild the Docker image with:

```bash
make rebuild-env
```
ArgmaX Inference Engine (`axie`) orchestration for TFLite is provided as the `axie_tflite` CLI.
- Enter the Docker build environment:

  ```bash
  make env
  ```
- Inside the Docker environment, build the `axie_tflite` CLI:

  ```bash
  make build
  ```
- On the host machine (outside the Docker shell), push dependencies to the Android device:

  ```bash
  make adb-push
  ```
  You can reuse this target to push `axie_tflite` again if you rebuild it. If you want to include audio files, place them in the `/path/to/WhisperKitAndroid/inputs` folder and they will be copied to `/sdcard/argmax/tflite/inputs/`.
- Connect to the Android device:

  ```bash
  make adb-shell
  ```
- Run `axie_tflite`:

  ```
  Usage: axie_tflite <audio input> <tiny | base | small>
  ```
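On the host side, the invocation above can also be scripted. A minimal sketch in Python that builds the `adb shell` command line; the on-device working directory `/sdcard/argmax/tflite` is inferred from the paths above, and the audio filename is hypothetical:

```python
import shlex

# Model sizes accepted by the CLI, per its usage string.
VALID_MODELS = {"tiny", "base", "small"}

def build_axie_command(audio_input: str, model: str) -> list[str]:
    """Build the adb command that runs axie_tflite on the device.

    `audio_input` is a path relative to the on-device working directory
    (e.g. "inputs/test.wav"); `model` must be a supported Whisper size.
    """
    if model not in VALID_MODELS:
        raise ValueError(f"model must be one of {sorted(VALID_MODELS)}, got {model!r}")
    # Run the CLI from the directory it was pushed to (assumed path).
    on_device = f"cd /sdcard/argmax/tflite && ./axie_tflite {shlex.quote(audio_input)} {model}"
    return ["adb", "shell", on_device]

# Example: the argument list you would hand to subprocess.run
print(build_axie_command("inputs/test.wav", "tiny"))
```

Validating the model name on the host avoids a round trip to the device for a malformed invocation.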
WhisperKit Android is currently in the v0.1 Alpha stage. Contributions from the community will be encouraged after the project reaches the v0.1 Beta milestone.
- Temperature fallbacks for decoding guardrails
- Input audio file format coverage for wav, flac, mp4, m4a, mp3
- Output file format coverage for SRT, VTT, and OpenAI-compatible JSON
- Publication of performance and quality data on WhisperKit Benchmarks
- Whisper Large v3 Turbo (v20240930) support
- Streaming real-time inference
- Model compression
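For context on the temperature-fallback roadmap item: the reference Whisper implementation retries decoding at increasing temperatures when quality heuristics fail, namely a low average log-probability or a high compression ratio (a sign of degenerate repetition loops). A minimal sketch of that loop with a stubbed decoder standing in for the real model; the thresholds mirror the OpenAI defaults:

```python
import zlib
from typing import Callable

def compression_ratio(text: str) -> float:
    """Repetitive text compresses well; a high ratio flags repetition loops."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))

def decode_with_fallback(
    decode: Callable[[float], tuple[str, float]],
    temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    logprob_threshold: float = -1.0,
    compression_ratio_threshold: float = 2.4,
) -> tuple[str, float]:
    """Retry decoding at higher temperatures until the guardrails pass.

    `decode(t)` returns (text, avg_logprob) for sampling temperature t.
    Returns the accepted text and the temperature that produced it.
    """
    text = ""
    for t in temperatures:
        text, avg_logprob = decode(t)
        if avg_logprob >= logprob_threshold and compression_ratio(text) <= compression_ratio_threshold:
            return text, t  # guardrails satisfied
    return text, temperatures[-1]  # fallbacks exhausted; keep the last attempt

# Stub decoder: greedy decoding (t=0) gets stuck repeating; sampling recovers.
def fake_decode(t: float) -> tuple[str, float]:
    if t == 0.0:
        return "and so and so and so and so and so and so and so " * 4, -0.3
    return "And so my fellow Americans, ask not what your country can do for you.", -0.4

print(decode_with_fallback(fake_decode))
```

Greedy decoding is tried first because it is cheapest and usually sufficient; the temperature ladder only engages on segments that trip a guardrail.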
- We release WhisperKit Android under the MIT License.
- OpenAI Whisper model open-source checkpoints were released under the MIT License.
- Qualcomm AI Hub `.tflite` models and QNN libraries for NPU deployment are released under the Qualcomm AI Model & Software License.
If you use WhisperKit for something cool or just find it useful, please drop us a note at [email protected]!
If you are looking for managed enterprise deployment with Argmax, please drop us a note at [email protected].
If you use WhisperKit for academic work, here is the BibTeX:
```bibtex
@misc{whisperkit-argmax,
   title = {WhisperKit},
   author = {Argmax, Inc.},
   year = {2024},
   URL = {https://github.com/argmaxinc/WhisperKit}
}
```