WhisperKit Android (Beta)

WhisperKit Android brings Foundation Models On Device for Automatic Speech Recognition. It extends the performance and feature set of WhisperKit from Apple platforms to Android and Linux. The current feature set is a subset of the iOS counterpart, but we are continuing to invest in Android and now welcome contributions from the community.

[Example App (Coming Soon)] [Blog Post] [Python Tools Repo]

Installation

The following setup was tested on macOS 15.1.

Ensure you have the required build tools using:

make setup

Download Whisper models (<1.5GB) and auxiliary files:

make download-models

Build development environment in Docker with all development tools (~12GB):

make env

The first run of make env takes several minutes while the Docker image builds.

Once the image is built, subsequent runs of make env start inside the Docker container right away.

You can use the following to rebuild the Docker image, if needed:

make rebuild-env
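
Before starting the steps above, it can help to confirm that the host tools this README relies on are on the PATH. The following is a minimal sketch, not part of the project: `missing_tools` is a hypothetical helper name, and the list of tools (make, docker, adb) is an assumption drawn from the commands used in this document rather than from what make setup actually checks.

```shell
#!/bin/sh
# Hypothetical helper (not part of WhisperKit Android):
# print the name of every given tool that is not found on PATH,
# one per line. Prints nothing when all tools are available.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Example (assumed tool list, based on the commands in this README):
#   missing_tools make docker adb
```

An empty result suggests the basics are in place; make setup remains the authoritative check.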

Getting Started

WhisperKit Android is a Whisper pipeline built on top of TensorFlow Lite (LiteRT), with a command-line interface provided by whisperkit-cli. The library is built with a C API for Android and Linux. Please note that because the library is currently in Beta, the C API is not yet stable.

Enter the Docker build environment:

make env

Inside the Docker environment, build whisperkit-cli (for Android or Linux):

make build [linux | qnn | gpu]

The qnn option builds WhisperKit with Qualcomm AI NPU support and the QNN TFLite delegate. The gpu option uses the TFLite GPU delegate, the generic GPU backend for all Android devices. Linux builds are currently CPU-only.
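
Since the build target takes one of three backend names, a small wrapper can catch typos before a long build starts. This is a sketch under assumptions: `build_cmd` is a hypothetical helper that only prints the make invocation, and it treats a missing argument as plain make build without claiming which backend that selects.

```shell
#!/bin/sh
# Hypothetical helper (not part of WhisperKit Android):
# print the make invocation for a given backend name.
# Accepts "linux", "qnn", "gpu", or no argument at all.
build_cmd() {
    case "${1:-}" in
        "")            echo "make build" ;;
        linux|qnn|gpu) echo "make build $1" ;;
        *)             echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

build_cmd qnn   # prints: make build qnn
```

Inside the Docker environment, the printed command could then be run directly, for example with `$(build_cmd qnn)`.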

Back on the host machine (outside Docker shell), push dependencies to the Android device:

make adb-push

You can reuse this target to push whisperkit-cli again after rebuilding it. This step is not necessary for Linux builds.

Clean:

make clean [all]

With the all option, the target performs a deep clean that includes the open-source components.

CLI Run and Test

Run a test with a sample audio file. For Android:

make build

For Linux:

make build linux

Manually run whisperkit-cli:

Usage:

whisperkit-cli transcribe --model-path /path/to/my/whisper_model --audio-path /path/to/my/audio_file.m4a --report --report-path /path/to/dump/report.json

For all options, run whisperkit-cli --help

For Android, log in via adb shell:

adb shell
cd /sdcard/argmax/tflite
export PATH=/data/local/tmp/bin:$PATH
export LD_LIBRARY_PATH=/data/local/tmp/lib
whisperkit-cli transcribe --model-path  /path/to/openai_whisper-base --audio-path /path/to/inputs/jfk_441khz.m4a
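
The interactive session above can also be collapsed into a single command string and passed to adb shell from the host. The sketch below only composes that string: `android_remote_cmd` is a hypothetical helper name, the directory and environment variables are copied from the session above, and the model and audio paths are placeholders supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper (not part of WhisperKit Android):
# compose the on-device command line as a single string.
# $1 = model directory on the device, $2 = audio file on the device.
# \$PATH is escaped so that it is expanded on the device, not on the host.
android_remote_cmd() {
    printf '%s && %s && %s && %s\n' \
        "cd /sdcard/argmax/tflite" \
        "export PATH=/data/local/tmp/bin:\$PATH" \
        "export LD_LIBRARY_PATH=/data/local/tmp/lib" \
        "whisperkit-cli transcribe --model-path $1 --audio-path $2"
}

# With a device connected, this could be run from the host as:
#   adb shell "$(android_remote_cmd /path/to/openai_whisper-base /path/to/inputs/jfk_441khz.m4a)"
```

Paths containing spaces would need additional quoting, which this sketch does not handle.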

Sample execution output:

root@cf40510e9b93:/src/AXIE# ./build/linux/whisperkit-cli transcribe --model-path /src/AXIE/models/openai_whisper-small --audio-path /src/AXIE/test/jfk_441khz.m4a 
SoC: 	generic CPU (x86, arm64, etc) 
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
postproc vocab size: 51864
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '(null)':
  Metadata:
    major_brand     : M4A 
    minor_version   : 0
    compatible_brands: M4A mp42isom
    creation_time   : 2024-08-07T16:38:45.000000Z
    iTunSMPB        :  00000000 00000840 000000D4 00000000000766EC 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  Duration: 00:00:11.05, start: 0.047891, bitrate: 73 kb/s
  Stream #0:0[0x1](eng): Audio: aac (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 31 kb/s (default)
      Metadata:
        creation_time   : 2024-08-07T16:38:45.000000Z
        vendor_id       : [0][0][0][0]
Stream: freq - 44100, channels - 1, format - 32784, target_buf size - 1440000
[aac @ 0x55555a5b8c00] Could not update timestamps for skipped samples.
Transcription:   And so, my fellow Americans, ask not what your country can do for you.   Ask what you can do for your country.
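
When scripting around the CLI, the decoder and delegate log lines shown above are usually noise. A minimal sketch for keeping only the result follows; `extract_transcription` is a hypothetical helper, and it assumes the result always appears on a line beginning with "Transcription:", as in the sample output.

```shell
#!/bin/sh
# Hypothetical helper (not part of WhisperKit Android):
# read whisperkit-cli output on stdin and print only the transcribed text.
# Assumes the result is on a line that starts with "Transcription:".
extract_transcription() {
    sed -n 's/^Transcription:[[:space:]]*//p'
}

# Example usage; 2>&1 merges both streams because this README does not
# say which stream carries the transcription line:
#   whisperkit-cli transcribe --model-path /path/to/my/whisper_model \
#       --audio-path /path/to/my/audio_file.m4a 2>&1 | extract_transcription
```

For structured results, the --report and --report-path options shown earlier are likely the more robust route than parsing console output.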

Contributing

WhisperKit Android is currently in the v0.1 Beta stage. We are actively developing the project and welcome contributions from the community.

License

We release WhisperKit Android under the MIT License.
SDL3 (open source, used for audio resampling) is released under the zlib license.
FFmpeg (open source, used for audio decompression) is released under the LGPL.
The open-source OpenAI Whisper model checkpoints were released under the MIT License.
Qualcomm AI Hub .tflite models and QNN libraries for NPU deployment are released under the Qualcomm AI Model & Software License.

Citation

If you use WhisperKit for something cool or just find it useful, please drop us a note at [email protected]!

If you are looking for managed enterprise deployment with Argmax, please drop us a note at [email protected].

If you use WhisperKit for academic work, here is the BibTeX:

@misc{whisperkit-argmax,
   title = {WhisperKit},
   author = {Argmax, Inc.},
   year = {2024},
   URL = {https://github.com/argmaxinc/WhisperKit}
}
