Hi, I'm Marc Päpper and I wanted to vibe code like Karpathy ;D, so I looked around and found the cool work of Vlad. I extended it to run with a local Whisper model, so I don't need to pay for OpenAI tokens. I hope you have fun with it!
Simply run `cli.py` and start dictating text anywhere on your system:
- Hold down right control key (Ctrl_r)
- Speak your text
- Release the key
- Watch as your spoken words are transcribed and automatically typed!
Works in any application or window - your text editor, browser, chat apps, anywhere you can type!
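The hold-to-dictate flow boils down to a small press/record/release state machine. A minimal sketch in Python, with the key listener, microphone stream, and Whisper transcription stubbed out (the class and method names here are illustrative, not the actual vibevoice internals):

```python
class HoldToDictate:
    """Push-to-talk: buffer audio while the key is held, transcribe on release.

    The real app wires this to a pynput key listener and a microphone
    stream; here audio and transcription are stubbed for illustration.
    """

    def __init__(self, transcribe):
        self.transcribe = transcribe  # callable: raw audio bytes -> text
        self.recording = False
        self.audio_chunks = []

    def on_press(self):
        if not self.recording:  # ignore key auto-repeat while held
            self.recording = True
            self.audio_chunks = []

    def on_audio(self, chunk):
        if self.recording:
            self.audio_chunks.append(chunk)

    def on_release(self):
        if not self.recording:
            return ""
        self.recording = False
        return self.transcribe(b"".join(self.audio_chunks))


# Usage with a stub transcriber:
ptt = HoldToDictate(lambda audio: f"<{len(audio)} bytes transcribed>")
ptt.on_press()
ptt.on_audio(b"\x00\x01")
ptt.on_audio(b"\x02")
print(ptt.on_release())  # -> <3 bytes transcribed>
```

In the real app, the transcribed string is then typed out via simulated keystrokes so it lands wherever your cursor is.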
NEW: LLM voice command mode:
- Hold down the Scroll Lock key (it's rarely used anymore, which is why I chose it)
- Speak what you want the LLM to do
- The LLM receives your transcribed text and a screenshot of your current view
- The LLM's answer is typed at your cursor (streamed)
Works everywhere on your system, and the LLM always has the screen context
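Under the hood, command mode amounts to POSTing the transcript plus a base64-encoded screenshot to Ollama's `/api/generate` endpoint (multimodal models accept an `images` field of base64 strings). A standard-library sketch of the request body; `build_ollama_payload` is a hypothetical helper, not the project's actual function:

```python
import base64
import json

def build_ollama_payload(model, prompt, screenshot_png=None, stream=True):
    """Build the JSON body for Ollama's /api/generate endpoint.

    Ollama expects images as base64 strings in the "images" field,
    which is how the screenshot reaches a multimodal model.
    """
    payload = {"model": model, "prompt": prompt, "stream": stream}
    if screenshot_png is not None:
        payload["images"] = [base64.b64encode(screenshot_png).decode("ascii")]
    return payload


body = build_ollama_payload("gemma3:27b", "What is on my screen?", b"\x89PNG...")
print(json.dumps(body)[:80])
```

With `stream=True`, Ollama answers with one JSON object per line, so the client can type each `response` fragment as it arrives.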
```shell
git clone https://github.com/mpaepper/vibevoice.git
cd vibevoice
pip install -r requirements.txt
python src/vibevoice/cli.py
```
- Python 3.12 or higher
- CUDA-capable GPU (recommended) -> you can enable CPU use in server.py
- CUDA 12.x
- cuBLAS
- cuDNN 9.x
- In case you get this error:
```
OSError: PortAudio library not found
```
run `sudo apt install libportaudio2`
- Ollama for AI command mode (with multimodal models for screenshot support)
- Install Ollama by following the instructions at ollama.com
- Pull a model that supports both text and images for best results:
```shell
ollama pull gemma3:27b # Great model which can run on RTX 3090 or similar
```
- Make sure Ollama is running in the background:
```shell
ollama serve
```
- Make sure that you have CUDA >= 12.4 and cuDNN >= 9.x
- I had some trouble at first with Ubuntu 24.04, so I did the following:
```shell
sudo apt update && sudo apt upgrade
sudo apt autoremove nvidia* --purge
ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb && sudo apt update
sudo apt install cuda-toolkit-12-8
```
or alternatively:
```shell
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install cudnn9-cuda-12
```
- Then after rebooting, it worked well.
- Start the application:
python src/vibevoice/cli.py
- Hold down right control key (Ctrl_r) while speaking
- Release to transcribe
- Your text appears wherever your cursor is!
You can customize various aspects of VibeVoice with the following environment variables:
- `VOICEKEY`: Change the dictation activation key (default: `"ctrl_r"`)
```shell
export VOICEKEY="ctrl" # Use left control instead
```
- `VOICEKEY_CMD`: Set the key for AI command mode (default: `"scroll_lock"`)
```shell
export VOICEKEY_CMD="ctrl" # Use left control instead of Scroll Lock key
```
- `OLLAMA_MODEL`: Specify which Ollama model to use (default: `"gemma3:27b"`)
```shell
export OLLAMA_MODEL="gemma3:4b" # Use a smaller VLM in case you have less GPU RAM
```
- `INCLUDE_SCREENSHOT`: Enable or disable screenshots in AI command mode (default: `"true"`)
```shell
export INCLUDE_SCREENSHOT="false" # Disable screenshots (but they are local only anyway)
```
- `SCREENSHOT_MAX_WIDTH`: Set the maximum width for screenshots (default: `"1024"`)
```shell
export SCREENSHOT_MAX_WIDTH="800" # Smaller screenshots
```
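These are ordinary environment variables, so reading the configuration reduces to `os.environ.get` with defaults. A sketch (the `load_config` helper is hypothetical; the defaults mirror the list above):

```python
import os

def load_config():
    """Read VibeVoice settings from the environment, falling back to defaults."""
    return {
        "voice_key": os.environ.get("VOICEKEY", "ctrl_r"),
        "voice_key_cmd": os.environ.get("VOICEKEY_CMD", "scroll_lock"),
        "ollama_model": os.environ.get("OLLAMA_MODEL", "gemma3:27b"),
        "include_screenshot": os.environ.get("INCLUDE_SCREENSHOT", "true").lower() == "true",
        "screenshot_max_width": int(os.environ.get("SCREENSHOT_MAX_WIDTH", "1024")),
    }


os.environ["OLLAMA_MODEL"] = "gemma3:4b"  # example override
print(load_config()["ollama_model"])      # -> gemma3:4b
```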
To use the screenshot functionality:
```shell
sudo apt install gnome-screenshot
```
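`SCREENSHOT_MAX_WIDTH` only bounds the width; the height follows from the aspect ratio. A sketch of the resize math (hypothetical helper; the real code may delegate this to an image library):

```python
def scaled_size(width, height, max_width=1024):
    """Scale a screenshot down to max_width, preserving aspect ratio.

    Images already narrower than max_width are left untouched.
    """
    if width <= max_width:
        return width, height
    scale = max_width / width
    return max_width, round(height * scale)


print(scaled_size(2560, 1440, 1024))  # -> (1024, 576)
```

Smaller screenshots mean smaller base64 payloads to Ollama and faster vision-model inference, at the cost of fine on-screen detail.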
VibeVoice supports two modes:
- Hold down the dictation key (default: right Control)
- Speak your text
- Release to transcribe
- Your text appears wherever your cursor is!
- Hold down the command key (default: Scroll Lock)
- Ask a question or give a command
- Release the key
- The AI will analyze your request (and current screen if enabled) and type a response
- Original inspiration: whisper-keyboard by Vlad
- Faster Whisper for the optimized Whisper implementation
- Built by Marc Päpper