Voice messages are a quick way to communicate when typing isn't practical, but they are hard to use in noisy or public places and are inaccessible to people with hearing impairments. WhatsWhisper turns your WhatsApp voice messages into text using OpenAI's Whisper for transcription, with optional audio cleanup by Alibaba's ZipEnhancer. Beyond transcription, it can schedule tasks in Google Calendar from a spoken instruction, using Microsoft's Phi-3.5 to parse the task details. The result is a more accessible way to communicate and stay organized, wherever you are.
📝 Read the detailed blog post about WhatsWhisper on Medium
- Voice message transcription using OpenAI's Whisper
- Acoustic Noise Suppression & Audio Quality Enhancement using the SOTA Speech Enhancement Model "ZipEnhancer" by Speech Lab, Alibaba Group, China
- Voice-powered task scheduling with Google Calendar integration
- Smart task parsing using Microsoft's Phi-3.5
The diagram above illustrates the flow of data in WhatsWhisper:
- User sends a voice message via WhatsApp
- Message is received by WhatsApp Web API via venom-bot
- Audio file is passed through ZipEnhancer for quality improvement (optional)
- Enhanced audio is processed by Whisper ASR for transcription
- If a scheduling command is detected, the text is analyzed by Phi-3.5 for task extraction
- Extracted task details are used to create Google Calendar events
- Final response (transcription/confirmation) is sent back to the user
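The flow above can be sketched as a single dispatch function. This is only an illustration of the order of the stages, not the project's actual code: `handle_voice_message`, `Reply`, and the four injected callables are hypothetical names, and the real enhancement, transcription, parsing, and calendar calls are replaced by stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Reply:
    """What gets sent back to the user (hypothetical structure)."""
    text: str
    event: Optional[dict] = None


def handle_voice_message(
    audio: bytes,
    command: str,
    *,
    enhance: Callable[[bytes], bytes],
    transcribe: Callable[[bytes], str],
    parse_task: Callable[[str], dict],
    create_event: Callable[[dict], dict],
    use_enhancer: bool = False,
) -> Reply:
    """Run one voice message through the stages listed above."""
    # Optional enhancement stage (ZipEnhancer in the real project).
    if use_enhancer:
        audio = enhance(audio)

    # Transcription stage (Whisper in the real project).
    text = transcribe(audio)

    # Only scheduling commands continue to task extraction and calendar creation.
    if command == "!schedule":
        task = parse_task(text)
        event = create_event(task)
        return Reply(text=f"Scheduled: {task.get('title', text)}", event=event)

    return Reply(text=text)


if __name__ == "__main__":
    # Stand-in callables so the sketch runs without any external service.
    reply = handle_voice_message(
        b"raw-audio",
        "!transcribe",
        enhance=lambda a: a,
        transcribe=lambda a: "hello world",
        parse_task=lambda t: {"title": t},
        create_event=lambda task: {"id": "demo", **task},
    )
    print(reply.text)
```

Passing the stages in as callables keeps the ordering logic testable without API keys or a WhatsApp session.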
Check out WhatsWhisper in action: Watch Demo
- Python 3.8+
- Node.js 14+
- npm
- FFmpeg
- Google Calendar API credentials
- Groq API key (for OpenAI's Whisper)
- OpenRouter API key (for Phi-3.5)
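Before installing, it can help to confirm that the command-line prerequisites are on your PATH. The helper below is a minimal sketch and not part of the repository; `check_prerequisites` is a hypothetical name. It only checks that the executables exist and that Python is 3.8 or newer; it does not verify the Node.js version.

```python
import shutil
import sys


def check_prerequisites(tools=("node", "npm", "ffmpeg"), min_python=(3, 8)):
    """Return a mapping of requirement name -> True/False."""
    report = {"python": sys.version_info >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```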
- Clone the repository:
  git clone https://github.com/saadsohail05/WhatsWhisper.git
  cd WhatsWhisper
- Set up your credentials:
  - Place credentials.json (Google Calendar API) in the root directory
  - Create a .env file with:
    GROQ_API_KEY=your_groq_api_key
    OPENROUTER_API_KEY=your_openrouter_api_key
- Run the setup script:
  python setup.py
- Start the FastAPI server:
  python server.py
- In a new terminal, start the WhatsApp bot:
  node whatsapp-bot.js
- Scan the QR code that appears in the terminal with your WhatsApp to connect.
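If the server fails on startup, a missing key or credentials file is a likely cause. The following is a small, standard-library sketch for checking the credential setup described above; it is not the project's own loader, and `parse_env` and `missing_keys` are hypothetical helper names. It assumes the .env file uses plain KEY=value lines.

```python
from pathlib import Path

REQUIRED_KEYS = ("GROQ_API_KEY", "OPENROUTER_API_KEY")


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [k for k in required if not env.get(k)]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    problems = missing_keys(env)
    if not Path("credentials.json").exists():
        problems.append("credentials.json")
    print("Missing: " + ", ".join(problems) if problems else "Setup looks complete.")
```

Run it from the repository root so that the relative paths resolve to the same files the server reads.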
To reset the WhatsApp session, you have two options:
- Using the reset flag (recommended):
  node whatsapp-bot.js --reset
- Manual reset:
  rm -rf tokens/
  node whatsapp-bot.js
After resetting, a new QR code will appear in the terminal. Scan it with WhatsApp to establish a new session.
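On systems without `rm` (for example, a plain Windows shell), the manual reset can be approximated in Python. This is a sketch of the same idea, assuming the session data lives in the tokens directory as the manual command above implies; `reset_session` is a hypothetical helper, not something the repository provides.

```python
import shutil
from pathlib import Path


def reset_session(tokens_dir="tokens"):
    """Delete the stored session directory; return True if it existed."""
    path = Path(tokens_dir)
    if not path.is_dir():
        return False
    # Removes the directory and everything inside it, like `rm -rf tokens/`.
    shutil.rmtree(path)
    return True
```

After calling `reset_session()` from the repository root, start the bot again with node whatsapp-bot.js to get a fresh QR code.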
The bot supports the following commands:
- Simple Transcription
  - Send !transcribe
  - Send a voice message
  - Receive the transcribed text
- Enhanced Audio Transcription
  - Send !transcribe -e
  - Send a voice message
  - Receive enhanced audio transcription
- Audio Enhancement
  - Send !enhance
  - Send a voice message
  - Receive enhanced audio file
- Task Scheduling
  - Send !schedule
  - Send a voice message like "Schedule a team meeting tomorrow at 2 PM for one hour"
  - Receive confirmation of scheduled task in Google Calendar
- Help
  - Send !commands or !help to see available commands
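To make the command syntax concrete, here is one way the commands listed above could be parsed. It is an illustrative sketch, not the bot's actual parser (which lives in the Node.js side of the project); `Command`, `KNOWN_COMMANDS`, and `parse_command` are hypothetical names, and exact, case-sensitive matching is an assumption.

```python
from typing import NamedTuple, Optional

KNOWN_COMMANDS = {"!transcribe", "!enhance", "!schedule", "!commands", "!help"}


class Command(NamedTuple):
    name: str
    enhance: bool  # whether the audio should go through enhancement


def parse_command(message: str) -> Optional[Command]:
    """Parse a chat message into a Command, or None if it is not one."""
    parts = message.strip().split()
    if not parts or parts[0] not in KNOWN_COMMANDS:
        return None
    name = parts[0]
    # "-e" only applies to !transcribe; !enhance always enhances.
    enhance = name == "!enhance" or (name == "!transcribe" and "-e" in parts[1:])
    return Command(name=name, enhance=enhance)


if __name__ == "__main__":
    for msg in ("!transcribe", "!transcribe -e", "!enhance", "hello"):
        print(repr(msg), "->", parse_command(msg))
```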
- Verify all API keys are correctly set in .env
- Ensure Google Calendar credentials are properly configured
- Check server logs for detailed error messages
- Verify required directories (uploads, tokens, models) exist
- If the bot is not responding, make sure both FastAPI server and WhatsApp bot are running
- Ensure you have scanned the QR code to connect the bot to WhatsApp
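The directory check from the list above is easy to script. The snippet below is a minimal sketch, assuming the three directories sit directly under the repository root; `missing_dirs` is a hypothetical helper. It only reports what is missing and does not create anything, since an absent models directory may mean the setup script did not finish.

```python
from pathlib import Path

REQUIRED_DIRS = ("uploads", "tokens", "models")


def missing_dirs(root=".", required=REQUIRED_DIRS):
    """Return the names of required directories that do not exist under root."""
    base = Path(root)
    return [name for name in required if not (base / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories: " + ", ".join(missing))
    else:
        print("All required directories are present.")
```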
- venom-bot - For WhatsApp Web automation
- Groq - For the Whisper API integration
- OpenAI - For the Whisper speech recognition model
- ZipEnhancer - For audio enhancement
- Microsoft - For the Phi-3.5 model
- Google Calendar API - For scheduling integration
This project is licensed under the MIT License. See the LICENSE file for details.