Voice-enabled AI assistant with voice activity detection
AI Voice Assistant Pipeline
This project implements an end-to-end AI voice assistant pipeline that converts a spoken query into text, processes it with a Large Language Model (LLM), and converts the response back into speech.
Features
Voice-to-text conversion using VAD (Voice Activity Detection) and Whisper
Text processing using Google's Gemini AI
Text-to-speech conversion with adjustable parameters
Low-latency design
Responses limited to two sentences
Tunable voice output parameters (pitch, gender, speed)
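The README does not say which VAD method the project uses. To illustrate the idea, here is a minimal energy-based detector, similar in spirit to the energy threshold that the speech_recognition Recognizer applies (its energy_threshold and pause_threshold attributes). The function names, the 30 ms frame size, the 0.02 RMS threshold, and the silence hangover are assumptions, not the project's settings. The detector takes mono float samples in [-1, 1] and returns the sample ranges that contain speech, which could then be passed to Whisper.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of float samples in [-1, 1]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(
    samples: Sequence[float],
    sample_rate: int = 16000,
    frame_ms: int = 30,
    threshold: float = 0.02,        # assumed RMS level separating speech from silence
    min_silence_frames: int = 10,   # assumed hangover before a segment is closed
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of segments whose frame energy exceeds threshold.

    A segment stays open until min_silence_frames consecutive quiet frames pass,
    so short pauses between words do not split one utterance into several.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # sample index where the current segment began
    last_voiced_end = 0          # end index of the most recent voiced frame
    silent_run = 0               # consecutive quiet frames since the last voiced frame

    for offset in range(0, len(samples), frame_len):
        frame = samples[offset:offset + frame_len]
        if frame_rms(frame) >= threshold:
            if start is None:
                start = offset
            last_voiced_end = offset + len(frame)
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                segments.append((start, last_voiced_end))
                start = None
                silent_run = 0

    if start is not None:  # audio ended while speech was still active
        segments.append((start, last_voiced_end))
    return segments
```

The hangover is the main design choice: without it, every brief gap between words would end the segment and send fragments to the transcriber.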
Technologies Used
Python
Transformers (pipeline)
PyTorch (torch)
NumPy
speech_recognition
VAD (Voice Activity Detection)
Whisper
google.generativeai (for Gemini)
edge-tts
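edge-tts takes rate and pitch as signed strings (for example "+10%" and "-5Hz"), and gender is chosen through the voice name rather than a separate argument. A sketch of mapping the pitch, gender, and speed knobs onto edge_tts.Communicate might look like this. The voice_settings and synthesize names, the two voice names, and the speed-as-multiplier convention are assumptions for illustration; the project's own parameter handling is not shown in this README.

```python
# Assumed example voices per gender; run `edge-tts --list-voices` to see all options.
VOICES = {
    "female": "en-US-JennyNeural",
    "male": "en-US-GuyNeural",
}


def voice_settings(gender: str = "female", speed: float = 1.0, pitch_hz: int = 0) -> dict:
    """Translate friendly knobs into edge-tts Communicate arguments.

    speed is a multiplier (1.0 = normal speed); edge-tts expects a signed
    percentage string for rate and a signed Hz string for pitch.
    """
    if gender not in VOICES:
        raise ValueError(f"unknown gender {gender!r}; expected one of {sorted(VOICES)}")
    rate_pct = round((speed - 1.0) * 100)
    return {
        "voice": VOICES[gender],
        "rate": f"{rate_pct:+d}%",
        "pitch": f"{pitch_hz:+d}Hz",
    }


async def synthesize(text: str, out_path: str, **knobs) -> None:
    """Render text to an MP3 file with edge-tts (needs the package and network access)."""
    import edge_tts  # imported lazily so voice_settings() works without the package

    settings = voice_settings(**knobs)
    communicate = edge_tts.Communicate(
        text, settings["voice"], rate=settings["rate"], pitch=settings["pitch"]
    )
    await communicate.save(out_path)
```

For example, asyncio.run(synthesize("Hello there.", "reply.mp3", gender="male", speed=1.2, pitch_hz=-5)) would write a slightly faster, lower-pitched male voice to reply.mp3.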
Pipeline Steps
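1. Voice-to-text conversion: VAD detects when the user is speaking, and Whisper transcribes the captured audio.
2. Text input into the LLM: the transcript is sent to Gemini, which generates a reply of at most two sentences.
3. Text-to-speech conversion: the reply is spoken with edge-tts using the configured pitch, gender, and speed.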
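The three steps can be wired together as plain functions, which keeps each stage easy to swap out or time when tuning latency. In this sketch the stages are passed in as callables. The two-sentence limit is requested in the prompt and then enforced on the reply, since an LLM may not always follow the instruction. The adapters at the bottom show one way to back the first two stages with the listed libraries: the Transformers automatic-speech-recognition pipeline with a Whisper checkpoint, and google.generativeai for Gemini. The function names, prompt wording, sentence-splitting rule, and the model names openai/whisper-base and gemini-1.5-flash are assumptions, not taken from the project.

```python
import re
from typing import Callable

# Sentence boundary: whitespace that follows ".", "!" or "?".
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def limit_sentences(text: str, max_sentences: int = 2) -> str:
    """Keep at most max_sentences sentences, splitting on ., ! or ? followed by whitespace."""
    sentences = [s for s in SENTENCE_END.split(text.strip()) if s]
    return " ".join(sentences[:max_sentences])


def build_prompt(query: str, max_sentences: int = 2) -> str:
    """Ask the LLM for a short answer up front; limit_sentences() enforces it afterwards."""
    return f"Answer in at most {max_sentences} sentences.\n\nQuestion: {query}"


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],
    generate: Callable[[str], str],
    speak: Callable[[str], None],
    max_sentences: int = 2,
) -> str:
    """Voice-to-text -> LLM -> text-to-speech; returns the reply that was spoken."""
    query = transcribe(audio_path)                      # step 1: voice-to-text
    raw = generate(build_prompt(query, max_sentences))  # step 2: text into the LLM
    reply = limit_sentences(raw, max_sentences)         # hard cap on reply length
    speak(reply)                                        # step 3: text-to-speech
    return reply


def whisper_transcriber(model: str = "openai/whisper-base") -> Callable[[str], str]:
    """Adapter using the Hugging Face transformers ASR pipeline (assumed model size)."""
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model)
    return lambda path: asr(path)["text"].strip()


def gemini_generator(api_key: str, model: str = "gemini-1.5-flash") -> Callable[[str], str]:
    """Adapter using google.generativeai (assumed model name)."""
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    llm = genai.GenerativeModel(model)
    return lambda prompt: llm.generate_content(prompt).text
```

The speak stage can be any callable that plays or saves audio, such as a thin wrapper around edge-tts. Passing stages in as callables also lets the pipeline be exercised with fakes, without a microphone, model download, or API key.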
Contributing
Contributions are welcome! Please feel free to submit a pull request.