VoiceAid is an Audio-Interactive Document Assistant designed to assist the visually impaired in navigating documents. Powered by Google's Gemini, VoiceAid allows users to interact with documents using voice commands and provides audio responses.
Please refrain from sharing personal information with any AI systems, as it may be used to train them. Protect your privacy by avoiding the disclosure of sensitive data.
- Supports various file formats including PDF, DOCX, PNG, JPG, and JPEG.
- Utilizes Google's Gemini for generating AI-powered responses.
- Provides live transcription of user commands.
- Allows users to upload documents and ask questions via voice or text input.
- Offers audio feedback for responses.
- Uses Chromium's built-in TTS voice: Microsoft Liam Online (Natural) - English (Canada).
- Offers dark and light themes.
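As a rough illustration of the file format support listed above, an upload could be checked against the supported extensions before it is processed. This is a minimal sketch, not VoiceAid's actual code: the names `SUPPORTED_EXTENSIONS` and `isSupportedFile` are hypothetical, and checking by extension (rather than by MIME type) is an assumption.

```typescript
// Extensions taken from the feature list above.
// The constant and function names are hypothetical, not from the VoiceAid source.
const SUPPORTED_EXTENSIONS: string[] = ["pdf", "docx", "png", "jpg", "jpeg"];

// Returns true when the file name ends in one of the supported extensions.
// The comparison is case-insensitive, so "Report.PDF" is accepted.
function isSupportedFile(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  // Reject names with no extension, a leading dot only, or a trailing dot.
  if (dot <= 0 || dot === fileName.length - 1) {
    return false;
  }
  const extension = fileName.slice(dot + 1).toLowerCase();
  return SUPPORTED_EXTENSIONS.indexOf(extension) !== -1;
}
```

A check like this could run in the file input's change handler, so that unsupported files are rejected with a spoken or written message instead of being sent to Gemini.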
Devices | Voice Input | Voice Output | Text Output | File Upload |
---|---|---|---|---|
iOS | ✅ | ❌ | ✅ | ✅ |
Android | ✅ | ✅ | | |
Windows 10/11 | ✅ | ✅ | ✅ | ✅ |
Mac | ✅ | ✅ | ✅ | ✅ |
- ⚠️: May or may not work properly (unstable, buggy)
- ❌: Does not work at all
- ✅: Fully works
To use VoiceAid, you'll need an API key from Google's AI Studio. Follow these steps to get started:
- Obtain an API key from Google's AI Studio.
- Clone this repository to your local machine.
- Open `index.html` in your preferred web browser.
- Enable microphone access.
- Enter your API key in the provided input field.
- Start interacting with VoiceAid by asking questions or uploading documents.
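To show roughly what happens with the API key once it is entered, here is a minimal sketch of how a question could be packaged for Gemini's REST `generateContent` endpoint. It rests on assumptions: the function name `buildGeminiRequest` and the default model name `gemini-1.5-flash` are illustrative choices, and VoiceAid itself may call Gemini differently (for example through a client library), so check the current Gemini API documentation before relying on it.

```typescript
// Shape of the request this sketch produces; it can be passed to fetch(url, init).
interface GeminiRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
  };
}

// Builds (but does not send) a generateContent request.
// ASSUMPTIONS: the function name and the default model are hypothetical;
// VoiceAid may use a different model or a client library instead.
function buildGeminiRequest(
  apiKey: string,
  question: string,
  model: string = "gemini-1.5-flash"
): GeminiRequest {
  const key = apiKey.trim();
  if (key.length === 0) {
    throw new Error("An API key from Google's AI Studio is required.");
  }
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // The key is sent as a header rather than in the URL.
        "x-goog-api-key": key,
      },
      body: JSON.stringify({ contents: [{ parts: [{ text: question }] }] }),
    },
  };
}
```

In a browser, the result could be used as `const { url, init } = buildGeminiRequest(key, "Summarize this document"); const response = await fetch(url, init);`. Keeping the request construction separate from the network call makes it possible to validate the key before anything is sent.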
VoiceAid supports the following voice commands:
- "Stop talking" or "Stop": Stops audio playback.
- "Delete file", "Remove file", "Remove the file", or "Delete the file": Clears the uploaded document.
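The commands above could be recognized by normalizing the live transcript and comparing it against the listed phrases. The following is a minimal sketch under stated assumptions: the names `matchCommand`, `normalize`, and `VoiceCommand` are hypothetical, and exact matching after normalization is one possible design; the actual VoiceAid implementation may match phrases differently.

```typescript
// Hypothetical result type: which command was recognized, or null for none.
type VoiceCommand = "stop" | "deleteFile" | null;

// Phrases copied from the command list above.
const STOP_PHRASES: string[] = ["stop talking", "stop"];
const DELETE_PHRASES: string[] = [
  "delete file",
  "remove file",
  "remove the file",
  "delete the file",
];

// Lowercases the transcript, strips punctuation, and collapses whitespace,
// so "Stop talking." and "  stop   talking " compare equal.
function normalize(transcript: string): string {
  return transcript
    .toLowerCase()
    .replace(/[^a-z\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Returns the recognized command, or null when the transcript is an
// ordinary question that should be sent to the assistant instead.
function matchCommand(transcript: string): VoiceCommand {
  const text = normalize(transcript);
  if (STOP_PHRASES.indexOf(text) !== -1) {
    return "stop";
  }
  if (DELETE_PHRASES.indexOf(text) !== -1) {
    return "deleteFile";
  }
  return null;
}
```

Matching the whole utterance, rather than searching for the phrase inside it, keeps a question such as "When should I stop taking this medication?" from being treated as a command.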
Contributions are welcome! If you'd like to contribute to VoiceAid, feel free to fork this repository and submit a pull request with your changes.