Medical Assistant is a desktop application for transcribing and refining spoken medical notes. It connects to AI providers (OpenAI, Perplexity, Grok, and Ollama) and combines audio-to-text conversion with context-aware note generation.
- Workflow-Based Interface: Modern task-oriented design with 4 main workflow tabs (Record, Process, Generate, Recordings) plus 6 text editor tabs
- AI-Powered Chat Interface: ChatGPT-style interface with context-aware suggestions for interacting with your medical notes
- RAG Document Search: New RAG tab enables searching your document database via N8N webhook integration with markdown rendering
- Advanced Recording System: Record medical conversations with visual feedback, timer display, and pause/resume capabilities
- Real-Time Analysis: Optional periodic analysis during recording generates differential diagnoses every 2 minutes
- Queue System: Background processing queue with "Quick Continue Mode" for efficient multi-patient recording sessions
- Dedicated Recordings Manager: New Recordings tab with search, filter, and color-coded document status indicators (generated, not generated, in progress, error)
- Context-Aware SOAP Notes: Side panel for adding previous medical information that automatically integrates into SOAP note generation
- Smart Templates: Pre-built and custom context templates for common scenarios (Follow-up, New Patient, Telehealth, etc.)
- Multi-Format Document Generation: Create SOAP notes, referral letters, and custom medical documents
- Smart Context Preservation: Context information is preserved during SOAP recordings and only cleared on new sessions or manual clearing
- Medication Analysis Agent: Comprehensive medication analysis including extraction, interaction checking, dosing validation, and prescription generation
- Clinical Workflow Coordination: Step-by-step guidance for patient intake, diagnostic workups, treatment protocols, and follow-up care
- Bidirectional Translation Assistant: Real-time medical translation with STT/TTS support for multilingual patient consultations
- Multiple STT Providers: Deepgram, ElevenLabs, Groq, or local Whisper for speech-to-text conversion
- Multiple AI Providers: OpenAI, Perplexity, Grok, or local Ollama models for text processing
- Customizable Prompts: Edit and import/export prompts and models for text refinement and note generation
- Intelligent Text Processing: Refine, improve clarity, and generate medical documentation with AI assistance
- Text-to-Speech (TTS): ElevenLabs integration with voice selection and multiple language support
- Database Storage: Automatic saving and retrieval of recordings, transcripts, and generated documents
- Export Functionality: Export recordings and documents in various formats
- File Logging System: Track application activity with a built-in logging system that maintains the last 1000 entries
- Cross-Platform Support: Available for Windows, macOS, and Linux with platform-specific optimizations
- Modern UI/UX: Built with Tkinter and ttkbootstrap featuring animations, visual indicators, and responsive design
- Python 3.10 or higher (required for Deepgram SDK compatibility)
- FFmpeg (for audio processing)
- Clone or Download the Repository
  git clone <repository-url>
- Install Dependencies
  Run the following command in the project directory:
  pip install -r requirements.txt
- Configuration
  - Create a .env file in the project root, or use the "API Keys" dialog in the application.
  - Add your API keys and configuration settings:
    - LLM Services: OPENAI_API_KEY, PERPLEXITY_API_KEY, GROK_API_KEY
    - Speech-to-Text Services: DEEPGRAM_API_KEY, ELEVENLABS_API_KEY, GROQ_API_KEY
    - Local Models: OLLAMA_API_URL (defaults to "http://localhost:11434")
    - Language Settings: RECOGNITION_LANGUAGE (defaults to "en-US")
    - RAG Integration: N8N_URL and N8N_AUTHORIZATION_SECRET for document search
  - Minimum Requirements: You need at least one LLM provider and one STT provider to use the application.
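The minimum-requirements rule can be checked before launching. The following is a minimal sketch, assuming the environment variable names listed in the configuration step; the helper names `configured` and `check_minimum_providers` are hypothetical and not part of the application. It only looks at cloud API keys, so local Ollama and local Whisper (which need no key) are outside its scope.

```python
import os

# Variable names are taken from the configuration list in this README.
LLM_KEYS = ("OPENAI_API_KEY", "PERPLEXITY_API_KEY", "GROK_API_KEY")
STT_KEYS = ("DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY", "GROQ_API_KEY")


def configured(names, env):
    """Return the names from `names` that have a non-empty value in `env`."""
    return [name for name in names if env.get(name, "").strip()]


def check_minimum_providers(env=None):
    """Hypothetical helper: report whether one LLM key and one STT key are set.

    Assumption: only cloud providers are checked; local Ollama and local
    Whisper need no API key and are not covered by this sketch.
    """
    env = os.environ if env is None else env
    llm = configured(LLM_KEYS, env)
    stt = configured(STT_KEYS, env)
    return {"llm": llm, "stt": stt, "ok": bool(llm) and bool(stt)}
```

Calling `check_minimum_providers()` with no argument reads the current process environment; passing a dictionary makes the check easy to try without touching real keys.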
- Ollama Setup (Optional)
  To use local AI models:
  - Install Ollama from ollama.ai
  - Pull models using ollama pull <model_name> (e.g., ollama pull llama3)
  - The application will automatically detect available models
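To see which models a local Ollama server reports, you can query its `/api/tags` endpoint yourself. This is a sketch of how such a lookup could work, not a description of the application's own detection code; the function names are hypothetical, and the base URL falls back to the `OLLAMA_API_URL` default given above.

```python
import json
import os
import urllib.request


def parse_model_names(payload):
    """Extract model names from the JSON body returned by Ollama's /api/tags."""
    return [m["name"] for m in payload.get("models", []) if "name" in m]


def list_ollama_models(base_url=None, timeout=5.0):
    """Return locally available model names, or [] if the server is unreachable."""
    base_url = base_url or os.environ.get("OLLAMA_API_URL", "http://localhost:11434")
    url = f"{base_url.rstrip('/')}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except (OSError, ValueError):
        # OSError covers connection failures; ValueError covers malformed JSON.
        return []
```

An empty list means either that no models have been pulled or that the server is not running, so it is worth confirming that Ollama is running before pulling models again.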
- FFmpeg Installation
  FFmpeg is required for audio processing. Download FFmpeg from ffmpeg.org and follow the instructions for Windows.
  For a step-by-step guide, watch this YouTube tutorial: How to Install FFmpeg on Windows.
The application can be packaged as a standalone executable for Windows, macOS, and Linux using PyInstaller.
- Ensure all dependencies are installed:
pip install -r requirements.txt
- For Windows: Have Python and pip in your PATH
- For macOS: May need to install Xcode command line tools
- For Linux: Ensure python3-tk is installed system-wide
Windows:
build_windows.bat
The executable will be in dist/MedicalAssistant.exe
macOS:
./build_macos.sh
The app bundle will be in dist/MedicalAssistant.app
Linux:
# First, ensure FFmpeg is installed:
sudo apt-get install ffmpeg # For Ubuntu/Debian
# or
sudo dnf install ffmpeg # For Fedora
# or
sudo pacman -S ffmpeg # For Arch
# Then build:
./build_linux.sh
The executable will be in dist/MedicalAssistant
Important for Linux: Run the application using the launcher script:
./dist/linux_launcher.sh
This ensures system FFmpeg libraries are used correctly.
- The executable includes all Python dependencies
- Users still need to have FFmpeg installed separately
- API keys can be configured via the application's settings dialog
- First run may be slower as antivirus software scans the executable
Create desktop shortcuts for easy access:
Windows:
create_desktop_shortcut.bat
Linux:
./install_desktop_entry.sh
macOS: Desktop shortcuts are automatically created during the build process.
- Launching the Application
  Execute the following command:
  python main.py
- Setting Up AI Provider
  - Select your preferred AI provider from the dropdown (OpenAI, Perplexity, Grok, or Ollama)
  - For cloud services, ensure you've entered valid API keys
  - For Ollama, click "Test Ollama Connection" in settings to verify your setup
- Main Workflow Tabs
  - Record Tab: Start/stop recordings with visual feedback, timer display, and pause/resume controls
    - Enable "Advanced Analysis" checkbox for real-time differential diagnosis every 2 minutes during recording
    - Clear button to manually clear analysis results
    - Analysis results automatically clear when starting a new recording
  - Process Tab: Refine and improve transcribed text with AI assistance
  - Generate Tab: Create SOAP notes, referrals, letters, and perform medication analysis
  - Recordings Tab: View, search, and manage all saved recordings with document status indicators
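The 2-minute analysis interval on the Record tab can be pictured as a small timer that fires once per interval of active recording time. The class below is a hypothetical sketch, not the application's implementation. In particular, it assumes that paused time does not count toward the interval, which this README does not specify.

```python
import time


class PeriodicTrigger:
    """Fires at most once per `interval` seconds of active recording time.

    Hypothetical helper: paused time is not counted toward the interval
    (an assumption; the application's actual behavior is not documented here).
    """

    def __init__(self, interval=120.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.elapsed = 0.0      # active recording time accumulated so far
        self.last_tick = None   # None while paused or not yet started
        self.next_due = interval

    def start(self):
        """Begin or resume counting active time."""
        self.last_tick = self.clock()

    def pause(self):
        """Stop counting until start() is called again."""
        self._accumulate()
        self.last_tick = None

    def _accumulate(self):
        if self.last_tick is not None:
            now = self.clock()
            self.elapsed += now - self.last_tick
            self.last_tick = now

    def due(self):
        """Return True when another full interval of active time has passed."""
        self._accumulate()
        if self.elapsed >= self.next_due:
            self.next_due += self.interval
            return True
        return False
```

A recording loop would call `due()` regularly and run an analysis pass whenever it returns `True`. The injectable `clock` argument exists so the timing logic can be exercised without waiting two real minutes.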
- Using the Chat Interface
  - Located at the bottom of the main content area
  - Press Ctrl+/ (or Cmd+/ on Mac) to quickly focus the chat input
  - Context-aware suggestions based on your current tab and content
  - Interact with any text in the editor tabs
  - Get intelligent suggestions for next steps
- Working with Context
  - Click the "Context" button to open the collapsible side panel
  - Add previous medical information that will be automatically included in SOAP notes
  - Use pre-built templates or create custom ones
  - Context is preserved during SOAP recordings but cleared on new sessions
  - Use the "Clear Context" button to manually clear information
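As an illustration of how context can feed into SOAP note generation, the sketch below prepends the context text to a transcript before it is sent to a model. The function name and the "Transcript" label are hypothetical; the "Previous medical information" label is the one this README says the application uses, but the real prompt layout may differ.

```python
def build_soap_prompt(transcript, context=""):
    """Hypothetical helper: combine optional context with a transcript.

    Assumption: context is placed before the transcript under the
    "Previous medical information" label; empty context is omitted.
    """
    parts = []
    if context.strip():
        parts.append(f"Previous medical information:\n{context.strip()}")
    parts.append(f"Transcript:\n{transcript.strip()}")
    return "\n\n".join(parts)
```

Clearing the context then amounts to passing an empty string, which yields a prompt that contains only the transcript.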
- Queue System and Quick Continue Mode
  - Enable "Quick Continue Mode" to queue recordings while starting new ones
  - Monitor queue status in the status bar
  - Perfect for busy clinics with back-to-back patients
  - Background processing ensures smooth workflow
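The idea behind a background processing queue can be sketched with Python's standard `queue` and `threading` modules: finished recordings are put on a queue and a worker thread processes them while the next recording starts. This is an illustrative sketch only; `start_worker` and the result tuples are hypothetical and do not reflect the application's internals.

```python
import queue
import threading


def start_worker(jobs, process, results):
    """Start a daemon thread that processes queued recordings one at a time.

    `jobs` is a queue.Queue, `process` is any callable taking one recording,
    and `results` is a list that receives (recording, state, detail) tuples.
    Putting None on the queue stops the worker.
    """
    def run():
        while True:
            recording = jobs.get()
            try:
                if recording is None:      # sentinel: stop the worker
                    return
                try:
                    results.append((recording, "done", process(recording)))
                except Exception as exc:   # keep the queue moving on failure
                    results.append((recording, "error", str(exc)))
            finally:
                jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

A caller creates `jobs = queue.Queue()`, starts the worker, and calls `jobs.put(...)` for each finished recording. `jobs.join()` blocks until everything queued so far has been handled, which is one way a status bar could know when the queue is empty.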
- Managing Recordings
  - Access the Recordings tab to view all saved recordings
  - Document status indicators show completion state by color:
    - Green = Document generated
    - Gray = Not generated
    - Blue = In progress
    - Red = Error
  - Search and filter recordings by date or content
  - Load recordings to continue working on them
  - Export recordings and documents
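The four status states and a content search can be modeled in a few lines. This is a hypothetical sketch: the `DocStatus` enum mirrors the colors listed above, while the record layout (dictionaries with `transcript` and `status` keys) is an assumption made for illustration and is not the application's database schema.

```python
from enum import Enum


class DocStatus(Enum):
    """Completion states from the list above, mapped to their indicator colors."""
    GENERATED = "green"
    NOT_GENERATED = "gray"
    IN_PROGRESS = "blue"
    ERROR = "red"


def search_recordings(recordings, text="", status=None):
    """Return recordings whose transcript contains `text` (case-insensitive)
    and, if `status` is given, whose status matches it."""
    needle = text.lower()
    return [
        rec for rec in recordings
        if needle in rec["transcript"].lower()
        and (status is None or rec["status"] is status)
    ]
```

With no arguments the function returns every recording, so the same call can back both an unfiltered list view and a filtered one.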
- Using Medication Analysis
  - Click the medication analysis button in the Generate tab
  - Choose your content source (transcript, SOAP note, or context information)
  - Select analysis type:
    - Extract medications from text
    - Check drug interactions
    - Validate dosing
    - Suggest alternatives
    - Generate prescriptions
    - Comprehensive analysis
  - View detailed results with warnings and recommendations
- Using Bidirectional Translation Assistant
  - Access via Tools → Translation Assistant menu
  - Select patient and doctor languages from dropdown menus
  - Features include:
    - Real-time speech-to-text for patient input
    - Automatic translation between languages
    - Text-to-speech playback for patient responses
    - Customizable canned responses for common medical phrases
    - Export conversation transcripts
  - Supports multiple languages including Chinese (Simplified/Traditional), Spanish, French, and more
- Using the RAG Document Search
  - Navigate to the RAG tab (next to Chat tab)
  - Type your query in the AI Assistant chat box at the bottom
  - The system will search your document database via N8N webhook
  - Features include:
    - Markdown-formatted responses with headers, bullets, and code blocks
    - Copy button for each response to save important information
    - Clear RAG History button to start fresh searches
    - Session persistence for continuous conversations
  - Configure N8N webhook URL and authorization in your .env file
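A webhook call of this kind is an HTTP POST carrying the query and a secret. The sketch below builds such a request from `N8N_URL` and `N8N_AUTHORIZATION_SECRET`. The payload field names (`query`, `sessionId`) and the use of an `Authorization` header are assumptions for illustration; the real contract is defined by your N8N workflow, and `build_rag_request` is a hypothetical helper.

```python
import json
import os
import urllib.request


def build_rag_request(query, session_id, url=None, secret=None):
    """Hypothetical helper: build a POST request for an N8N webhook.

    Assumptions: the workflow accepts a JSON body with "query" and
    "sessionId" fields and checks the secret in the Authorization header.
    """
    url = url or os.environ["N8N_URL"]
    secret = secret or os.environ["N8N_AUTHORIZATION_SECRET"]
    body = json.dumps({"query": query, "sessionId": session_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": secret},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it. Reusing the same `session_id` across calls is one way a workflow could keep conversation history between queries.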
- Editing Prompts and Models
  Use the "Prompt Settings" menu to modify and update prompts and models for:
  - Refine text processing
  - Improve text clarity
  - SOAP note generation
  - Referral letter creation
  - Advanced Analysis (differential diagnosis during recording)
  Each provider can have different model selections and temperature settings.
- Viewing Application Logs
  - Access application logs through the "View Logs" option in the Help menu
  - Choose between opening the logs directory or viewing logs directly in the application
  - Logs automatically rotate to keep only the last 1000 entries, preventing excessive disk usage
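Keeping only the last 1000 entries is a bounded-buffer pattern. The sketch below shows one way to do it with `collections.deque`; it is an illustration under the assumption of one entry per line, not the application's logging code, and the function names are hypothetical.

```python
from collections import deque
from pathlib import Path


def keep_last(entries, max_entries=1000):
    """Return only the newest `max_entries` items, oldest first."""
    return list(deque(entries, maxlen=max_entries))


def append_entry(log_path, entry, max_entries=1000):
    """Append one entry, then rewrite the file with only the newest entries.

    Assumption: the log stores one entry per line.
    """
    path = Path(log_path)
    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    kept = keep_last(existing + [entry], max_entries)
    path.write_text("\n".join(kept) + "\n", encoding="utf-8")
```

Rewriting the file on every append is simple but slow for busy logs. It is shown here only to make the "last 1000 entries" rule concrete.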
- API Keys: If you need to update API keys after startup, use the "API Keys" option in the settings menu.
- Context Panel Issues:
  - Context panel is accessed via the "Context" button, not a tab
  - Context text is automatically preserved during SOAP recordings
  - Use "New Session" or the "Clear Context" button to clear previous medical information
  - Context is included as "Previous medical information" in SOAP note generation
- Chat Interface Issues:
  - If chat suggestions don't appear, ensure you have content in the active tab
  - Use keyboard shortcut Ctrl+/ (Cmd+/ on Mac) to quickly access chat
  - Chat context is based on the currently active tab
- Queue System Issues:
  - Monitor the status bar for queue progress
  - If recordings are stuck in queue, check the logs for errors
  - Disable "Quick Continue Mode" if you prefer sequential processing
- Ollama Connection Issues: If you experience timeouts with Ollama models, try:
  - Using a smaller model variant (e.g., a 3B-parameter model instead of a 7B one)
  - Ensuring your computer has adequate resources (CPU/RAM)
  - Testing your connection with the "Test Ollama Connection" button
- Audio/Recording Issues:
  - Ensure FFmpeg is properly installed and accessible
  - Check microphone permissions in your operating system
  - Verify your selected audio device in the application settings
- Performance Issues:
  - Close unused tabs and applications to free up system resources
  - For large context text, consider breaking it into smaller sections
  - Use local Ollama models if experiencing cloud API rate limits
- Application Logs: Check application logs through Help → View Logs for detailed error information
- Database Issues: Use the migration tools if you encounter database errors after updates
- Settings Reset: Delete the application's settings files to reset to defaults if needed
- Bidirectional Translation Assistant: Real-time medical translation system for multilingual consultations
  - Support for 100+ languages with automatic detection
  - Speech-to-text input for patient responses
  - Text-to-speech output for doctor communications
  - Customizable canned responses for common medical phrases
  - Fixed Chinese language parsing for Simplified/Traditional variants
- Enhanced TTS Integration:
  - ElevenLabs voice selection with dropdown interface
  - Support for ElevenLabs Turbo v2.5 model for lower latency
  - Configurable speech rate and voice settings
- Batch Processing: Process multiple recordings or audio files efficiently
  - Dual source support (database recordings or computer files)
  - Real-time progress tracking with ETA
  - Continue on error capability
- Clinical Workflow Agent: Step-by-step guidance for medical processes
- Periodic Analysis: Real-time differential diagnosis during recordings (every 2 minutes)
- Medication Analysis Agent: New AI-powered medication agent with comprehensive analysis capabilities
  - Extract medications from clinical text
  - Check drug-drug interactions with severity levels
  - Validate dosing appropriateness
  - Suggest medication alternatives
  - Generate prescriptions
  - Comprehensive medication analysis with safety warnings
- Enhanced Generate Tab: Added medication analysis button alongside existing document generation
- Context Support: Medication analysis can now use context information as input source
- Agent Framework: Extensible agent system for specialized medical AI tasks
- New Recordings Tab: Dedicated tab for managing all recordings with visual status indicators
- AI Chat Interface: ChatGPT-style interface for intelligent interaction with medical notes
- Workflow-Based UI: Completely redesigned interface organized by tasks (Record, Process, Generate)
- Queue System: Background processing with "Quick Continue Mode" for efficient multi-patient workflows
- Context Panel Redesign: Context moved from tab to collapsible side panel with template support
- Visual Enhancements: Recording animations, timer display, and improved status indicators
- Document Status Tracking: Color-coded visual indicators (generated, not generated, in progress, error) show completion state of each document type
- Context Feature: Added previous medical information support for SOAP note generation
- Smart Context Preservation: Context preserved during SOAP recordings
- Code Optimization: Removed duplicate code and improved performance
- Modern UI/UX: Task-oriented workflow with visual feedback and animations
- Enhanced Recording: Pause/resume capabilities with timer display
- Smart Templates: Pre-built and custom context templates for common scenarios
- Export Functionality: Export recordings and documents in various formats
- Multi-Provider STT Support: Deepgram, ElevenLabs, Groq, and Whisper integration
- Performance Optimizations: Reduced startup time and improved memory usage
- Operating System: Windows 10+, macOS 10.14+, or Linux (Ubuntu 18.04+)
- Python: 3.10+ (for running from source; required for Deepgram SDK compatibility)
- Memory: 4GB RAM minimum, 8GB recommended
- Storage: 500MB free space for application and dependencies
- Internet: Required for cloud AI services (optional for local Ollama models)
- Audio: Microphone for speech-to-text functionality
- User Guide - Comprehensive user documentation
- Keyboard Shortcuts - Quick reference for keyboard shortcuts
- Security Features - Security implementation details
- Database Schema - Database structure and improvements
- Testing Guide - Comprehensive testing documentation (80%+ coverage)
- Testing Quick Start - Quick reference for running tests
- UI Testing Setup - Guide for UI testing with PyQt5
- CLAUDE.md - Development guide for AI-assisted development
The project includes a comprehensive test suite with:
- 352 total tests (327 unit tests + 25 UI tests)
- 80.68% code coverage on core modules
- Unit tests for all major components
- Integration tests for the recording pipeline
- UI tests demonstrating PyQt5 testing patterns
- Pre-commit hooks for code quality
- CI/CD pipeline for automated testing
To run tests:
# Install test dependencies
pip install -r requirements-dev.txt
# Run all tests
python -m pytest
# Run with coverage
python run_tests.py --cov
# Run UI tests
python tests/run_ui_tests.py
Contributions to Medical Assistant are welcome.
- Fork the repository.
- Create a feature branch.
- Submit a Pull Request with your enhancements.
- Ensure all tests pass and maintain 80%+ coverage.
Distributed under the MIT License.