Medical Assistant

Medical Assistant is a desktop application for transcribing and refining spoken medical notes. It uses AI providers (OpenAI, Perplexity, Grok, and local Ollama models) for text processing, and provides audio-to-text conversion and context-aware note generation.

Features

Core Features

Workflow-Based Interface: Modern task-oriented design with 4 main workflow tabs (Record, Process, Generate, Recordings) plus 6 text editor tabs
AI-Powered Chat Interface: ChatGPT-style interface with context-aware suggestions for interacting with your medical notes
RAG Document Search: New RAG tab enables searching your document database via N8N webhook integration with markdown rendering
Advanced Recording System: Record medical conversations with visual feedback, timer display, and pause/resume capabilities
Real-Time Analysis: Optional periodic analysis during recording generates differential diagnoses every 2 minutes
Queue System: Background processing queue with "Quick Continue Mode" for efficient multi-patient recording sessions
Dedicated Recordings Manager: New Recordings tab with search, filter, and document status indicators (✓, —, 🔄, ❌)

Medical Documentation

Context-Aware SOAP Notes: Side panel for adding previous medical information that automatically integrates into SOAP note generation
Smart Templates: Pre-built and custom context templates for common scenarios (Follow-up, New Patient, Telehealth, etc.)
Multi-Format Document Generation: Create SOAP notes, referral letters, and custom medical documents
Smart Context Preservation: Context information is preserved during SOAP recordings and only cleared on new sessions or manual clearing
Medication Analysis Agent: Comprehensive medication analysis including extraction, interaction checking, dosing validation, and prescription generation
Clinical Workflow Coordination: Step-by-step guidance for patient intake, diagnostic workups, treatment protocols, and follow-up care
Bidirectional Translation Assistant: Real-time medical translation with STT/TTS support for multilingual patient consultations

AI & Transcription

Multiple STT Providers: Deepgram, ElevenLabs, Groq, or local Whisper for speech-to-text conversion
Multiple AI Providers: OpenAI, Perplexity, Grok, or local Ollama models for text processing
Customizable Prompts: Edit and import/export prompts and models for text refinement and note generation
Intelligent Text Processing: Refine, improve clarity, and generate medical documentation with AI assistance
Text-to-Speech (TTS): ElevenLabs integration with voice selection and multiple language support

Technical Features

Database Storage: Automatic saving and retrieval of recordings, transcripts, and generated documents
Export Functionality: Export recordings and documents in various formats
File Logging System: Track application activity with a built-in logging system that maintains the last 1000 entries
Cross-Platform Support: Available for Windows, macOS, and Linux with platform-specific optimizations
Modern UI/UX: Built with Tkinter and ttkbootstrap featuring animations, visual indicators, and responsive design

Installation

Prerequisites

Python 3.10 or higher (required for Deepgram SDK compatibility)
FFmpeg (for audio processing)

Clone or Download the Repository
```
git clone <repository-url>
```
Install Dependencies
Run the following command in the project directory:
```
pip install -r requirements.txt
```
Configuration
- Create a .env file in the project root, or use the "API Keys" dialog in the application.
- Add your API keys and configuration settings:
  - LLM Services: OPENAI_API_KEY, PERPLEXITY_API_KEY, GROK_API_KEY
  - Speech-to-Text Services: DEEPGRAM_API_KEY, ELEVENLABS_API_KEY, GROQ_API_KEY
  - Local Models: OLLAMA_API_URL (defaults to "http://localhost:11434")
  - Language Settings: RECOGNITION_LANGUAGE (defaults to "en-US")
  - RAG Integration: N8N_URL and N8N_AUTHORIZATION_SECRET for document search
- Minimum Requirements: You need at least one LLM provider and one STT provider to use the application.
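
The minimum-requirements rule can be checked before launching. This is a minimal sketch, not the application's own validation code: the helper name `configured_providers` is hypothetical, and it reads a mapping such as the process environment, so values stored only in .env must be loaded first. Local Ollama and local Whisper need no API key, so an empty result does not by itself mean the application cannot run.

```python
import os
from typing import Mapping

# Environment variable names are taken from the configuration list above.
LLM_KEYS = ("OPENAI_API_KEY", "PERPLEXITY_API_KEY", "GROK_API_KEY")
STT_KEYS = ("DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY", "GROQ_API_KEY")


def configured_providers(env: Mapping[str, str]) -> dict:
    """Return the cloud LLM and STT key names that hold a non-empty value.

    Hypothetical helper for illustration; whitespace-only values count as unset.
    """
    def is_set(name: str) -> bool:
        return bool(env.get(name, "").strip())

    return {
        "llm": [k for k in LLM_KEYS if is_set(k)],
        "stt": [k for k in STT_KEYS if is_set(k)],
    }


if __name__ == "__main__":
    found = configured_providers(os.environ)
    print("Cloud LLM keys set:", found["llm"] or "none")
    print("Cloud STT keys set:", found["stt"] or "none")
```
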
Ollama Setup (Optional)
To use local AI models:
- Install Ollama from ollama.ai
- Pull models using ollama pull <model_name> (e.g., ollama pull llama3)
- The application will automatically detect available models
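
To see which models a running Ollama server exposes, you can query its `/api/tags` endpoint yourself. The sketch below is one way to do that with the standard library; how the application performs its own detection is not described here, and the function names are hypothetical.

```python
import json
import os
import urllib.request


def parse_model_names(payload: dict) -> list:
    """Extract model names from the JSON body returned by Ollama's /api/tags."""
    return [m["name"] for m in payload.get("models", [])]


def list_ollama_models(base_url: str = "", timeout: float = 5.0) -> list:
    """Ask a running Ollama server for its locally pulled models.

    Falls back to OLLAMA_API_URL, then to the default http://localhost:11434.
    """
    base_url = base_url or os.environ.get("OLLAMA_API_URL", "http://localhost:11434")
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_model_names(json.load(resp))
```

Calling `list_ollama_models()` raises `urllib.error.URLError` if the server is not reachable, which is a quick way to tell a connection problem from a missing model.
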
FFmpeg Installation
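FFmpeg is required for audio processing. Download it from ffmpeg.org and follow the installation instructions for your operating system; on Linux, the package-manager commands listed under Building Standalone Executables also work.
Windows users who want a step-by-step guide can watch the YouTube tutorial "How to Install FFmpeg on Windows".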
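
After installing, it is worth confirming that FFmpeg is actually reachable on your PATH. The following is a small sketch using only the standard library; the function names are illustrative and are not part of the application.

```python
import shutil
import subprocess


def ffmpeg_path():
    """Return the full path of the ffmpeg executable on PATH, or None."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line():
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is missing."""
    path = ffmpeg_path()
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    return result.stdout.splitlines()[0] if result.stdout else None


if __name__ == "__main__":
    print(ffmpeg_version_line() or "FFmpeg not found on PATH")
```
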

Building Standalone Executables

The application can be packaged as a standalone executable for Windows, macOS, and Linux using PyInstaller.

Prerequisites

Ensure all dependencies are installed: pip install -r requirements.txt
For Windows: Have Python and pip in your PATH
For macOS: May need to install Xcode command line tools
For Linux: Ensure python3-tk is installed system-wide

Building

Windows:

build_windows.bat

The executable will be in dist/MedicalAssistant.exe

macOS:

./build_macos.sh

The app bundle will be in dist/MedicalAssistant.app

Linux:

# First, ensure FFmpeg is installed:
sudo apt-get install ffmpeg  # For Ubuntu/Debian
# or
sudo dnf install ffmpeg      # For Fedora
# or
sudo pacman -S ffmpeg        # For Arch

# Then build:
./build_linux.sh

The executable will be in dist/MedicalAssistant

Important for Linux: Run the application using the launcher script:

./dist/linux_launcher.sh

This ensures system FFmpeg libraries are used correctly.

Distribution Notes

The executable includes all Python dependencies
Users still need to have FFmpeg installed separately
API keys can be configured via the application's settings dialog
First run may be slower as antivirus software scans the executable

Desktop Shortcuts (Optional)

Create desktop shortcuts for easy access:

Windows:

create_desktop_shortcut.bat

Linux:

./install_desktop_entry.sh

macOS: Desktop shortcuts are automatically created during the build process.

Usage

Launching the Application
Execute the following command:
```
python main.py
```
Setting Up AI Provider
- Select your preferred AI provider from the dropdown (OpenAI, Perplexity, Grok, or Ollama)
- For cloud services, ensure you've entered valid API keys
- For Ollama, click "Test Ollama Connection" in settings to verify your setup
Main Workflow Tabs
- Record Tab: Start/stop recordings with visual feedback, timer display, and pause/resume controls
  - Enable "Advanced Analysis" checkbox for real-time differential diagnosis every 2 minutes during recording
  - Clear button to manually clear analysis results
  - Analysis results automatically clear when starting a new recording
- Process Tab: Refine and improve transcribed text with AI assistance
- Generate Tab: Create SOAP notes, referrals, letters, and perform medication analysis
- Recordings Tab: View, search, and manage all saved recordings with document status indicators
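
The 2-minute analysis cadence described for the Record tab can be pictured as a small scheduler driven by elapsed recording time. This is a sketch under assumptions, not the application's implementation: the class name is hypothetical, and it assumes the caller supplies elapsed seconds (for example, excluding paused time).

```python
ANALYSIS_INTERVAL_S = 120  # "every 2 minutes", per the feature description


class PeriodicAnalysisScheduler:
    """Decide when a periodic analysis is due, based on elapsed recording time."""

    def __init__(self, interval_s: float = ANALYSIS_INTERVAL_S):
        self.interval_s = interval_s
        self._next_due = interval_s

    def is_due(self, elapsed_s: float) -> bool:
        """Return True once each time elapsed_s reaches the next interval mark."""
        if elapsed_s < self._next_due:
            return False
        # Skip marks that were already passed so a long gap triggers one run only.
        while self._next_due <= elapsed_s:
            self._next_due += self.interval_s
        return True

    def reset(self) -> None:
        """Call when a new recording starts."""
        self._next_due = self.interval_s
```

In a Tkinter application, a check like `is_due()` could be polled from a callback scheduled with `after()`, so the analysis request never blocks the recording UI.
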
Using the Chat Interface
- Located at the bottom of the main content area
- Press Ctrl+/ (or Cmd+/ on Mac) to quickly focus the chat input
- Context-aware suggestions based on your current tab and content
- Interact with any text in the editor tabs
- Get intelligent suggestions for next steps
Working with Context
- Click the "Context" button to open the collapsible side panel
- Add previous medical information that will be automatically included in SOAP notes
- Use pre-built templates or create custom ones
- Context is preserved during SOAP recordings but cleared on new sessions
- Use the "Clear Context" button to manually clear information
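
Conceptually, the context text is placed ahead of the transcript when a SOAP note is requested. The sketch below illustrates that idea only: the function name and the "Transcript:" label are assumptions, while the "Previous medical information" label is the one this README mentions.

```python
def build_soap_input(transcript: str, context: str = "") -> str:
    """Combine optional prior context with a transcript for SOAP generation."""
    parts = []
    if context.strip():
        # Label taken from the README; the surrounding layout is an assumption.
        parts.append("Previous medical information:\n" + context.strip())
    parts.append("Transcript:\n" + transcript.strip())
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_soap_input("Patient reports a dry cough.", "Hypertension, on lisinopril."))
```

When the context panel is empty, the helper returns the transcript section alone, which mirrors the behavior after "Clear Context".
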
Queue System and Quick Continue Mode
- Enable "Quick Continue Mode" to queue recordings while starting new ones
- Monitor queue status in the status bar
- Perfect for busy clinics with back-to-back patients
- Background processing ensures smooth workflow
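
The idea behind Quick Continue Mode, handing a finished recording to a background worker so the next one can start immediately, can be sketched with Python's `queue` and `threading` modules. Everything here is illustrative: `process_recording` is a hypothetical placeholder for transcription and document generation, and the application's real queue may be structured differently.

```python
import queue
import threading


def process_recording(name: str) -> str:
    """Hypothetical placeholder for transcription and document generation."""
    return f"processed:{name}"


class ProcessingQueue:
    """Process queued recordings in order on a single background thread."""

    def __init__(self):
        self._jobs = queue.Queue()
        self.results = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, name: str) -> None:
        """Queue a finished recording; returns immediately."""
        self._jobs.put(name)

    def _run(self) -> None:
        while True:
            name = self._jobs.get()
            try:
                self.results.append(process_recording(name))
            except Exception as exc:  # keep the worker alive if one job fails
                self.results.append(f"error:{name}:{exc}")
            finally:
                self._jobs.task_done()

    def wait(self) -> None:
        """Block until every queued job has been processed."""
        self._jobs.join()
```

A single worker keeps results in submission order, which is the simplest choice for a sketch; a real implementation would also need to report progress back to the status bar.
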
Managing Recordings
- Access the Recordings tab to view all saved recordings
- Document status indicators show completion state:
  - ✓ (green) = Document generated
  - — (gray) = Not generated
  - 🔄 (blue) = In progress
  - ❌ (red) = Error
- Search and filter recordings by date or content
- Load recordings to continue working on them
- Export recordings and documents
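
The four document states map naturally onto an enumeration. The sketch below is an illustration of that mapping, using the indicator symbols listed in this section; the class and function names are hypothetical.

```python
from enum import Enum


class DocumentStatus(Enum):
    """Document states and the indicator shown for each."""
    GENERATED = "✓"          # green
    NOT_GENERATED = "\u2014"  # gray dash
    IN_PROGRESS = "🔄"        # blue
    ERROR = "❌"              # red


def status_line(recording_name: str, statuses: dict) -> str:
    """Render one recording row with an indicator per document type."""
    cells = [f"{doc}: {status.value}" for doc, status in statuses.items()]
    return f"{recording_name}  " + "  ".join(cells)


if __name__ == "__main__":
    print(status_line("visit_001", {
        "SOAP": DocumentStatus.GENERATED,
        "Referral": DocumentStatus.NOT_GENERATED,
        "Letter": DocumentStatus.IN_PROGRESS,
    }))
```
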
Using Medication Analysis
- Click the medication analysis button in the Generate tab
- Choose your content source (transcript, SOAP note, or context information)
- Select analysis type:
  - Extract medications from text
  - Check drug interactions
  - Validate dosing
  - Suggest alternatives
  - Generate prescriptions
  - Comprehensive analysis
- View detailed results with warnings and recommendations
Using Bidirectional Translation Assistant
- Access via Tools → Translation Assistant menu
- Select patient and doctor languages from dropdown menus
- Features include:
  - Real-time speech-to-text for patient input
  - Automatic translation between languages
  - Text-to-speech playback for patient responses
  - Customizable canned responses for common medical phrases
  - Export conversation transcripts
- Supports multiple languages including Chinese (Simplified/Traditional), Spanish, French, and more
Using the RAG Document Search
- Navigate to the RAG tab (next to Chat tab)
- Type your query in the AI Assistant chat box at the bottom
- The system will search your document database via N8N webhook
- Features include:
  - Markdown-formatted responses with headers, bullets, and code blocks
  - Copy button for each response to save important information
  - Clear RAG History button to start fresh searches
  - Session persistence for continuous conversations
- Configure N8N webhook URL and authorization in your .env file
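
For debugging a webhook outside the application, a request can be assembled by hand. This sketch makes several assumptions that you should match to your own N8N workflow: the JSON field names `query` and `sessionId`, and sending the secret in an `Authorization` header, are guesses for illustration and are not documented here. Only the two environment variable names come from this README.

```python
import json
import os
import urllib.request


def build_rag_request(query: str, session_id: str,
                      url: str = "", secret: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST request to an N8N webhook.

    Assumed payload shape and header name; adjust to your workflow.
    """
    url = url or os.environ["N8N_URL"]
    secret = secret or os.environ["N8N_AUTHORIZATION_SECRET"]
    body = json.dumps({"query": query, "sessionId": session_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": secret},
    )
```

The request can then be sent with `urllib.request.urlopen(req, timeout=30)`; building it separately makes the payload easy to inspect before anything leaves your machine.
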
Editing Prompts and Models
Use the "Prompt Settings" menu to modify and update prompts and models for:

Refine text processing
Improve text clarity
SOAP note generation
Referral letter creation
Advanced Analysis (differential diagnosis during recording)

Each provider can have different model selections and temperature settings.

Viewing Application Logs
- Access application logs through the "View Logs" option in the Help menu
- Choose between opening the logs directory or viewing logs directly in the application
- Logs automatically rotate to keep only the last 1000 entries, preventing excessive disk usage
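
Keeping only the most recent 1000 entries can be illustrated with a bounded `deque`. This is a sketch of the general technique, assuming one log entry per line; the application's actual rotation mechanism is not described here, and `trim_log` is a hypothetical name.

```python
from collections import deque
from pathlib import Path

MAX_ENTRIES = 1000  # limit stated in this README


def trim_log(path: Path, max_entries: int = MAX_ENTRIES) -> int:
    """Rewrite a log file so it holds only its last max_entries lines.

    Returns the number of lines kept.
    """
    with path.open("r", encoding="utf-8") as fh:
        tail = deque(fh, maxlen=max_entries)  # older lines fall off the left
    path.write_text("".join(tail), encoding="utf-8")
    return len(tail)
```
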

Troubleshooting

Common Issues

API Keys: If you need to update API keys after startup, use the "API Keys" option in the settings menu.
Context Panel Issues:
- Context panel is accessed via the "Context" button, not a tab
- Context text is automatically preserved during SOAP recordings
- Use "New Session" or the "Clear Context" button to clear previous medical information
- Context is included as "Previous medical information" in SOAP note generation
Chat Interface Issues:
- If chat suggestions don't appear, ensure you have content in the active tab
- Use keyboard shortcut Ctrl+/ (Cmd+/ on Mac) to quickly access chat
- Chat context is based on the currently active tab
Queue System Issues:
- Monitor the status bar for queue progress
- If recordings are stuck in queue, check the logs for errors
- Disable "Quick Continue Mode" if you prefer sequential processing
Ollama Connection Issues: If you experience timeouts with Ollama models, try:
- Using a smaller model variant (e.g., a 3B-parameter model instead of a 7B one, or a more heavily quantized tag of the same model)
- Ensuring your computer has adequate resources (CPU/RAM)
- Testing your connection with the "Test Ollama Connection" button
Audio/Recording Issues:
- Ensure FFmpeg is properly installed and accessible
- Check microphone permissions in your operating system
- Verify your selected audio device in the application settings
Performance Issues:
- Close unused tabs and applications to free up system resources
- For large context text, consider breaking it into smaller sections
- Use local Ollama models if experiencing cloud API rate limits

Getting Help

Application Logs: Check application logs through Help → View Logs for detailed error information
Database Issues: Use the migration tools if you encounter database errors after updates
Settings Reset: Delete the application's settings files to reset to defaults if needed

Recent Updates

Version 2.2.0 (Latest)

Bidirectional Translation Assistant: Real-time medical translation system for multilingual consultations
- Support for 100+ languages with automatic detection
- Speech-to-text input for patient responses
- Text-to-speech output for doctor communications
- Customizable canned responses for common medical phrases
- Fixed Chinese language parsing for Simplified/Traditional variants
Enhanced TTS Integration:
- ElevenLabs voice selection with dropdown interface
- Support for ElevenLabs Turbo v2.5 model for lower latency
- Configurable speech rate and voice settings
Batch Processing: Process multiple recordings or audio files efficiently
- Dual source support (database recordings or computer files)
- Real-time progress tracking with ETA
- Continue on error capability
Clinical Workflow Agent: Step-by-step guidance for medical processes
Periodic Analysis: Real-time differential diagnosis during recordings (every 2 minutes)

Version 2.1.0

Medication Analysis Agent: New AI-powered medication agent with comprehensive analysis capabilities
- Extract medications from clinical text
- Check drug-drug interactions with severity levels
- Validate dosing appropriateness
- Suggest medication alternatives
- Generate prescriptions
- Comprehensive medication analysis with safety warnings
Enhanced Generate Tab: Added medication analysis button alongside existing document generation
Context Support: Medication analysis can now use context information as input source
Agent Framework: Extensible agent system for specialized medical AI tasks

Version 2.0.0

New Recordings Tab: Dedicated tab for managing all recordings with visual status indicators
AI Chat Interface: ChatGPT-style interface for intelligent interaction with medical notes
Workflow-Based UI: Completely redesigned interface organized by tasks (Record, Process, Generate)
Queue System: Background processing with "Quick Continue Mode" for efficient multi-patient workflows
Context Panel Redesign: Context moved from tab to collapsible side panel with template support
Visual Enhancements: Recording animations, timer display, and improved status indicators
Document Status Tracking: Visual indicators (✓, —, 🔄, ❌) show completion state of each document type

Version 1.0.27

Context Feature: Added previous medical information support for SOAP note generation
Smart Context Preservation: Context preserved during SOAP recordings
Code Optimization: Removed duplicate code and improved performance

Key Improvements

Modern UI/UX: Task-oriented workflow with visual feedback and animations
Enhanced Recording: Pause/resume capabilities with timer display
Smart Templates: Pre-built and custom context templates for common scenarios
Export Functionality: Export recordings and documents in various formats
Multi-Provider STT Support: Deepgram, ElevenLabs, Groq, and Whisper integration
Performance Optimizations: Reduced startup time and improved memory usage

System Requirements

Operating System: Windows 10+, macOS 10.14+, or Linux (Ubuntu 18.04+)
Python: 3.10+ (for running from source)
Memory: 4GB RAM minimum, 8GB recommended
Storage: 500MB free space for application and dependencies
Internet: Required for cloud AI services (optional for local Ollama models)
Audio: Microphone for speech-to-text functionality

Documentation

User Documentation

User Guide - Comprehensive user documentation
Keyboard Shortcuts - Quick reference for keyboard shortcuts
Security Features - Security implementation details
Database Schema - Database structure and improvements

Development Documentation

Testing Guide - Comprehensive testing documentation (80%+ coverage)
Testing Quick Start - Quick reference for running tests
UI Testing Setup - Guide for UI testing with PyQt5
CLAUDE.md - Development guide for AI-assisted development

Testing Infrastructure

The project includes a comprehensive test suite with:

352 total tests (327 unit tests + 25 UI tests)
80.68% code coverage on core modules
Unit tests for all major components
Integration tests for the recording pipeline
UI tests demonstrating PyQt5 testing patterns
Pre-commit hooks for code quality
CI/CD pipeline for automated testing

To run tests:

# Install test dependencies
pip install -r requirements-dev.txt

# Run all tests
python -m pytest

# Run with coverage
python run_tests.py --cov

# Run UI tests
python tests/run_ui_tests.py

Contribution

Contributions to Medical Assistant are welcome.

Fork the repository.
Create a feature branch.
Ensure all tests pass and coverage stays at 80% or higher.
Submit a Pull Request with your enhancements.

License

Distributed under the MIT License.

