RaghavSethi006/JARVIS-V3

     ██╗ █████╗ ██████╗ ██╗   ██╗██╗███████╗
     ██║██╔══██╗██╔══██╗██║   ██║██║██╔════╝
     ██║███████║██████╔╝██║   ██║██║███████╗
██   ██║██╔══██║██╔══██╗╚██╗ ██╔╝██║╚════██║
╚█████╔╝██║  ██║██║  ██║ ╚████╔╝ ██║███████║
 ╚════╝ ╚═╝  ╚═╝╚═╝  ╚═╝  ╚═══╝  ╚═╝╚══════╝

Just A Rather Very Intelligent System

A local-first, multi-agent AI desktop assistant with persistent memory,
runtime self-synthesis, neural voice synthesis, biometric authentication, and a British butler attitude.




What Is This

JARVIS is a personal AI assistant built to run on your own machine, sound like the one from the films, and genuinely remember who you are. It is not a chatbot wrapper. It is a complete runtime: a multi-agent orchestration system sitting on top of an async event bus, with four tiers of persistent memory, runtime self-synthesis for missing capabilities, neural TTS, biometric authentication, gesture control, and a React UI served through pywebview.

When you say something, the Orchestrator decomposes your intent into a structured task plan and dispatches work to six specialist agents — concurrently, with dependency resolution. Complex requests are pre-processed through a reasoning layer. A background proactive agent monitors your calendar and open knowledge threads, surfacing things without being asked. And a four-tier memory system means JARVIS knows your preferences, recalls past conversations semantically, and tracks the people and projects you mention across every session.

you:    "open spotify, set an alarm for 7am, and what's the weather in London"

jarvis: [Orchestrator plans 3 parallel tasks]
        [MediaAgent → spotify_play]
        [PersonalAgent → set_alarm → persisted to SQLite]
        [InfoAgent → check_weather → OpenWeather API]

        "Spotify is open. Alarm set for 7 AM — I'll make sure you're conscious.
         London: overcast, 11°C. You'll want a jacket.
         Though I suspect you'll ignore that."

Architecture

Every component communicates through a single central async event bus. Nothing calls anything directly. The UI bridge, agents, skills, services, and memory system are fully decoupled β€” they publish and subscribe to named events.
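The pattern can be sketched in a few lines of asyncio. The class shape and event names below are illustrative, not the actual core/event_bus.py API:

```python
# Minimal publish/subscribe sketch of the pattern described above.
# Class shape and event names are illustrative, not the real core/event_bus.py API.
import asyncio
from collections import defaultdict

class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_name, handler):
        """Register an async handler for a named event."""
        self._subscribers[event_name].append(handler)

    async def emit(self, event_name, payload=None):
        """Invoke every subscriber of this event concurrently."""
        handlers = self._subscribers.get(event_name, [])
        await asyncio.gather(*(h(payload) for h in handlers))

async def demo():
    bus = EventBus()
    spoken = []

    async def tts_handler(text):
        spoken.append(text)   # a real handler would synthesise audio

    bus.subscribe("tts_speak", tts_handler)
    await bus.emit("tts_speak", "Alarm set for 7 AM.")
    return spoken

print(asyncio.run(demo()))  # → ['Alarm set for 7 AM.']
```

Because publishers never hold a reference to their subscribers, any component can be swapped out without touching the rest of the system.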

┌──────────────────────────────────────────────────────────────┐
│                        React Frontend                        │
│            Dashboard Mode  ◆  Pill Mode (overlay)            │
└─────────────────────────┬────────────────────────────────────┘
                          │  PyWebView JS Bridge
┌─────────────────────────▼────────────────────────────────────┐
│                       Async Event Bus                        │
│                   (central nervous system)                   │
└──┬──────────┬──────────────────────┬─────────────────────────┘
   │          │                      │
   │   ┌──────▼──────────┐           │
   │   │  Orchestrator   │           │
   │   │  ─────────────  │           │
   │   │  LLM planning   │           │
   │   │  Task graph     │           │
   │   │  Synthesis      │           │
   │   └──────┬──────────┘           │
   │          │  TaskQueue           │
   │   ┌──────▼──────────────────────▼─────────────────────┐
   │   │                 Specialist Agents                 │
   │   │  InfoAgent · SystemAgent · MediaAgent             │
   │   │  CommsAgent · BrowserAgent · PersonalAgent        │
   │   └─────────────────────┬─────────────────────────────┘
   │                         │  bus.emit(skill events)
   │   ┌─────────────────────▼─────────────────────────────┐
   │   │                    Skill Layer                    │
   │   │  Weather · News · Email · WhatsApp · Calendar     │
   │   │  Spotify · Browser · System · Volume · Alarm...   │
   │   └───────────────────────────────────────────────────┘
   │
┌──▼───────────────────────────────────────────────────────────┐
│                           Services                           │
│  TTS (Kokoro) · STT · Wake Word · Biometrics · Gesture       │
└──────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│                        Memory System                         │
│  Working · Episodic (ChromaDB) · Semantic · Entity Graph     │
│  + ProactiveAgent (background, 5-min cycle)                  │
└──────────────────────────────────────────────────────────────┘

Agent Layer

Phase 4 introduced a complete multi-agent system that replaces the flat process_user_input → LLMSkill path for all non-trivial requests.

Orchestrator (agents/orchestrator.py)

The entry point for all user input. Uses the LLM to produce a structured JSON task plan — a dependency graph of work units, each assigned to a specialist agent. Independent tasks run in parallel via the TaskQueue. When multiple results need combining, a synthesis pass merges them into one natural response enriched with memory context.
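For the compound command in the intro, the plan has roughly this shape. The field names ("agent", "depends_on", and so on) are an assumption for illustration, not the exact schema:

```python
# Hypothetical task-plan structure for the compound command in the intro.
# Field names ("agent", "depends_on", ...) are assumed, not the exact schema.
plan = {
    "tasks": [
        {"id": "t1", "agent": "MediaAgent", "action": "spotify_play",
         "depends_on": []},
        {"id": "t2", "agent": "PersonalAgent", "action": "set_alarm",
         "params": {"time": "07:00"}, "depends_on": []},
        {"id": "t3", "agent": "InfoAgent", "action": "check_weather",
         "params": {"city": "London"}, "depends_on": []},
    ],
    "synthesise": True,   # merge all three results into one spoken reply
}

# Tasks with no unmet dependencies are eligible for parallel dispatch.
runnable = [t["id"] for t in plan["tasks"] if not t["depends_on"]]
print(runnable)  # → ['t1', 't2', 't3']
```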

Specialist Agents

Agent Handles
InfoAgent Weather, news headlines, time, general knowledge
SystemAgent App launch/close, volume, brightness, screenshots, file search, jokes
MediaAgent Spotify play/pause/skip/search, YouTube, video downloads
CommsAgent Email send/read, WhatsApp, Google Calendar, Google Meet
BrowserAgent URL navigation, web search, tab management
PersonalAgent Alarms, reminders, biometric login/register, gesture control

Each agent extends BaseAgent (interfaces/agent.py), has its own system prompt, optional LLM access, and returns a typed {status, result, speech} dict. They emit events onto the bus — never call skills directly.

ReasoningLayer (core/reasoning.py)

For complex inputs — multi-step requests, ambiguous queries, anything over ~12 words or containing compound signals like "and then", "schedule", "help me" — a pre-response reasoning pass runs first. Produces an internal scratchpad that informs the final response without ever being spoken. Improves quality precisely where it matters most.
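The routing check reads roughly like this. The word threshold and signal list mirror the description above; the real heuristics in core/reasoning.py may differ:

```python
# Sketch of the complexity gate described above. The 12-word threshold and
# compound-signal list come from this README; the real heuristics may differ.
COMPOUND_SIGNALS = ("and then", "schedule", "help me")

def needs_reasoning(text, word_limit=12):
    """True when the input should get a pre-response reasoning pass."""
    if len(text.split()) > word_limit:
        return True
    lowered = text.lower()
    return any(signal in lowered for signal in COMPOUND_SIGNALS)

print(needs_reasoning("what's the weather"))                         # → False
print(needs_reasoning("help me plan the week and then email Alex"))  # → True
```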

TaskQueue (core/task_queue.py)

Executes task plans with dependency resolution and parallelism. Independent tasks run concurrently via asyncio.gather. Tasks that depend on others wait. Failed dependencies propagate cleanly without deadlock.
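A stripped-down version of that loop, assuming each task dict carries an id and a depends_on list (a simplification of the real core/task_queue.py):

```python
# Dependency-aware parallel executor, as a sketch of the behaviour described
# above. The task shape ({"id", "depends_on"}) is assumed for illustration.
import asyncio

async def run_plan(tasks, execute):
    done, failed = {}, set()
    pending = list(tasks)
    while pending:
        # Drop tasks blocked by a failed dependency instead of deadlocking.
        for t in [t for t in pending if any(d in failed for d in t["depends_on"])]:
            failed.add(t["id"])
            pending.remove(t)
        ready = [t for t in pending if all(d in done for d in t["depends_on"])]
        if not ready:
            break
        # Independent tasks run concurrently; exceptions mark the task failed.
        results = await asyncio.gather(*(execute(t) for t in ready),
                                       return_exceptions=True)
        for t, r in zip(ready, results):
            if isinstance(r, Exception):
                failed.add(t["id"])
            else:
                done[t["id"]] = r
            pending.remove(t)
    return done, failed
```

A task whose dependency raised ends up in the failed set rather than waiting forever, which is the "propagate cleanly without deadlock" behaviour.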

ProactiveAgent (core/proactive_agent.py)

A background coroutine on a 5-minute timer. Checks for upcoming calendar events and surfaces open entity threads — "You mentioned Alex was fixing the gesture bug — any update?" — without being asked.
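The loop reduces to something like the sketch below. The two check functions are hypothetical stand-ins for the real calendar and entity-graph lookups:

```python
# One cycle of the background loop described above. The two check functions
# are hypothetical stand-ins for the real calendar and entity-graph lookups.
import asyncio

def check_upcoming_events():
    return []   # placeholder: the real agent queries Google Calendar

def check_open_threads():
    # placeholder: the real agent scans the entity graph for open questions
    return ["You mentioned Alex was fixing the gesture bug — any update?"]

async def proactive_cycle(emit):
    """Run one pass and return the number of prompts surfaced."""
    prompts = check_upcoming_events() + check_open_threads()
    for prompt in prompts:
        await emit("tts_speak", prompt)
    return len(prompts)

async def proactive_loop(emit, interval=300):
    while True:                        # 5-minute cadence between passes
        await proactive_cycle(emit)
        await asyncio.sleep(interval)
```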


Self-Synthesis

Phase 4.5 adds a dedicated SynthesisAgent that can research, generate, validate, and hot-load a new skill when JARVIS identifies a genuine capability gap. All generated code is constrained to jarvis-generated-code/, validated by an AST scanner plus an isolated subprocess sandbox, and only activated after an explicit user confirmation step.

The generated-skill runtime is supported by:

  • core/code_sandbox.py for static and runtime safety validation
  • core/skill_loader.py for the hardcoded generated directory, hot-loading, and SQLite registry persistence
  • agents/synthesis_agent.py for the research → feasibility → codegen → validation → confirmation pipeline
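The static half of that validation can be pictured like this. The blocked-module set and rules are assumptions for illustration, not the actual core/code_sandbox.py policy:

```python
# Illustrative AST scan in the spirit of core/code_sandbox.py.
# The blocked-module set and rules are assumptions, not the real policy.
import ast

BLOCKED_MODULES = {"os", "subprocess", "socket", "ctypes"}

def scan_generated_skill(source):
    """Return a list of violations; an empty list passes the static gate."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    violations.append(f"blocked import: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BLOCKED_MODULES:
                violations.append(f"blocked import: {node.module}")
        elif isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name) and node.func.id in {"eval", "exec"}:
                violations.append(f"dynamic execution: {node.func.id}")
    return violations

print(scan_generated_skill("import subprocess\neval('1+1')"))
```

Static scanning alone is easy to evade, which is why the README pairs it with a subprocess sandbox and a human confirmation step.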

Memory System

Four tiers. All persistent. All integrated into every LLM call.

┌───────────────────────────────────────────────────────────┐
│  WORKING MEMORY  (current session)                        │
│  Up to 40 annotated exchanges. Tracks intent, emotional   │
│  tone, active entities, and current task/goal state.      │
├───────────────────────────────────────────────────────────┤
│  EPISODIC MEMORY  (past sessions — ChromaDB)              │
│  Semantic vector embeddings of session summaries via      │
│  sentence-transformers. Retrieved by relevance to the     │
│  current query, not recency. Knows what you talked        │
│  about last week.                                         │
├───────────────────────────────────────────────────────────┤
│  SEMANTIC MEMORY  (user profile — SQLite)                 │
│  Persistent facts: preferences, habits, work context,     │
│  personal details. "prefers brief answers", "works in     │
│  AI", "based in Kuwait", "hates mornings".                │
├───────────────────────────────────────────────────────────┤
│  ENTITY GRAPH  (knowledge graph — SQLite)                 │
│  Named entities tracked across all conversations:         │
│  people, projects, places, organisations, concepts.       │
│  Stores facts, relationships, aliases, open questions.    │
│  "Alex works on Jarvis project. Fixed gesture bug Tues."  │
└───────────────────────────────────────────────────────────┘

Before every response: user profile + semantically relevant past episodes + entity context + session state are injected into the system prompt. After every exchange: entity extraction runs in the background. On session close: the full conversation is summarised by the LLM and saved to ChromaDB for future retrieval.
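In pseudocode, that pre-response assembly looks something like the sketch below. The section labels and function shape are illustrative assumptions, not the actual MemoryManager interface:

```python
# Sketch of the per-response context injection described above.
# Section labels and the function shape are illustrative assumptions.
def build_memory_context(profile_facts, episodes, entities, session_state):
    sections = [
        ("USER PROFILE", profile_facts),
        ("RELEVANT PAST EPISODES", episodes),
        ("KNOWN ENTITIES", entities),
    ]
    # Skip any tier that has nothing relevant for this query.
    parts = [f"{title}:\n" + "\n".join(f"- {item}" for item in items)
             for title, items in sections if items]
    parts.append(f"SESSION STATE: {session_state}")
    return "\n\n".join(parts)

context = build_memory_context(
    profile_facts=["prefers brief answers", "based in Kuwait"],
    episodes=[],
    entities=["Alex: works on the Jarvis project"],
    session_state="idle, no active task",
)
# The assembled block is prepended to the system prompt before the LLM call.
```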


Features

Intelligence

Capability Detail
Multi-agent orchestration LLM-planned task graphs, parallel execution, dependency resolution
Multi-provider LLM Groq (fast, cloud) → local LLaMA 3.2 3B Q4_K_M (offline fallback)
Reasoning layer Pre-response scratchpad for complex/multi-step inputs
Compound commands Three simultaneous intents execute in parallel
Self-synthesis agent Researches, validates, and hot-loads new BaseSkill modules into jarvis-generated-code/
Capability manifest LLM knows exactly what it can and cannot do; offers alternatives
Context summarisation Auto-summarises history when approaching context window limit

Voice & Input

Capability Detail
Kokoro ONNX TTS bm_george — British male, fully offline, ~300ms CPU latency
pyttsx3 fallback Auto-activates if Kokoro models are absent
Wake word Always-on openWakeWord ("hey jarvis") + speech_recognition fallback
STT Google Speech Recognition, ambient noise adaptation, 1s pause threshold
Voice switching Change TTS voice at runtime by voice command

Vision & Biometrics

Capability Detail
Face auth face_recognition (dlib) for identity verification
LBPH fallback OpenCV LBPH when face_recognition is unavailable
Gesture control MediaPipe hand pose — pinch, fist, V-gesture map to system actions
Registration One-time python scripts/register_face.py — 50 samples, auto-trains

Automation

Capability Detail
Browser Full Selenium: open URLs, search, new/close/next/prev tab
Spotify Real Spotify Web API — play, pause, skip, search, volume
YouTube Search and auto-play first result via Selenium
Video download yt-dlp with ~/Downloads target
Email Send via Gmail SMTP, read unread via IMAP
WhatsApp pywhatkit + contacts.csv fuzzy name lookup
Google Calendar Read events, create events, schedule Google Meet
System Launch/close apps, volume (pycaw), brightness (WMI), screenshot, file search
Alarms SQLite-persisted, survive restart, fire via TTS
Proactive Calendar reminders, open entity threads surfaced automatically

UI

Capability Detail
Dashboard Mode Full interface: chat, avatar panel, live system stats
Pill Mode 280×60px overlay, always on top, mode-switchable
Live system stats Real CPU / RAM / network via psutil, pushed every 3 seconds
ResonanceCore Animated visual driven by AI state: idle / thinking / speaking
Framer Motion Animated transitions throughout

Installation

Prerequisites

  • Python 3.11+
  • Node.js 18+ (only if rebuilding the frontend — pre-built dist/ is included)
  • Chrome (for browser control, YouTube, WhatsApp features)
  • Microphone (for voice input)

1. Clone and install

git clone https://github.com/your-username/jarvis-main.git
cd jarvis-main
pip install -r requirements.txt

2. Install the spaCy language model

Required for entity memory NER pre-pass:

python -m spacy download en_core_web_sm

3. Download AI models

Local LLM (~2GB, used as offline fallback):

python scripts/download_model.py

Kokoro TTS voice (~90MB):

python scripts/download_kokoro.py

4. Configure environment

cp config/.env.example config/.env
# Edit config/.env — add your API keys

5. Migrate an existing v2 database (existing installs only)

python scripts/migrate_v2_to_v3.py

6. Register your face (one-time)

python scripts/register_face.py

Look at the camera for ~30 seconds. Automatically selects face_recognition if installed, LBPH otherwise.

7. (Optional) Rebuild the frontend

cd frontend && npm install && npm run build && cd ..

Configuration

All secrets and feature flags live in config/.env. Everything is optional — features degrade gracefully.

# ── LLM ──────────────────────────────────────────────────
LLM_MODE=groq                           # groq | local
LLM_API_KEY=gsk_your_groq_key           # console.groq.com
LLM_API_MODEL=llama-3.3-70b-versatile
LLM_MAX_TOKENS=300
LLM_TEMPERATURE=0.05

# Optional synthesis quality lane
ANTHROPIC_API_KEY=
ANTHROPIC_MODEL=claude-opus-4-6

# ── Weather ───────────────────────────────────────────────
OPENWEATHER_API_KEY=                    # openweathermap.org
DEFAULT_CITY=London

# ── Email ─────────────────────────────────────────────────
EMAIL_ADDRESS=your@gmail.com
EMAIL_PASSWORD=                         # Gmail App Password, not login password
IMAP_HOST=imap.gmail.com

# ── News ──────────────────────────────────────────────────
NEWS_API_KEY=                           # newsapi.org

# ── Spotify ───────────────────────────────────────────────
SPOTIFY_CLIENT_ID=
SPOTIFY_CLIENT_SECRET=
SPOTIFY_REDIRECT_URI=http://localhost:8888/callback

Gmail App Password: Google Account → Security → 2-Step Verification → App Passwords → generate one for "Mail". Use that value, not your actual password.

Spotify: Create an app at developer.spotify.com/dashboard. Set redirect URI to http://localhost:8888/callback. First launch opens a browser for OAuth β€” token cached in .cache afterwards.


Running JARVIS

Full UI mode

python webview_main.py

CLI mode

python jarvis_cli.py              # full
python jarvis_cli.py --no-tts     # text only, no audio
python jarvis_cli.py --debug      # live event tracing
python jarvis_cli.py --no-memory  # disable memory this session

CLI Interface

The CLI reuses the entire backend — same agents, same skills, same memory, same event bus. Every bug found here is a real backend bug.

Command Description
/debug Toggle live event tracing — see every bus event as it fires
/memory Dump memory state: user profile, entities, recent exchanges
/skills All registered skills and their event subscriptions
/events Raw event log from this session
/clear Clear terminal
/exit Graceful shutdown — archives session to episodic memory first
/help Command reference

Project Structure

jarvis-main/
│
├── agents/                         # Multi-agent layer (Phase 4/4.5)
│   ├── orchestrator.py             # Plans, dispatches, synthesises
│   ├── info_agent.py               # Weather, news, time, knowledge
│   ├── system_agent.py             # Apps, volume, brightness, screenshot
│   ├── media_agent.py              # Spotify, YouTube, downloads
│   ├── comms_agent.py              # Email, WhatsApp, Calendar
│   ├── browser_agent.py            # URL, search, tab management
│   ├── personal_agent.py           # Alarms, reminders, biometrics, gestures
│   └── synthesis_agent.py          # Runtime research/codegen/validation pipeline
│
├── core/
│   ├── capability_manifest.py      # JARVIS capability list — injected into prompts
│   ├── database.py                 # SQLite: alarms, history, all memory tables
│   ├── engine.py                   # Engine lifecycle
│   ├── event_bus.py                # Central async event bus
│   ├── llm_client.py               # Groq → local LLaMA fallback, singleton
│   ├── logger.py                   # Structured logging
│   ├── proactive_agent.py          # Background: calendar checks, open threads
│   ├── reasoning.py                # Pre-response reasoning scratchpad
│   ├── code_sandbox.py             # Generated skill AST + subprocess validation
│   ├── skill_loader.py             # Generated skill hot-loading and registry
│   ├── task_queue.py               # Parallel task execution + dependency resolution
│   └── memory/
│       ├── manager.py              # MemoryManager singleton
│       ├── working.py              # Session exchanges with annotations
│       ├── episodic.py             # ChromaDB semantic vector store
│       ├── semantic.py             # SQLite user fact store
│       ├── procedural.py           # Named workflow store
│       ├── entity_store.py         # Entity knowledge graph
│       └── entity_extractor.py     # LLM entity extraction pipeline
│
├── services/                       # Hardware I/O only
│   ├── tts.py                      # Kokoro ONNX + pyttsx3 fallback
│   ├── stt.py                      # Speech recognition
│   ├── biometrics.py               # face_recognition / LBPH
│   ├── gesture.py                  # MediaPipe gesture → events
│   └── wake_word.py                # openWakeWord / SR fallback
│
├── skills/                         # Event-driven capability modules
│   ├── llm_skill.py                # Fallback brain (AGENTS_ENABLED=False)
│   ├── weather_skill.py
│   ├── communication.py            # Email SMTP + IMAP
│   ├── whatsapp_skill.py
│   ├── news_skill.py
│   ├── media_control.py            # YouTube via Selenium
│   ├── media_downloader.py         # yt-dlp
│   ├── browser_control.py          # Selenium tab control
│   ├── system.py                   # subprocess app launch/close
│   ├── system_control.py           # Volume, brightness, screenshot
│   ├── quick_launch.py             # Known apps/URLs
│   ├── productivity.py             # SQLite-persisted alarms
│   ├── spotify_skill.py            # Spotify Web API
│   ├── calendar_skill.py           # Google Calendar + Meet
│   └── web_automation.py           # Google search
│
├── interfaces/
│   ├── agent.py                    # BaseAgent
│   ├── skill.py                    # BaseSkill
│   └── adapter.py                  # BaseAdapter
│
├── frontend/src/
│   ├── components/                 # AvatarPanel, ChatArea, EntityPanel, DashboardMode...
│   ├── hooks/                      # useJarvisBridge, useJarvisState, useChatHistory
│   └── styles/                     # tokens.css, animations.css, components.css
│
├── scripts/
│   ├── setup.py                    # Interactive first-run wizard
│   ├── download_model.py
│   ├── download_kokoro.py
│   ├── migrate_v2_to_v3.py         # Legacy DB migration helper
│   ├── register_face.py
│   ├── test_all.py                 # Focused v3 validation suite
│   └── test_entities.py
│
├── models/
│   ├── Llama-3.2-3B-Instruct-Q4_K_M.gguf
│   ├── kokoro/                     # kokoro-v0_19.onnx + voices.bin
│   └── biometrics/                 # face_model.yml + face_encodings.npy
│
├── memory_store/chroma/            # ChromaDB episodic store (auto-created)
├── jarvis-generated-code/          # Runtime-generated BaseSkill modules
├── config/.env                     # Secrets — gitignored
├── config/settings.yaml
├── webview_main.py                 # Main entry point
├── jarvis_cli.py                   # Terminal entry point
├── app_config.py                   # Feature flags
└── jarvis.db                       # SQLite (auto-created, gitignored)

Extending JARVIS

Add a Skill (new capability)

# skills/my_skill.py
from interfaces.skill import BaseSkill
from core.event_bus import Event

class MySkill(BaseSkill):
    def register(self):
        self.bus.subscribe("my_trigger", self.handle)

    async def handle(self, event: Event):
        await self.bus.emit("tts_speak", "Done.")

Register in webview_main.py and jarvis_cli.py.

Add an Agent Action

Add the action string to the Orchestrator's ORCHESTRATOR_PROMPT, handle it in the relevant agent's handle() method, and emit the appropriate skill event.

Add a New Agent

# agents/my_agent.py
from interfaces.agent import BaseAgent

class MyAgent(BaseAgent):
    NAME = "MyAgent"

    async def handle(self, task: dict) -> dict:
        action = task.get("action", "")
        if action == "my_action":
            await self.bus.emit("my_event", task.get("params", {}))
            return self._ok(speech="Done.")
        return self._err(f"Unknown: {action}")

Register in agents/__init__.py, add to _agent_registry in webview_main.py, add to Orchestrator's available agents list.


Voice & Personality

TTS

Kokoro ONNX with bm_george — a real British male voice that runs entirely offline at ~300ms CPU latency. Automatically falls back to pyttsx3 if models are absent. Upgrade path to ElevenLabs "Brian" (nPczCjzI2devNBz1zQrb) for production-quality output.

Personality

British butler. Impeccably polite on the surface, quietly judging everything underneath. Dry wit, zero sycophancy, genuine care. Never uses "Certainly!", "Absolutely!", "Great question!". British spellings throughout. See SYSTEM_PROMPT in skills/llm_skill.py for the full definition.

The CAPABILITY_MANIFEST in core/capability_manifest.py ensures the LLM always knows exactly what it can and cannot do, and offers sensible alternatives rather than failing silently or hallucinating capabilities.


Biometric Setup

face_recognition — recommended, more accurate

pip install face_recognition    # requires cmake + dlib (~10 min build)
python scripts/register_face.py
# → saves models/biometrics/face_encodings.npy

OpenCV LBPH — fallback, zero extra dependencies

python scripts/register_face.py
# Auto-detects absence of face_recognition → LBPH path
# → saves models/biometrics/face_model.yml

Trigger: say "login", type login in the CLI, or press the login button in the UI.


Testing

# Focused v3 validation suite
python scripts/test_all.py

# Entity memory targeted tests
python scripts/test_entities.py

# Quick Groq connectivity check
python scripts/test_groq_conn.py

scripts/test_all.py exercises schema bootstrap, entity contradiction handling, reasoning heuristics, task-queue parallelism, synthesis safety, migration idempotency, and the Phase 5 frontend wiring. Non-zero exit code on failure.


License

Distributed under the POV Personal Use License v1. See LICENSE.

Free for personal use, study, and non-commercial projects with attribution. Commercial use requires written permission from the author.


"Sometimes you've gotta run before you can walk."


Built with Python · React · asyncio · Groq · Kokoro · ChromaDB · and entirely too much ambition.
