```
     ██╗ █████╗ ██████╗ ██╗   ██╗██╗███████╗
     ██║██╔══██╗██╔══██╗██║   ██║██║██╔════╝
     ██║███████║██████╔╝██║   ██║██║███████╗
██   ██║██╔══██║██╔══██╗╚██╗ ██╔╝██║╚════██║
╚█████╔╝██║  ██║██║  ██║ ╚████╔╝ ██║███████║
 ╚════╝ ╚═╝  ╚═╝╚═╝  ╚═╝  ╚═══╝  ╚═╝╚══════╝
```
Just A Rather Very Intelligent System
A local-first, multi-agent AI desktop assistant with persistent memory,
runtime self-synthesis, neural voice synthesis, biometric authentication, and a British butler attitude.
JARVIS is a personal AI assistant built to run on your own machine, sound like the one from the films, and genuinely remember who you are. It is not a chatbot wrapper. It is a complete runtime: a multi-agent orchestration system sitting on top of an async event bus, with four tiers of persistent memory, runtime self-synthesis for missing capabilities, neural TTS, biometric authentication, gesture control, and a React UI served through pywebview.
When you say something, the Orchestrator decomposes your intent into a structured task plan and dispatches work to six specialist agents concurrently, with dependency resolution. Complex requests are pre-processed through a reasoning layer. A background proactive agent monitors your calendar and open knowledge threads, surfacing things without being asked. And a four-tier memory system means JARVIS knows your preferences, recalls past conversations semantically, and tracks the people and projects you mention across every session.
```
you:    "open spotify, set an alarm for 7am, and what's the weather in London"

jarvis: [Orchestrator plans 3 parallel tasks]
        [MediaAgent    → spotify_play]
        [PersonalAgent → set_alarm → persisted to SQLite]
        [InfoAgent     → check_weather → OpenWeather API]

        "Spotify is open. Alarm set for 7 AM; I'll make sure you're conscious.
         London: overcast, 11°C. You'll want a jacket.
         Though I suspect you'll ignore that."
```
- Architecture
- Agent Layer
- Self-Synthesis
- Memory System
- Features
- Installation
- Configuration
- Running JARVIS
- CLI Interface
- Project Structure
- Extending JARVIS
- Voice & Personality
- Biometric Setup
- Testing
- License
## Architecture

Every component communicates through a single central async event bus. Nothing calls anything directly. The UI bridge, agents, skills, services, and memory system are fully decoupled: they publish and subscribe to named events.
```
┌────────────────────────────────────────────────────────────────┐
│                         React Frontend                         │
│             Dashboard Mode  │  Pill Mode (overlay)             │
└──────────────────────────┬─────────────────────────────────────┘
                           │  PyWebView JS Bridge
┌──────────────────────────┴─────────────────────────────────────┐
│                        Async Event Bus                         │
│                    (central nervous system)                    │
└──┬──────────┬──────────────────┬───────────────────────────────┘
   │          │                  │
   │   ┌──────┴───────────┐      │
   │   │   Orchestrator   │      │
   │   │  ──────────────  │      │
   │   │   LLM planning   │      │
   │   │   Task graph     │      │
   │   │   Synthesis      │      │
   │   └──────┬───────────┘      │
   │          │ TaskQueue        │
   │   ┌──────┴──────────────────┴───────────────────┐
   │   │             Specialist Agents               │
   │   │    InfoAgent · SystemAgent · MediaAgent     │
   │   │  CommsAgent · BrowserAgent · PersonalAgent  │
   │   └───────────────────┬─────────────────────────┘
   │                       │ bus.emit(skill events)
   │   ┌───────────────────┴─────────────────────────┐
   │   │                 Skill Layer                 │
   │   │ Weather · News · Email · WhatsApp · Calendar│
   │   │ Spotify · Browser · System · Volume · Alarm…│
   │   └─────────────────────────────────────────────┘
   │
┌──┴─────────────────────────────────────────────────────────────┐
│                            Services                            │
│      TTS (Kokoro) · STT · Wake Word · Biometrics · Gesture     │
└────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────┐
│                         Memory System                          │
│     Working · Episodic (ChromaDB) · Semantic · Entity Graph    │
│          + ProactiveAgent (background, 5-min cycle)            │
└────────────────────────────────────────────────────────────────┘
```
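The publish/subscribe contract at the heart of this design can be sketched in a few lines. This is a hypothetical simplification for illustration; the real `core/event_bus.py` is richer, but the contract is the same: components subscribe to named events and emit payloads, never calling each other directly.

```python
import asyncio
from collections import defaultdict

class EventBus:
    """Minimal async pub/sub: components never call each other directly."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self._subscribers[event_name].append(handler)

    async def emit(self, event_name, payload=None):
        # Run every handler registered for this event concurrently.
        handlers = self._subscribers.get(event_name, [])
        await asyncio.gather(*(h(payload) for h in handlers))

# Usage: a skill subscribes, an agent emits.
async def main():
    bus = EventBus()
    spoken = []

    async def tts_handler(text):
        spoken.append(text)

    bus.subscribe("tts_speak", tts_handler)
    await bus.emit("tts_speak", "Good morning, sir.")
    return spoken

print(asyncio.run(main()))  # → ['Good morning, sir.']
```

Because subscribers are plain coroutines, swapping the UI bridge or a skill in or out never touches the caller's code.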
## Agent Layer

Phase 4 introduced a complete multi-agent system that replaces the flat `process_user_input → LLMSkill` path for all non-trivial requests.
### Orchestrator

The entry point for all user input. It uses the LLM to produce a structured JSON task plan: a dependency graph of work units, each assigned to a specialist agent. Independent tasks run in parallel via the TaskQueue. When multiple results need combining, a synthesis pass merges them into one natural response enriched with memory context.
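For illustration only, a plan for the compound command in the demo above might look like the following. The field names (`id`, `agent`, `action`, `depends_on`) are hypothetical; the actual schema is defined by the Orchestrator's prompt.

```python
# A hypothetical task plan for:
# "open spotify, set an alarm for 7am, and what's the weather in London"
plan = {
    "tasks": [
        {"id": "t1", "agent": "MediaAgent",    "action": "spotify_play",
         "params": {},                          "depends_on": []},
        {"id": "t2", "agent": "PersonalAgent", "action": "set_alarm",
         "params": {"time": "07:00"},           "depends_on": []},
        {"id": "t3", "agent": "InfoAgent",     "action": "check_weather",
         "params": {"city": "London"},          "depends_on": []},
    ],
}

# No task depends on another, so the TaskQueue can dispatch all three
# concurrently; a synthesis pass then merges the results into one reply.
ready = [t["id"] for t in plan["tasks"] if not t["depends_on"]]
print(ready)  # → ['t1', 't2', 't3']
```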
### Specialist Agents

| Agent | Handles |
|---|---|
| InfoAgent | Weather, news headlines, time, general knowledge |
| SystemAgent | App launch/close, volume, brightness, screenshots, file search, jokes |
| MediaAgent | Spotify play/pause/skip/search, YouTube, video downloads |
| CommsAgent | Email send/read, WhatsApp, Google Calendar, Google Meet |
| BrowserAgent | URL navigation, web search, tab management |
| PersonalAgent | Alarms, reminders, biometric login/register, gesture control |
Each agent extends `BaseAgent` (`interfaces/agent.py`), has its own system prompt and optional LLM access, and returns a typed `{status, result, speech}` dict. Agents emit events onto the bus; they never call skills directly.
### Reasoning Layer

For complex inputs (multi-step requests, ambiguous queries, anything over ~12 words or containing compound signals like "and then", "schedule", "help me") a pre-response reasoning pass runs first. It produces an internal scratchpad that informs the final response without ever being spoken, improving quality precisely where it matters most.
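A gate of this shape can be sketched as follows. The thresholds and signal list mirror the description above for illustration; they are not the actual `core/reasoning.py` implementation.

```python
COMPOUND_SIGNALS = ("and then", "schedule", "help me")  # illustrative subset

def needs_reasoning(text: str) -> bool:
    """Hypothetical gate: route long or compound inputs through the
    pre-response reasoning pass; let simple ones skip it."""
    lowered = text.lower()
    if len(lowered.split()) > 12:          # long request → reason first
        return True
    return any(signal in lowered for signal in COMPOUND_SIGNALS)

print(needs_reasoning("what's the weather"))                     # → False
print(needs_reasoning("check my calendar and then email Alex"))  # → True
```

The point of a cheap heuristic here is latency: simple queries skip a full extra LLM round-trip.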
### TaskQueue

Executes task plans with dependency resolution and parallelism. Independent tasks run concurrently via `asyncio.gather`; tasks that depend on others wait, and failed dependencies propagate cleanly without deadlock.
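A minimal sketch of that execution model, assuming a `tasks` mapping of id → (dependencies, payload). This is illustrative; the real `core/task_queue.py` will differ in detail.

```python
import asyncio

async def run_plan(tasks, execute):
    """Run independent tasks concurrently; dependent tasks wait on their
    dependencies and fail fast if a dependency failed."""
    results = {}
    done = {tid: asyncio.Event() for tid in tasks}

    async def run_one(tid):
        deps, payload = tasks[tid]
        for dep in deps:                       # wait for dependencies
            await done[dep].wait()
        if any(results[d].get("status") == "error" for d in deps):
            # Propagate failure without running the dependent task.
            results[tid] = {"status": "error", "reason": "dependency failed"}
        else:
            results[tid] = await execute(payload)
        done[tid].set()                        # unblock dependants

    await asyncio.gather(*(run_one(tid) for tid in tasks))
    return results

# Usage: t1 and t2 run in parallel; t3 waits for t1.
async def fake_execute(payload):
    return {"status": "ok", "result": payload}

tasks = {"t1": ([], "weather"), "t2": ([], "alarm"), "t3": (["t1"], "summary")}
print(asyncio.run(run_plan(tasks, fake_execute))["t3"]["status"])  # → ok
```

`asyncio.Event` per task is what prevents deadlock: every task always sets its event, even on failure, so waiters are never stranded.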
### ProactiveAgent

A background coroutine on a 5-minute timer. It checks for upcoming calendar events and surfaces open entity threads ("You mentioned Alex was fixing the gesture bug. Any update?") without being asked.
## Self-Synthesis

Phase 4.5 adds a dedicated SynthesisAgent that can research, generate, validate, and hot-load a new skill when JARVIS identifies a genuine capability gap. All generated code is constrained to `jarvis-generated-code/`, validated by an AST scanner plus an isolated subprocess sandbox, and only activated after an explicit user confirmation step.
The generated-skill runtime is supported by:

- `core/code_sandbox.py` for static and runtime safety validation
- `core/skill_loader.py` for the hardcoded generated directory, hot-loading, and SQLite registry persistence
- `agents/synthesis_agent.py` for the research → feasibility → codegen → validation → confirmation pipeline
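The static half of that validation can be illustrated with Python's `ast` module. The denylist below is hypothetical; the actual rules in `core/code_sandbox.py` may differ.

```python
import ast

BANNED_CALLS = {"eval", "exec", "__import__"}       # illustrative denylist
BANNED_MODULES = {"subprocess", "socket", "ctypes"}

def static_scan(source: str) -> list[str]:
    """Sketch of a static pass: walk the AST of generated code and flag
    dangerous imports and calls before anything is loaded or run."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            names = ([a.name for a in node.names] if isinstance(node, ast.Import)
                     else [node.module or ""])
            violations += [n for n in names if n.split(".")[0] in BANNED_MODULES]
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BANNED_CALLS:
                violations.append(node.func.id)
    return violations

print(static_scan("import subprocess\neval('1+1')"))  # → ['subprocess', 'eval']
```

A static scan alone is bypassable, which is why the pipeline pairs it with an isolated subprocess sandbox and a final user confirmation.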
## Memory System

Four tiers. All persistent. All integrated into every LLM call.
```
┌────────────────────────────────────────────────────────────┐
│  WORKING MEMORY          (current session)                 │
│  Up to 40 annotated exchanges. Tracks intent, emotional    │
│  tone, active entities, and current task/goal state.       │
├────────────────────────────────────────────────────────────┤
│  EPISODIC MEMORY         (past sessions · ChromaDB)        │
│  Semantic vector embeddings of session summaries via       │
│  sentence-transformers. Retrieved by relevance to the      │
│  current query, not recency. Knows what you talked         │
│  about last week.                                          │
├────────────────────────────────────────────────────────────┤
│  SEMANTIC MEMORY         (user profile · SQLite)           │
│  Persistent facts: preferences, habits, work context,      │
│  personal details. "prefers brief answers", "works in      │
│  AI", "based in Kuwait", "hates mornings".                 │
├────────────────────────────────────────────────────────────┤
│  ENTITY GRAPH            (knowledge graph · SQLite)        │
│  Named entities tracked across all conversations:          │
│  people, projects, places, organisations, concepts.        │
│  Stores facts, relationships, aliases, open questions.     │
│  "Alex works on Jarvis project. Fixed gesture bug Tues."   │
└────────────────────────────────────────────────────────────┘
```
Before every response: user profile + semantically relevant past episodes + entity context + session state are injected into the system prompt. After every exchange: entity extraction runs in the background. On session close: the full conversation is summarised by the LLM and saved to ChromaDB for future retrieval.
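That per-response injection step can be sketched as simple prompt assembly. The function and section labels below are hypothetical, not the actual `MemoryManager` API; the point is that each tier contributes one labelled section of the system prompt.

```python
def build_context(profile, episodes, entities, session_state):
    """Hypothetical sketch: fold the four memory tiers into one
    system-prompt fragment before every LLM call."""
    sections = [
        "## User profile\n" + "\n".join(f"- {fact}" for fact in profile),
        "## Relevant past sessions\n" + "\n".join(f"- {e}" for e in episodes),
        "## Known entities\n" + "\n".join(f"- {e}" for e in entities),
        "## Session state\n" + session_state,
    ]
    return "\n\n".join(sections)

context = build_context(
    profile=["prefers brief answers", "based in Kuwait"],
    episodes=["Discussed the gesture bug with Alex last week"],
    entities=["Alex: works on the Jarvis project"],
    session_state="current goal: fix gesture recognition",
)
print("prefers brief answers" in context)  # → True
```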
## Features

### Intelligence

| Capability | Detail |
|---|---|
| Multi-agent orchestration | LLM-planned task graphs, parallel execution, dependency resolution |
| Multi-provider LLM | Groq (fast, cloud) → local LLaMA 3.2 3B Q4_K_M (offline fallback) |
| Reasoning layer | Pre-response scratchpad for complex/multi-step inputs |
| Compound commands | Three simultaneous intents execute in parallel |
| Self-synthesis agent | Researches, validates, and hot-loads new BaseSkill modules into jarvis-generated-code/ |
| Capability manifest | LLM knows exactly what it can and cannot do; offers alternatives |
| Context summarisation | Auto-summarises history when approaching context window limit |
### Voice

| Capability | Detail |
|---|---|
| Kokoro ONNX TTS | `bm_george`: British male, fully offline, ~300ms CPU latency |
| pyttsx3 fallback | Auto-activates if Kokoro models are absent |
| Wake word | Always-on openWakeWord ("hey jarvis") + speech_recognition fallback |
| STT | Google Speech Recognition, ambient noise adaptation, 1s pause threshold |
| Voice switching | Change TTS voice at runtime by voice command |
### Biometrics & Gesture

| Capability | Detail |
|---|---|
| Face auth | face_recognition (dlib) for identity verification |
| LBPH fallback | OpenCV LBPH when face_recognition is unavailable |
| Gesture control | MediaPipe hand pose → pinch, fist, V-gesture map to system actions |
| Registration | One-time `python scripts/register_face.py` → 50 samples, auto-trains |
### Skills & Integrations

| Capability | Detail |
|---|---|
| Browser | Full Selenium: open URLs, search, new/close/next/prev tab |
| Spotify | Real Spotify Web API β play, pause, skip, search, volume |
| YouTube | Search and auto-play first result via Selenium |
| Video download | yt-dlp with ~/Downloads target |
| Email | Send via Gmail SMTP, read unread via IMAP |
| WhatsApp | pywhatkit + contacts.csv fuzzy name lookup |
| Google Calendar | Read events, create events, schedule Google Meet |
| System | Launch/close apps, volume (pycaw), brightness (WMI), screenshot, file search |
| Alarms | SQLite-persisted, survive restart, fire via TTS |
| Proactive | Calendar reminders, open entity threads surfaced automatically |
### UI

| Capability | Detail |
|---|---|
| Dashboard Mode | Full interface: chat, avatar panel, live system stats |
| Pill Mode | 280×60px overlay, always on top, mode-switchable |
| Live system stats | Real CPU / RAM / network via psutil, pushed every 3 seconds |
| ResonanceCore | Animated visual driven by AI state: idle / thinking / speaking |
| Framer Motion | Animated transitions throughout |
## Installation

### Prerequisites

- Python 3.11+
- Node.js 18+ (only if rebuilding the frontend; pre-built `dist/` is included)
- Chrome (for browser control, YouTube, WhatsApp features)
- Microphone (for voice input)
### Setup

```bash
git clone https://github.com/your-username/jarvis-main.git
cd jarvis-main
pip install -r requirements.txt
```

Required for the entity memory NER pre-pass:

```bash
python -m spacy download en_core_web_sm
```

Local LLM (~2GB, used as offline fallback):

```bash
python scripts/download_model.py
```

Kokoro TTS voice (~90MB):

```bash
python scripts/download_kokoro.py
```

```bash
cp config/.env.example config/.env
# Edit config/.env and add your API keys
```

```bash
python scripts/migrate_v2_to_v3.py
```

```bash
python scripts/register_face.py
```

Look at the camera for ~30 seconds. The script automatically selects face_recognition if installed, LBPH otherwise.

Only needed if rebuilding the frontend:

```bash
cd frontend && npm install && npm run build && cd ..
```

## Configuration

All secrets and feature flags live in `config/.env`. Everything is optional; features degrade gracefully.
```ini
# ── LLM ──────────────────────────────────────────────────
LLM_MODE=groq                  # groq | local
LLM_API_KEY=gsk_your_groq_key  # console.groq.com
LLM_API_MODEL=llama-3.3-70b-versatile
LLM_MAX_TOKENS=300
LLM_TEMPERATURE=0.05

# Optional synthesis quality lane
ANTHROPIC_API_KEY=
ANTHROPIC_MODEL=claude-opus-4-6

# ── Weather ──────────────────────────────────────────────
OPENWEATHER_API_KEY=           # openweathermap.org
DEFAULT_CITY=London

# ── Email ────────────────────────────────────────────────
EMAIL_ADDRESS=your@gmail.com
EMAIL_PASSWORD=                # Gmail App Password, not login password
IMAP_HOST=imap.gmail.com

# ── News ─────────────────────────────────────────────────
NEWS_API_KEY=                  # newsapi.org

# ── Spotify ──────────────────────────────────────────────
SPOTIFY_CLIENT_ID=
SPOTIFY_CLIENT_SECRET=
SPOTIFY_REDIRECT_URI=http://localhost:8888/callback
```

Gmail App Password: Google Account → Security → 2-Step Verification → App Passwords → generate one for "Mail". Use that value, not your actual password.

Spotify: create an app at developer.spotify.com/dashboard and set the redirect URI to `http://localhost:8888/callback`. First launch opens a browser for OAuth; the token is cached in `.cache` afterwards.
## Running JARVIS

Desktop app (React UI via pywebview):

```bash
python webview_main.py
```

Terminal:

```bash
python jarvis_cli.py              # full
python jarvis_cli.py --no-tts     # text only, no audio
python jarvis_cli.py --debug     # live event tracing
python jarvis_cli.py --no-memory  # disable memory this session
```

The CLI reuses the entire backend: same agents, same skills, same memory, same event bus. Every bug found here is a real backend bug.
## CLI Interface

| Command | Description |
|---|---|
| `/debug` | Toggle live event tracing: see every bus event as it fires |
| `/memory` | Dump memory state: user profile, entities, recent exchanges |
| `/skills` | All registered skills and their event subscriptions |
| `/events` | Raw event log from this session |
| `/clear` | Clear the terminal |
| `/exit` | Graceful shutdown; archives the session to episodic memory first |
| `/help` | Command reference |
## Project Structure

```
jarvis-main/
│
├── agents/                      # Multi-agent layer (Phase 4/4.5)
│   ├── orchestrator.py          # Plans, dispatches, synthesises
│   ├── info_agent.py            # Weather, news, time, knowledge
│   ├── system_agent.py          # Apps, volume, brightness, screenshot
│   ├── media_agent.py           # Spotify, YouTube, downloads
│   ├── comms_agent.py           # Email, WhatsApp, Calendar
│   ├── browser_agent.py         # URL, search, tab management
│   ├── personal_agent.py        # Alarms, reminders, biometrics, gestures
│   └── synthesis_agent.py       # Runtime research/codegen/validation pipeline
│
├── core/
│   ├── capability_manifest.py   # JARVIS capability list, injected into prompts
│   ├── database.py              # SQLite: alarms, history, all memory tables
│   ├── engine.py                # Engine lifecycle
│   ├── event_bus.py             # Central async event bus
│   ├── llm_client.py            # Groq → local LLaMA fallback, singleton
│   ├── logger.py                # Structured logging
│   ├── proactive_agent.py       # Background: calendar checks, open threads
│   ├── reasoning.py             # Pre-response reasoning scratchpad
│   ├── code_sandbox.py          # Generated skill AST + subprocess validation
│   ├── skill_loader.py          # Generated skill hot-loading and registry
│   ├── task_queue.py            # Parallel task execution + dependency resolution
│   └── memory/
│       ├── manager.py           # MemoryManager singleton
│       ├── working.py           # Session exchanges with annotations
│       ├── episodic.py          # ChromaDB semantic vector store
│       ├── semantic.py          # SQLite user fact store
│       ├── procedural.py        # Named workflow store
│       ├── entity_store.py      # Entity knowledge graph
│       └── entity_extractor.py  # LLM entity extraction pipeline
│
├── services/                    # Hardware I/O only
│   ├── tts.py                   # Kokoro ONNX + pyttsx3 fallback
│   ├── stt.py                   # Speech recognition
│   ├── biometrics.py            # face_recognition / LBPH
│   ├── gesture.py               # MediaPipe gesture → events
│   └── wake_word.py             # openWakeWord / SR fallback
│
├── skills/                      # Event-driven capability modules
│   ├── llm_skill.py             # Fallback brain (AGENTS_ENABLED=False)
│   ├── weather_skill.py
│   ├── communication.py         # Email SMTP + IMAP
│   ├── whatsapp_skill.py
│   ├── news_skill.py
│   ├── media_control.py         # YouTube via Selenium
│   ├── media_downloader.py      # yt-dlp
│   ├── browser_control.py       # Selenium tab control
│   ├── system.py                # subprocess app launch/close
│   ├── system_control.py        # Volume, brightness, screenshot
│   ├── quick_launch.py          # Known apps/URLs
│   ├── productivity.py          # SQLite-persisted alarms
│   ├── spotify_skill.py         # Spotify Web API
│   ├── calendar_skill.py        # Google Calendar + Meet
│   └── web_automation.py        # Google search
│
├── interfaces/
│   ├── agent.py                 # BaseAgent
│   ├── skill.py                 # BaseSkill
│   └── adapter.py               # BaseAdapter
│
├── frontend/src/
│   ├── components/              # AvatarPanel, ChatArea, EntityPanel, DashboardMode...
│   ├── hooks/                   # useJarvisBridge, useJarvisState, useChatHistory
│   └── styles/                  # tokens.css, animations.css, components.css
│
├── scripts/
│   ├── setup.py                 # Interactive first-run wizard
│   ├── download_model.py
│   ├── download_kokoro.py
│   ├── migrate_v2_to_v3.py      # Legacy DB migration helper
│   ├── register_face.py
│   ├── test_all.py              # Focused v3 validation suite
│   └── test_entities.py
│
├── models/
│   ├── Llama-3.2-3B-Instruct-Q4_K_M.gguf
│   ├── kokoro/                  # kokoro-v0_19.onnx + voices.bin
│   └── biometrics/              # face_model.yml + face_encodings.npy
│
├── memory_store/chroma/         # ChromaDB episodic store (auto-created)
├── jarvis-generated-code/       # Runtime-generated BaseSkill modules
├── config/.env                  # Secrets, gitignored
├── config/settings.yaml
├── webview_main.py              # Main entry point
├── jarvis_cli.py                # Terminal entry point
├── app_config.py                # Feature flags
└── jarvis.db                    # SQLite (auto-created, gitignored)
```
## Extending JARVIS

### Adding a Skill

```python
# skills/my_skill.py
from interfaces.skill import BaseSkill
from core.event_bus import Event

class MySkill(BaseSkill):
    def register(self):
        self.bus.subscribe("my_trigger", self.handle)

    async def handle(self, event: Event):
        await self.bus.emit("tts_speak", "Done.")
```

Register it in `webview_main.py` and `jarvis_cli.py`.
### Adding an Agent Action

Add the action string to the Orchestrator's `ORCHESTRATOR_PROMPT`, handle it in the relevant agent's `handle()` method, and emit the appropriate skill event.
### Adding an Agent

```python
# agents/my_agent.py
from interfaces.agent import BaseAgent

class MyAgent(BaseAgent):
    NAME = "MyAgent"

    async def handle(self, task: dict) -> dict:
        action = task.get("action", "")
        if action == "my_action":
            await self.bus.emit("my_event", task.get("params", {}))
            return self._ok(speech="Done.")
        return self._err(f"Unknown: {action}")
```

Register it in `agents/__init__.py`, add it to `_agent_registry` in `webview_main.py`, and add it to the Orchestrator's available agents list.
## Voice & Personality

Kokoro ONNX with `bm_george`: a real British male voice that runs entirely offline at ~300ms CPU latency. Automatically falls back to pyttsx3 if the models are absent. Upgrade path to ElevenLabs "Brian" (`nPczCjzI2devNBz1zQrb`) for production-quality output.
British butler. Impeccably polite on the surface, quietly judging everything underneath. Dry wit, zero sycophancy, genuine care. Never uses "Certainly!", "Absolutely!", or "Great question!". British spellings throughout. See `SYSTEM_PROMPT` in `skills/llm_skill.py` for the full definition.
The `CAPABILITY_MANIFEST` in `core/capability_manifest.py` ensures the LLM always knows exactly what it can and cannot do, and offers sensible alternatives rather than failing silently or hallucinating capabilities.
## Biometric Setup

**face_recognition** (recommended, more accurate):

```bash
pip install face_recognition   # requires cmake + dlib (~10 min build)
python scripts/register_face.py
# → saves models/biometrics/face_encodings.npy
```

**OpenCV LBPH** (fallback, zero extra dependencies):

```bash
python scripts/register_face.py
# Auto-detects absence of face_recognition → LBPH path
# → saves models/biometrics/face_model.yml
```

Trigger: say "login", type `login` in the CLI, or press the login button in the UI.
## Testing

```bash
# Focused v3 validation suite
python scripts/test_all.py

# Entity memory targeted tests
python scripts/test_entities.py

# Quick Groq connectivity check
python scripts/test_groq_conn.py
```

`scripts/test_all.py` exercises schema bootstrap, entity contradiction handling, reasoning heuristics, task-queue parallelism, synthesis safety, migration idempotency, and the Phase 5 frontend wiring. It exits non-zero on failure.
## License

Distributed under the POV Personal Use License v1. See LICENSE.
Free for personal use, study, and non-commercial projects with attribution. Commercial use requires written permission from the author.
"Sometimes you've gotta run before you can walk."
Built with Python · React · asyncio · Groq · Kokoro · ChromaDB · and entirely too much ambition.