🚀 VortexAI - OS Extender

A Fully Offline, Local-AI Powered OS Extension & Personal Assistant

Jarvis-Style. Multimodal. Multi-Language. Low-Resource. Zero APIs.

🧠 Overview

VortexAI is a local AI layer that sits on top of your operating system and serves as:

  • Personal assistant
  • Automation engine
  • Voice-controlled AI agent
  • File/indexing system
  • Photo organizer
  • Document processor
  • Email/message manager
  • Generative tools provider

It runs 100% offline, ships with all AI models bundled inside the .exe, and supports voice interaction in English, Hindi, and French, including mixed-language input (Hinglish, Frenglish, and other code-switching).

You own your data. No cloud. No APIs. No external installs.

🌐 Core Capabilities

VortexAI comes with:

  • ✔ Local LLM (Llama 3 / Mistral / Phi / Gemma – GGUF)
  • ✔ Local Vision (LLaVA / SigLIP)
  • ✔ Local Embeddings (BGE / LaBSE / CLIP)
  • ✔ Local Image Generation (Stable Diffusion Turbo)
  • ✔ Local Speech Recognition (Whisper)
  • ✔ Local Speech Synthesis (VITS / Piper)
  • ✔ Local Vector Search Database (FAISS)
  • ✔ Automation Engine (OS-level control)
  • ✔ Web Scraper (Rust-based, safe-mode)
  • ✔ Fully Configurable Settings UI (Tauri)

🗂 Tech Stack

| Layer | Technology |
| --- | --- |
| Core Runtime | Rust |
| UI | Tauri + React/Svelte |
| Local LLM Engine | llama.cpp (statically linked) |
| STT | whisper.cpp |
| TTS | Piper / VITS (local) |
| Image Generation | sd.cpp (stable-diffusion.cpp) |
| Vision / OCR | llava.cpp / Tesseract |
| Vector Database | FAISS (local) |
| Metadata DB | SQLite |
| Filesystem Indexer | Rust async walkers |
| Task Automation | Windows APIs via winapi, or Linux syscalls |

🏗 System Architecture

AstraOS/
│
├── Core Runtime (Rust)
│   ├── Event Loop
│   ├── Intent Parser
│   ├── Skill Engine
│   ├── Memory Engine
│   └── Scheduler
│
├── AI Layer
│   ├── LLM (llama.cpp)
│   ├── Vision (llava.cpp)
│   ├── Diffusion (sd.cpp)
│   ├── Embeddings (bge / clip / labse)
│   ├── STT (whisper.cpp)
│   └── TTS (piper)
│
├── Storage Layer
│   ├── SQLite (metadata)
│   ├── FAISS (vector index)
│   ├── Cache (json)
│   └── File Registry
│
├── Modules
│   ├── Photo Organizer
│   ├── File Search
│   ├── Email Manager
│   ├── Docs Parser
│   ├── Automation Tools
│   ├── Browser Agent
│   └── Settings & Profiles
│
└── UI Layer (Tauri)

🔍 Embedding Architecture (Text + Image + Audio)

Text Embeddings

Used for:

  • Intent recognition
  • Semantic search
  • Memory lookups
  • File search

Model: bge-small-en-v1.5.gguf (60–120MB)

Image Embeddings

Used for:

  • Photo clustering
  • Similar photo search
  • OCR + relevance ranking
  • Deduplication

Model: clip-ViT-B-32.gguf

Audio Embeddings

Used for:

  • Speaker identity
  • Voice command segmentation
  • Voice memory

Model: Whisper encoder embeddings

🗃 Database Architecture

1. SQLite (Metadata Layer)

Tables included:

/db/app.db
├── user_settings
├── voice_profiles
├── automation_rules
├── task_history
├── scrape_cache
├── email_index
├── file_registry
└── photo_metadata

Each table uses a normalized schema.

2. FAISS Vector Index (Semantic Layer)

Folder: /vector/

Contains:

| Index | Purpose | Embedding Type |
| --- | --- | --- |
| memory.index | Long-term AI memory | text |
| files.index | Document search | text |
| photos.index | Image similarity | image |
| speech.index | Speaker embeddings | audio |
| skills.index | Intent → skill mapping | text |

All indexes load at boot in streaming mode.
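
To make the semantic layer concrete, here is a minimal sketch of nearest-neighbor lookup by cosine similarity. It is a brute-force, in-memory stand-in, not FAISS itself; `Entry`, `FlatIndex`, and `cosine` are hypothetical names chosen for illustration, and the real indexes would be built and queried through FAISS.

```rust
/// One stored vector plus the id of the row it belongs to (hypothetical layout).
pub struct Entry {
    pub id: u64,
    pub vector: Vec<f32>,
}

/// Brute-force stand-in for a vector index such as memory.index or files.index.
#[derive(Default)]
pub struct FlatIndex {
    entries: Vec<Entry>,
}

/// Cosine similarity; returns 0.0 for mismatched lengths or zero-norm vectors.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

impl FlatIndex {
    /// Store a vector under the given id.
    pub fn add(&mut self, id: u64, vector: Vec<f32>) {
        self.entries.push(Entry { id, vector });
    }

    /// Return up to `k` (id, score) pairs, best match first.
    pub fn search(&self, query: &[f32], k: usize) -> Vec<(u64, f32)> {
        let mut scored: Vec<(u64, f32)> = self
            .entries
            .iter()
            .map(|e| (e.id, cosine(query, &e.vector)))
            .collect();
        // Sort by descending score; total_cmp gives a total order over f32.
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
    }
}
```

A caller would embed the query text or image with the matching model, pass the vector to `search`, and then resolve the returned ids against the SQLite metadata tables. The linear scan is only reasonable for small collections, which is why a dedicated index is used for anything larger.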

3. Cache Layer

Folder: /cache/

  • preprocessed OCR
  • STT partial segments
  • temp embeddings
  • web-scraped DOM snapshots
  • active conversation state

🗣 Voice Engine

✔ Multilingual Mixed Input

English, Hindi, and French can be mixed within a single utterance, with no manual language switching.

STT Pipeline

  1. VAD (Voice Activity Detection)
  2. Whisper.cpp (medium or small)
  3. Language auto-detect
  4. Code-switch detection
  5. Sentence reconstruction
  6. Punctuation
  7. Intent classification
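
As an illustration of step 4, the sketch below splits a transcript into runs of the same writing script. It assumes Hindi arrives in Devanagari; romanized Hinglish, or telling English from French, would need a real language-identification model, so this is only a rough first pass. `Script`, `script_of`, `segment`, and `is_code_switched` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Script {
    Latin,
    Devanagari,
    Other,
}

/// Classify one word by the script of its first alphabetic character.
pub fn script_of(word: &str) -> Script {
    match word.chars().find(|c| c.is_alphabetic()) {
        // Devanagari Unicode block.
        Some(c) if ('\u{0900}'..='\u{097F}').contains(&c) => Script::Devanagari,
        // ASCII letters plus the accented Latin ranges used by French.
        Some(c) if c.is_ascii_alphabetic() || ('\u{00C0}'..='\u{024F}').contains(&c) => {
            Script::Latin
        }
        _ => Script::Other,
    }
}

/// Group consecutive words of the same script into segments.
/// Words with no recognized script (digits, punctuation) join the current segment.
pub fn segment(transcript: &str) -> Vec<(Script, String)> {
    let mut out: Vec<(Script, String)> = Vec::new();
    for word in transcript.split_whitespace() {
        let script = script_of(word);
        let extend = matches!(
            out.last(),
            Some((prev, _)) if *prev == script || script == Script::Other
        );
        if extend {
            if let Some(last) = out.last_mut() {
                last.1.push(' ');
                last.1.push_str(word);
            }
        } else {
            out.push((script, word.to_string()));
        }
    }
    out
}

/// True when the transcript contains both Latin and Devanagari segments.
pub fn is_code_switched(transcript: &str) -> bool {
    let segments = segment(transcript);
    segments.iter().any(|(s, _)| *s == Script::Latin)
        && segments.iter().any(|(s, _)| *s == Script::Devanagari)
}
```

Each segment could then be handed to sentence reconstruction with its script as a hint, while the full transcript still goes to intent classification as one string.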

TTS Pipeline

Model: Piper FastVITS Multilingual

Voices included:

  • English (US/Neutral)
  • Hindi (Delhi/Neutral)
  • French (Paris/Neutral)

Speed: real-time or faster

🖼 Photo Organizer Module

Features:

  • auto-scan entire system
  • EXIF extraction
  • people clustering
  • location-based grouping
  • duplicate removal
  • object tags via vision model
  • timeline view
  • semantic search: "Show photos where I'm wearing a red hoodie with friends at night"

Uses:

  • CLIP embeddings
  • FAISS photos.index
  • SQLite photo metadata

📂 File & Document Manager

Supports:

  • PDF
  • DOCX
  • TXT
  • PPTX
  • Markdown
  • Images
  • Audio transcription
  • Code files

Extracts:

  • text content
  • embeddings
  • key metadata
  • summaries
  • timeline clusters
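
One way to route the supported formats to the right extractor is a small classifier keyed on file extension. The sketch below is an assumption about how that routing might look: `DocKind`, `classify`, and `first_step` are hypothetical names, and the extension lists for images, audio, and code are illustrative rather than exhaustive.

```rust
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DocKind {
    Pdf,
    Docx,
    Text,
    Pptx,
    Markdown,
    Image,
    Audio,
    Code,
    Unsupported,
}

/// Map a file name to a document kind using its extension (case-insensitive).
pub fn classify(file_name: &str) -> DocKind {
    let ext = Path::new(file_name)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("pdf") => DocKind::Pdf,
        Some("docx") => DocKind::Docx,
        Some("txt") => DocKind::Text,
        Some("pptx") => DocKind::Pptx,
        Some("md") | Some("markdown") => DocKind::Markdown,
        Some("png") | Some("jpg") | Some("jpeg") | Some("webp") => DocKind::Image,
        Some("wav") | Some("mp3") | Some("flac") => DocKind::Audio,
        Some("rs") | Some("py") | Some("js") | Some("ts") => DocKind::Code,
        _ => DocKind::Unsupported,
    }
}

/// Name of the first extraction step for a kind, or None if the file is skipped.
pub fn first_step(kind: DocKind) -> Option<&'static str> {
    match kind {
        DocKind::Image => Some("ocr"),
        DocKind::Audio => Some("transcribe"),
        DocKind::Unsupported => None,
        _ => Some("extract_text"),
    }
}
```

Whatever the first step produces is plain text, so the later stages (embeddings, metadata, summaries) can stay the same for every format.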

🤖 Automation Engine

You can say things like:

  • "Turn off my PC at 11."
  • "When a new email arrives from professor, notify me."
  • "Download all PDFs from this site."
  • "Sort all of my desktop files."

Backend uses:

  • OS APIs
  • JavaScript-to-Rust bindings (Tauri commands)
  • Rust automation drivers
  • A plugin-based skill system
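
The plugin-based skill system could be shaped roughly like the sketch below: each skill declares whether it can handle an utterance, and a registry dispatches to the first match. Keyword matching here is a stand-in for the intent classifier, and `Skill`, `Shutdown`, `SortDesktop`, and `SkillRegistry` are hypothetical names. The skills only describe what they would do, so the example has no side effects.

```rust
/// A skill is one unit of automation the engine can dispatch to (hypothetical trait).
pub trait Skill {
    fn name(&self) -> &str;
    /// Return true if this skill can handle the utterance.
    fn can_handle(&self, utterance: &str) -> bool;
    /// Describe the action instead of performing it.
    fn plan(&self, utterance: &str) -> String;
}

pub struct Shutdown;

impl Skill for Shutdown {
    fn name(&self) -> &str {
        "shutdown"
    }
    fn can_handle(&self, utterance: &str) -> bool {
        let u = utterance.to_lowercase();
        u.contains("turn off") || u.contains("shut down")
    }
    fn plan(&self, utterance: &str) -> String {
        // Take the first run of digits in the utterance as the hour, if any.
        let hour = utterance
            .split(|c: char| !c.is_ascii_digit())
            .find(|s| !s.is_empty());
        match hour {
            Some(h) => format!("schedule shutdown at {h}"),
            None => "shutdown now".to_string(),
        }
    }
}

pub struct SortDesktop;

impl Skill for SortDesktop {
    fn name(&self) -> &str {
        "sort_desktop"
    }
    fn can_handle(&self, utterance: &str) -> bool {
        let u = utterance.to_lowercase();
        u.contains("sort") && u.contains("desktop")
    }
    fn plan(&self, _utterance: &str) -> String {
        "group desktop files by extension".to_string()
    }
}

/// Holds registered skills and picks the first one that accepts an utterance.
#[derive(Default)]
pub struct SkillRegistry {
    skills: Vec<Box<dyn Skill>>,
}

impl SkillRegistry {
    pub fn register(&mut self, skill: Box<dyn Skill>) {
        self.skills.push(skill);
    }

    /// Return (skill name, planned action) for the first matching skill.
    pub fn dispatch(&self, utterance: &str) -> Option<(String, String)> {
        self.skills
            .iter()
            .find(|s| s.can_handle(utterance))
            .map(|s| (s.name().to_string(), s.plan(utterance)))
    }
}
```

With both skills registered, "Turn off my PC at 11." would resolve to the shutdown skill with a scheduled action, and an utterance no skill recognizes returns `None` so the assistant can ask for clarification.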

🎨 Generative Tools

1. Local Image Generation

Models:

  • sd-turbo.gguf (fast)
  • sdxl-lightning.gguf (optional)

2. Local PPT/Document Generation

Templates stored in /templates/.

3. Local Code Templates

For React, Python, JS, etc.

⚙️ Settings & Profiles

Exposed Options:

  • choose LLM model
  • choose voice model
  • GPU/CPU toggle
  • resource/priority mode
  • background permissions
  • task scheduling
  • privacy controls
  • memory wipe
  • vector reindex

📦 Packaging Into One EXE

Bundler: Tauri → NSIS → final .exe

Included in build:

  • Rust runtime
  • Tauri frontend
  • AI engines (llama.cpp, whisper.cpp, sd.cpp)
  • All GGUF models
  • SQLite DB
  • FAISS indexes
  • Voice models
  • Resource folder

Single .exe output size: 1.8–3.5 GB, depending on model choices.

📁 Final Folder Structure

AstraOS/
│
├── app.exe
├── README.md
├── models/
│   ├── llm/
│   │   └── llama-3-8b.gguf
│   ├── vision/
│   │   └── llava-1.6.gguf
│   ├── stt/
│   │   └── whisper-medium.gguf
│   ├── tts/
│   │   └── piper-multilingual.onnx
│   ├── embeddings/
│   │   ├── bge-small.gguf
│   │   └── clip-ViT-B-32.gguf
│   └── sd/
│       └── sd-turbo.gguf
│
├── db/
│   └── app.db
├── vector/
│   ├── memory.index
│   ├── files.index
│   ├── photos.index
│   ├── speech.index
│   └── skills.index
│
├── cache/
├── logs/
├── plugins/
└── templates/

🔐 Privacy & Security

  • No internet calls (unless user enables web scraping)
  • All data stored locally
  • User-controlled memory wipe
  • Password-protected profile
  • Hardware-bound encryption option

🏎 Performance Optimizations

  • lazy model loading
  • tensor caching
  • quantized GGUF
  • streaming inference
  • async Rust runtime
  • CPU/GPU configurable load
  • auto-sleep mode
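
Lazy model loading, the first item above, can be sketched with the standard library's `OnceLock`: the model is loaded the first time it is requested and reused afterwards. `Model` and `LazyModel` are hypothetical placeholders; a real loader would read the GGUF file and hand it to the inference engine instead of just recording the path.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Placeholder for a loaded model; a real one would hold weights or an engine handle.
pub struct Model {
    pub path: String,
}

/// Defers loading until the model is first used.
pub struct LazyModel {
    path: String,
    cell: OnceLock<Model>,
    loads: AtomicUsize,
}

impl LazyModel {
    pub fn new(path: &str) -> Self {
        Self {
            path: path.to_string(),
            cell: OnceLock::new(),
            loads: AtomicUsize::new(0),
        }
    }

    /// Load on first call; later calls return the cached model.
    pub fn get(&self) -> &Model {
        self.cell.get_or_init(|| {
            // Count loads so callers can verify the work happens only once.
            self.loads.fetch_add(1, Ordering::SeqCst);
            Model {
                path: self.path.clone(),
            }
        })
    }

    pub fn is_loaded(&self) -> bool {
        self.cell.get().is_some()
    }

    pub fn load_count(&self) -> usize {
        self.loads.load(Ordering::SeqCst)
    }
}
```

Because `OnceLock` is thread-safe, several tasks can call `get` concurrently and only one initialization runs. Unloading for an auto-sleep mode would need a different container, since a `OnceLock` cannot be reset through a shared reference.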

📜 Roadmap

  • mobile companion app
  • face recognition & tagging
  • full browser automation
  • plugin marketplace
  • smart scheduler
  • multimodal memory graphs
  • multi-agent architecture