🚀 VortexAI - OS Extender

A Fully Offline, Local-AI Powered OS Extension & Personal Assistant

Jarvis-Style. Multimodal. Multi-Language. Low-Resource. Zero APIs.

🧠 Overview

VortexAI is a local AI layer that sits on top of your operating system and acts as a:

Personal assistant
Automation engine
Voice-controlled AI agent
File/indexing system
Photo organizer
Document processor
Email/message manager
Generative tools provider

It runs 100% offline, ships with all AI models inside the .exe, and supports English, Hindi, and French voice interaction, including mixed-language (code-switched) input such as Hinglish and Frenglish.

You own your data. No cloud. No APIs. No external installs.

🌐 Core Capabilities

VortexAI comes with:

✔ Local LLM (Llama 3 / Mistral / Phi / Gemma – GGUF)
✔ Local Vision (LLaVA / SigLIP)
✔ Local Embeddings (BGE / LaBSE / CLIP)
✔ Local Image Generation (Stable Diffusion Turbo)
✔ Local Speech Recognition (Whisper)
✔ Local Speech Synthesis (VITS / Piper)
✔ Local Vector Search Database (FAISS)
✔ Automation Engine (OS-level control)
✔ Web Scraper (Rust-based, safe-mode)
✔ Fully Configurable Settings UI (Tauri)

🗂 Tech Stack

Layer	Technology
Core Runtime	Rust
UI	Tauri + React/Svelte
Local LLM Engine	llama.cpp (statically linked)
STT	Whisper.cpp
TTS	Piper / VITS Local
Image Generation	stable-diffusion.cpp (sd.cpp)
Vision / OCR	LLaVA.cpp / Tesseract
Vector Database	FAISS (local)
Metadata DB	SQLite
Filesystem Indexer	Rust async walkers
Task Automation	Windows APIs via winapi or Linux syscalls

🏗 System Architecture

AstraOS/
│
├── Core Runtime (Rust)
│   ├── Event Loop
│   ├── Intent Parser
│   ├── Skill Engine
│   ├── Memory Engine
│   └── Scheduler
│
├── AI Layer
│   ├── LLM (llama.cpp)
│   ├── Vision (llava.cpp)
│   ├── Diffusion (sd.cpp)
│   ├── Embeddings (bge / clip / labse)
│   ├── STT (whisper.cpp)
│   └── TTS (piper)
│
├── Storage Layer
│   ├── SQLite (metadata)
│   ├── FAISS (vector index)
│   ├── Cache (json)
│   └── File Registry
│
├── Modules
│   ├── Photo Organizer
│   ├── File Search
│   ├── Email Manager
│   ├── Docs Parser
│   ├── Automation Tools
│   ├── Browser Agent
│   └── Settings & Profiles
│
└── UI Layer (Tauri)
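
To make the Core Runtime flow concrete, here is a minimal sketch of how an Intent Parser could hand work to a Skill Engine. Everything in it is hypothetical and for illustration only: the `Intent` struct, the `Skill` trait, `EchoSkill`, and the first-word parsing rule are assumptions, not the project's actual design (a real parser would presumably use the LLM or the embedding index).

```rust
use std::collections::HashMap;

/// A parsed user request: a skill name plus free-form arguments.
#[derive(Debug, Clone, PartialEq)]
pub struct Intent {
    pub skill: String,
    pub args: String,
}

/// Anything the Skill Engine can run (hypothetical trait).
pub trait Skill {
    fn run(&self, args: &str) -> String;
}

/// Toy skill that echoes its arguments back.
pub struct EchoSkill;

impl Skill for EchoSkill {
    fn run(&self, args: &str) -> String {
        format!("echo: {args}")
    }
}

/// Keyword-based stand-in for the Intent Parser:
/// the first word selects the skill, the rest becomes the arguments.
pub fn parse_intent(input: &str) -> Option<Intent> {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        return None;
    }
    let (skill, args) = match trimmed.split_once(char::is_whitespace) {
        Some((head, rest)) => (head, rest.trim()),
        None => (trimmed, ""),
    };
    Some(Intent {
        skill: skill.to_lowercase(),
        args: args.to_string(),
    })
}

/// Registry mapping lowercase skill names to implementations.
#[derive(Default)]
pub struct SkillEngine {
    skills: HashMap<String, Box<dyn Skill>>,
}

impl SkillEngine {
    pub fn register(&mut self, name: &str, skill: Box<dyn Skill>) {
        self.skills.insert(name.to_lowercase(), skill);
    }

    /// Parse the input and run the matching skill, if one is registered.
    pub fn dispatch(&self, input: &str) -> Option<String> {
        let intent = parse_intent(input)?;
        let skill = self.skills.get(&intent.skill)?;
        Some(skill.run(&intent.args))
    }
}
```

After registering `EchoSkill` under the name `"echo"`, calling `dispatch("Echo hello world")` routes the text `hello world` to that skill, while an unregistered first word yields `None`.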

🔍 Embedding Architecture (Text + Image + Audio)

Text Embeddings

Used for:

Intent recognition
Semantic search
Memory lookups
File search

Model: bge-small-en-v1.5.gguf (60–120MB)
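
As a rough illustration of how text embeddings can drive intent recognition and semantic search, the sketch below ranks candidates by cosine similarity to a query vector. It assumes the embeddings have already been produced by the model; the function names and the `(label, vector)` layout are hypothetical.

```rust
/// Cosine similarity between two vectors.
/// Returns 0.0 if the lengths differ or either vector is all zeros.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Return the label of the candidate whose embedding is closest to the query,
/// or None if there are no candidates.
pub fn best_match<'a>(query: &[f32], candidates: &'a [(String, Vec<f32>)]) -> Option<&'a str> {
    candidates
        .iter()
        .map(|(label, emb)| (label.as_str(), cosine_similarity(query, emb)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(label, _)| label)
}
```

The same comparison works for intent recognition (candidates are skill descriptions) and for file search (candidates are document chunks); only the source of the vectors changes.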

Image Embeddings

Used for:

Photo clustering
Similar photo search
OCR + relevance ranking
Deduplication

Model: clip-ViT-B-32.gguf
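
One way deduplication could work on top of image embeddings is a greedy pass that groups photos whose vectors are nearly identical. This is a sketch under assumptions, not the project's algorithm: the `threshold` parameter, the function names, and the choice of comparing against each group's first member are all illustrative.

```rust
/// Dot product; equals cosine similarity when both inputs are L2-normalized.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Scale a vector to unit length (left unchanged if it is all zeros).
pub fn normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// Greedy near-duplicate grouping over normalized embeddings.
/// Each photo joins the first group whose representative (its first member)
/// is at least `threshold` similar; otherwise it starts a new group.
/// Returns groups of indices into `embeddings`.
pub fn group_duplicates(embeddings: &[Vec<f32>], threshold: f32) -> Vec<Vec<usize>> {
    let mut groups: Vec<Vec<usize>> = Vec::new();
    for (i, emb) in embeddings.iter().enumerate() {
        let hit = groups
            .iter_mut()
            .find(|g| dot(&embeddings[g[0]], emb) >= threshold);
        match hit {
            Some(group) => group.push(i),
            None => groups.push(vec![i]),
        }
    }
    groups
}
```

Any group with more than one index is a set of likely duplicates. The pass is quadratic in the worst case, so a large library would more plausibly query the vector index for neighbours instead of scanning every group.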

Audio Embeddings

Used for:

Speaker identity
Voice command segmentation
Voice memory

Model: Whisper encoder embeddings

🗃 Database Architecture

1. SQLite (Metadata Layer)

Tables included:

/db/app.db
├── user_settings
├── voice_profiles
├── automation_rules
├── task_history
├── scrape_cache
├── email_index
├── file_registry
└── photo_metadata

Each table uses a normalized schema.

2. FAISS Vector Index (Semantic Layer)

Folder: /vector/

Contains:

Index	Purpose	Embedding Type
memory.index	Long-term AI memory	text
files.index	Document search	text
photos.index	Image similarity	image
speech.index	Speaker embeddings	audio
skills.index	Intent → skill mapping	text

All indexes load at boot in streaming mode.
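
For readers unfamiliar with vector indexes, the sketch below shows the core operation in plain Rust: store vectors, then return the top `k` by inner product. It is a brute-force stand-in written with the standard library only and does not use the FAISS API; `FlatIndex`, `open_indexes`, and the per-index dimensions passed by the caller are hypothetical.

```rust
use std::collections::HashMap;

/// Brute-force inner-product index; a std-only stand-in for a flat vector index.
pub struct FlatIndex {
    dim: usize,
    ids: Vec<u64>,
    vectors: Vec<f32>, // row-major, `dim` floats per entry
}

impl FlatIndex {
    pub fn new(dim: usize) -> Self {
        FlatIndex { dim, ids: Vec::new(), vectors: Vec::new() }
    }

    /// Add one vector; returns false (and stores nothing) on a dimension mismatch.
    pub fn add(&mut self, id: u64, vector: &[f32]) -> bool {
        if vector.len() != self.dim {
            return false;
        }
        self.ids.push(id);
        self.vectors.extend_from_slice(vector);
        true
    }

    /// Return up to `k` (id, score) pairs, highest inner product first.
    pub fn search(&self, query: &[f32], k: usize) -> Vec<(u64, f32)> {
        if query.len() != self.dim || self.dim == 0 {
            return Vec::new();
        }
        let mut scored: Vec<(u64, f32)> = self
            .vectors
            .chunks_exact(self.dim)
            .zip(&self.ids)
            .map(|(row, id)| (*id, row.iter().zip(query).map(|(a, b)| a * b).sum::<f32>()))
            .collect();
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
    }
}

/// Create one empty index per (file name, dimension) pair,
/// e.g. the index names listed in the table above.
pub fn open_indexes(specs: &[(&'static str, usize)]) -> HashMap<&'static str, FlatIndex> {
    specs
        .iter()
        .map(|&(name, dim)| (name, FlatIndex::new(dim)))
        .collect()
}
```

Keeping one index per embedding type matters because text, image, and audio vectors generally have different dimensions and are not comparable with each other.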

3. Cache Layer

Folder: /cache/

preprocessed OCR
STT partial segments
temp embeddings
web-scraped DOM snapshots
active conversation state

🗣 Voice Engine

✔ Multilingual Mixed Input

English + Hindi + French

All three can be mixed in the same utterance, with no manual language switching.

STT Pipeline

VAD (Voice Activity Detection)
Whisper.cpp (medium or small)
Language auto-detect
Code-switch detection
Sentence reconstruction
Punctuation
Intent classification
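
The code-switch detection step can be approximated on the transcript itself. The sketch below is a deliberately simple heuristic, not the project's method: it splits a transcript into runs of words that share a writing system, using Unicode ranges for Devanagari and Latin. It cannot tell English from French, and it will miss romanized Hindi (typical Hinglish), so a real pipeline would need a language-ID model on top. The `Script` enum and function names are hypothetical.

```rust
/// Writing system of a token, used as a rough proxy for its language.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Script {
    Latin,
    Devanagari,
    Other,
}

/// Classify a word by the script of its first alphabetic character.
pub fn script_of(word: &str) -> Script {
    match word.chars().find(|c| c.is_alphabetic()) {
        // Devanagari block: U+0900 to U+097F.
        Some(c) if ('\u{0900}'..='\u{097F}').contains(&c) => Script::Devanagari,
        // ASCII letters plus accented Latin letters (covers French diacritics).
        Some(c) if c.is_ascii_alphabetic() || ('\u{00C0}'..='\u{024F}').contains(&c) => {
            Script::Latin
        }
        _ => Script::Other,
    }
}

/// Split a transcript into runs of consecutive words sharing one script.
/// More than one run means the utterance switches scripts.
pub fn script_runs(transcript: &str) -> Vec<(Script, String)> {
    let mut runs: Vec<(Script, String)> = Vec::new();
    for word in transcript.split_whitespace() {
        let script = script_of(word);
        if let Some((last, text)) = runs.last_mut() {
            if *last == script {
                // Same script as the previous word: extend the current run.
                text.push(' ');
                text.push_str(word);
                continue;
            }
        }
        runs.push((script, word.to_string()));
    }
    runs
}
```

Each run can then be routed to language-specific post-processing (sentence reconstruction, punctuation) before intent classification.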

TTS Pipeline

Model: Piper FastVITS Multilingual

Voices included:

English (US/Neutral)
Hindi (Delhi/Neutral)
French (Paris/Neutral)

Speed: real-time or faster

🖼 Photo Organizer Module

Features:

auto-scan entire system
EXIF extraction
people clustering
location-based grouping
duplicate removal
object tags via vision model
timeline view
semantic search: "Show photos where I'm wearing a red hoodie with friends at night"

Uses:

CLIP embeddings
FAISS photos.index
SQLite photo metadata
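
The timeline view boils down to bucketing photos by capture date. Here is a minimal sketch that groups by UTC day, assuming each photo already has a Unix timestamp extracted from EXIF; the `Photo` struct and its field names are illustrative and not the real `photo_metadata` schema.

```rust
use std::collections::BTreeMap;

/// Minimal photo record (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Photo {
    pub path: String,
    pub taken_at: i64, // Unix timestamp in seconds (UTC), e.g. from EXIF
}

const SECONDS_PER_DAY: i64 = 86_400;

/// Group photo paths by UTC day number (days since 1970-01-01).
/// Days come out oldest first; photos within a day are ordered by capture time.
pub fn timeline(photos: &[Photo]) -> BTreeMap<i64, Vec<String>> {
    let mut sorted: Vec<&Photo> = photos.iter().collect();
    sorted.sort_by_key(|p| p.taken_at);

    let mut days: BTreeMap<i64, Vec<String>> = BTreeMap::new();
    for p in sorted {
        days.entry(p.taken_at.div_euclid(SECONDS_PER_DAY))
            .or_default()
            .push(p.path.clone());
    }
    days
}
```

A `BTreeMap` keeps the days sorted, so the UI can iterate it directly. Grouping by local time zone instead of UTC would need the offset stored alongside each timestamp.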

📂 File & Document Manager

Supports:

PDF
DOCX
TXT
PPTX
Markdown
Images
Audio transcription
Code files

Extracts:

text content
embeddings
key metadata
summaries
timeline clusters
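
Before any extraction can happen, the indexer has to decide which parser a file belongs to. A minimal sketch of that dispatch by file extension follows. The `DocKind` enum is hypothetical, and the specific image, audio, and code extensions are assumptions chosen for illustration; the README only names the categories.

```rust
use std::path::Path;

/// Parser category for a file (hypothetical enum mirroring the list above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DocKind {
    Pdf,
    Docx,
    Text,
    Pptx,
    Markdown,
    Image,
    Audio,
    Code,
    Unsupported,
}

/// Pick a parser category from the file extension (case-insensitive).
pub fn classify(path: &Path) -> DocKind {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("pdf") => DocKind::Pdf,
        Some("docx") => DocKind::Docx,
        Some("txt") => DocKind::Text,
        Some("pptx") => DocKind::Pptx,
        Some("md") | Some("markdown") => DocKind::Markdown,
        // The extension lists below are illustrative, not exhaustive.
        Some("jpg") | Some("jpeg") | Some("png") => DocKind::Image,
        Some("wav") | Some("mp3") | Some("flac") => DocKind::Audio,
        Some("rs") | Some("py") | Some("js") | Some("ts") => DocKind::Code,
        _ => DocKind::Unsupported,
    }
}
```

Extension matching is cheap but easy to fool; checking file signatures (magic bytes) would be a more robust follow-up.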

🤖 Automation Engine

You can say things like:

"Turn off my PC at 11."
"When a new email arrives from professor, notify me."
"Download all PDFs from this site."
"Sort all of my desktop files."

Backend uses:

OS APIs
Node bindings inside Tauri
Rust automation drivers
A plugin-based skill system
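
Spoken commands like the ones above ultimately have to become stored rules that pair a trigger with an action. The sketch below shows one possible shape for that data and for matching incoming events against it. All type and field names are hypothetical, and translating the sentence into a rule (including whether "at 11" means morning or night) is left to the intent layer.

```rust
/// Condition that makes a rule fire.
#[derive(Debug, Clone, PartialEq)]
pub enum Trigger {
    /// Fires at a given local time, in minutes after midnight.
    TimeOfDay { minutes: u16 },
    /// Fires when an email arrives whose sender contains the given text.
    EmailFrom { sender_contains: String },
}

/// What to do when a rule fires.
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    Shutdown,
    Notify(String),
}

/// Something that happened on the system.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Tick { minutes: u16 },
    EmailReceived { sender: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Rule {
    pub trigger: Trigger,
    pub action: Action,
}

/// True if the event satisfies the trigger.
fn trigger_matches(trigger: &Trigger, event: &Event) -> bool {
    match (trigger, event) {
        (Trigger::TimeOfDay { minutes: t }, Event::Tick { minutes: now }) => t == now,
        (Trigger::EmailFrom { sender_contains }, Event::EmailReceived { sender }) => {
            // Case-insensitive substring match on the sender field.
            sender.to_lowercase().contains(&sender_contains.to_lowercase())
        }
        _ => false,
    }
}

/// Collect the actions of every rule whose trigger matches the event.
pub fn fired_actions(rules: &[Rule], event: &Event) -> Vec<Action> {
    rules
        .iter()
        .filter(|r| trigger_matches(&r.trigger, event))
        .map(|r| r.action.clone())
        .collect()
}
```

Under this model, "Turn off my PC at 11" (read as 11 pm) would be stored as `TimeOfDay { minutes: 1380 }` paired with `Action::Shutdown`, and rows of this kind are what a table such as `automation_rules` would persist.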

🎨 Generative Tools

1. Local Image Generation

Model:

sd-turbo.gguf (fast)
sdxl-lightning.gguf (optional)

2. Local PPT/Document Generation

Templates stored in /templates/.

3. Local Code Templates

For React, Python, JS, etc.

⚙️ Settings & Profiles

Exposed Options:

choose LLM model
choose voice model
GPU/CPU toggle
resource/priority mode
background permissions
task scheduling
privacy controls
memory wipe
vector reindex

📦 Packaging Into One EXE

Bundler: Tauri → NSIS → final .exe

Included in build:

Rust runtime
Tauri frontend
AI engines (llama.cpp, whisper.cpp, sd.cpp)
All GGUF models
SQLite DB
FAISS indexes
Voice models
Resource folder

Size of the single .exe: 1.8 GB to 3.5 GB, depending on model choices.

📁 Final Folder Structure

AstraOS/
│
├── app.exe
├── README.md
├── models/
│   ├── llm/
│   │   └── llama-3-8b.gguf
│   ├── vision/
│   │   └── llava-1.6.gguf
│   ├── stt/
│   │   └── whisper-medium.gguf
│   ├── tts/
│   │   └── piper-multilingual.onnx
│   ├── embeddings/
│   │   ├── bge-small.gguf
│   │   └── clip-ViT-B-32.gguf
│   └── sd/
│       └── sd-turbo.gguf
│
├── db/
│   └── app.db
├── vector/
│   ├── memory.index
│   ├── files.index
│   ├── photos.index
│   ├── speech.index
│   └── skills.index
│
├── cache/
├── logs/
├── plugins/
└── templates/

🔐 Privacy & Security

No internet calls (unless user enables web scraping)
All data stored locally
User-controlled memory wipe
Password-protected profile
Hardware-bound encryption option

🏎 Performance Optimizations

lazy model loading
tensor caching
quantized GGUF
streaming inference
async Rust runtime
CPU/GPU configurable load
auto-sleep mode
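
Lazy model loading, the first item above, can be sketched with `std::sync::OnceLock` (Rust 1.70 or later): the weights are read on first use and shared afterwards. This is an illustration only. `LazyModel` and `load_weights` are hypothetical, and `load_weights` fakes the load by copying the path bytes, where a real build would read or memory-map the GGUF file.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Stand-in for reading a model file; returns the path bytes instead of real weights.
fn load_weights(path: &str) -> Vec<u8> {
    path.as_bytes().to_vec()
}

/// A model whose weights are loaded on first access and cached afterwards.
pub struct LazyModel {
    path: String,
    weights: OnceLock<Vec<u8>>,
    loads: AtomicUsize, // counts how many times the loader actually ran
}

impl LazyModel {
    pub fn new(path: &str) -> Self {
        LazyModel {
            path: path.to_string(),
            weights: OnceLock::new(),
            loads: AtomicUsize::new(0),
        }
    }

    /// Return the weights, loading them if this is the first call.
    pub fn get(&self) -> &[u8] {
        self.weights.get_or_init(|| {
            self.loads.fetch_add(1, Ordering::Relaxed);
            load_weights(&self.path)
        })
    }

    pub fn is_loaded(&self) -> bool {
        self.weights.get().is_some()
    }

    pub fn load_count(&self) -> usize {
        self.loads.load(Ordering::Relaxed)
    }
}
```

Because `get` takes `&self`, the same `LazyModel` can be shared across threads, and a model that is never requested (for example image generation in a voice-only session) never occupies memory.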

📜 Roadmap

mobile companion app
face recognition & tagging
full browser automation
plugin marketplace
smart scheduler
multimodal memory graphs
multi-agent architecture
