A Fully Offline, Local-AI Powered OS Extension & Personal Assistant
Jarvis-Style. Multimodal. Multi-Language. Low-Resource. Zero APIs.
AstraOS is a local AI layer that sits on top of your operating system and acts as a:
- Personal assistant
- Automation engine
- Voice-controlled AI agent
- File/indexing system
- Photo organizer
- Document processor
- Email/message manager
- Generative tools provider
It runs 100% offline, ships with all AI models inside the .exe, and supports English, Hindi, and French voice interaction, including mixed-language input (Hinglish, Frenglish, code-switching).
You own your data. No cloud. No APIs. No external installs.
AstraOS comes with:
- ✔ Local LLM (Llama 3 / Mistral / Phi / Gemma – GGUF)
- ✔ Local Vision (LLaVA / SigLIP)
- ✔ Local Embeddings (BGE / LaBSE / CLIP)
- ✔ Local Image Generation (Stable Diffusion Turbo)
- ✔ Local Speech Recognition (Whisper)
- ✔ Local Speech Synthesis (VITS / Piper)
- ✔ Local Vector Search Database (FAISS)
- ✔ Automation Engine (OS-level control)
- ✔ Web Scraper (Rust-based, safe-mode)
- ✔ Fully Configurable Settings UI (Tauri)
| Layer | Technology |
|---|---|
| Core Runtime | Rust |
| UI | Tauri + React/Svelte |
| Local LLM Engine | llama.cpp (statically linked) |
| STT | Whisper.cpp |
| TTS | Piper / VITS Local |
| Image Generation | diffusion.cpp |
| Vision / OCR | LLaVA.cpp / Tesseract |
| Vector Database | FAISS (local) |
| Metadata DB | SQLite |
| Filesystem Indexer | Rust async walkers |
| Task Automation | Windows APIs via winapi or Linux syscalls |
AstraOS/
│
├── Core Runtime (Rust)
│ ├── Event Loop
│ ├── Intent Parser
│ ├── Skill Engine
│ ├── Memory Engine
│ └── Scheduler
│
├── AI Layer
│ ├── LLM (llama.cpp)
│ ├── Vision (llava.cpp)
│ ├── Diffusion (sd.cpp)
│ ├── Embeddings (bge / clip / labse)
│ ├── STT (whisper.cpp)
│ └── TTS (piper)
│
├── Storage Layer
│ ├── SQLite (metadata)
│ ├── FAISS (vector index)
│ ├── Cache (json)
│ └── File Registry
│
├── Modules
│ ├── Photo Organizer
│ ├── File Search
│ ├── Email Manager
│ ├── Docs Parser
│ ├── Automation Tools
│ ├── Browser Agent
│ └── Settings & Profiles
│
└── UI Layer (Tauri)
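The Intent Parser → Skill Engine path in the Core Runtime is essentially a registry that maps intent names to handlers. A minimal sketch in Rust — the trait and type names here are illustrative assumptions, not the actual API:

```rust
use std::collections::HashMap;

// A skill is a named handler the intent parser can dispatch to.
trait Skill {
    fn execute(&self, args: &str) -> String;
}

// Example skill (hypothetical): schedules a shutdown.
struct ShutdownSkill;
impl Skill for ShutdownSkill {
    fn execute(&self, args: &str) -> String {
        format!("scheduling shutdown at {}", args)
    }
}

struct SkillEngine {
    skills: HashMap<String, Box<dyn Skill>>,
}

impl SkillEngine {
    fn new() -> Self {
        SkillEngine { skills: HashMap::new() }
    }
    fn register(&mut self, intent: &str, skill: Box<dyn Skill>) {
        self.skills.insert(intent.to_string(), skill);
    }
    // Returns None when no skill is registered for the intent.
    fn dispatch(&self, intent: &str, args: &str) -> Option<String> {
        self.skills.get(intent).map(|s| s.execute(args))
    }
}

fn main() {
    let mut engine = SkillEngine::new();
    engine.register("shutdown", Box::new(ShutdownSkill));
    println!("{:?}", engine.dispatch("shutdown", "23:00"));
}
```

A plugin-based skill system (see Modules) would populate this registry at startup.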
Text embeddings — used for:
- Intent recognition
- Semantic search
- Memory lookups
- File search
Model: bge-small-en-v1.5.gguf (60–120MB)
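Intent recognition and semantic search both reduce to ranking candidates by embedding similarity. A minimal cosine-similarity helper — a sketch only; real queries run against the FAISS index:

```rust
// Cosine similarity between two embedding vectors.
// Returns 0.0 for zero-length (all-zero) vectors instead of NaN.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    let query = [0.1, 0.9, 0.2];
    let doc = [0.1, 0.8, 0.3];
    println!("similarity = {:.3}", cosine(&query, &doc));
}
```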
Image embeddings — used for:
- Photo clustering
- Similar photo search
- OCR + relevance ranking
- Deduplication
Model: clip-ViT-B-32.gguf
Audio / speaker embeddings — used for:
- Speaker identity
- Voice command segmentation
- Voice memory
Model: Whisper encoder embeddings
Tables included:
/db/app.db
├── user_settings
├── voice_profiles
├── automation_rules
├── task_history
├── scrape_cache
├── email_index
├── file_registry
└── photo_metadata
Each table uses a normalized schema.
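To illustrate what a normalized schema means here, a hypothetical slice of the layout — only the table names come from the list above; every column is an assumption:

```rust
// Illustrative SQLite DDL embedded as a Rust constant. photo_metadata
// references file_registry by foreign key instead of duplicating paths,
// which is what keeps the schema normalized.
const SCHEMA: &str = "
CREATE TABLE file_registry (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE,
    size_bytes INTEGER,
    modified_at INTEGER
);
CREATE TABLE photo_metadata (
    file_id INTEGER NOT NULL REFERENCES file_registry(id),
    taken_at INTEGER,
    latitude REAL,
    longitude REAL,
    camera_model TEXT
);
";

fn main() {
    println!("{}", SCHEMA);
}
```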
Folder: /vector/
Contains:
| Index | Purpose | Embedding Type |
|---|---|---|
| memory.index | Long-term AI memory | text |
| files.index | Document search | text |
| photos.index | Image similarity | image |
| speech.index | Speaker embeddings | audio |
| skills.index | Intent → skill mapping | text |
All indexes load at boot in streaming mode.
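Conceptually, each index answers top-k similarity queries. FAISS does this with optimized index structures; a brute-force equivalent over an in-memory index looks like this (assuming unit-normalized vectors, so the dot product equals cosine similarity):

```rust
// Brute-force top-k retrieval: score every vector against the query,
// sort descending, keep the k best (id, score) pairs. FAISS replaces
// this linear scan with approximate-nearest-neighbor structures.
fn top_k(index: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = index
        .iter()
        .enumerate()
        .map(|(i, v)| (i, v.iter().zip(query).map(|(a, b)| a * b).sum()))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}

fn main() {
    let index = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.7, 0.7]];
    for (id, score) in top_k(&index, &[1.0, 0.0], 2) {
        println!("id={} score={:.2}", id, score);
    }
}
```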
Folder: /cache/
- preprocessed OCR
- STT partial segments
- temp embeddings
- web-scraped DOM snapshots
- active conversation state
English + Hindi + French — handled simultaneously, with no manual language switching.
STT pipeline:
- VAD (Voice Activity Detection)
- Whisper.cpp (medium or small)
- Language auto-detect
- Code-switch detection
- Sentence reconstruction
- Punctuation
- Intent classification
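The first stage, VAD, can be approximated by a per-frame energy threshold. This is an illustration only — production VADs are model-based:

```rust
// Minimal energy-threshold VAD (illustrative): a frame counts as speech
// when its RMS energy exceeds a fixed threshold.
fn is_speech(frame: &[f32], threshold: f32) -> bool {
    if frame.is_empty() {
        return false;
    }
    let rms = (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt();
    rms > threshold
}

fn main() {
    let loud = [0.5f32; 160]; // one 10 ms frame at 16 kHz
    let quiet = [0.01f32; 160];
    println!("loud: {}, quiet: {}", is_speech(&loud, 0.1), is_speech(&quiet, 0.1));
}
```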
Model: Piper FastVITS Multilingual
Voices included:
- English (US/Neutral)
- Hindi (Delhi/Neutral)
- French (Paris/Neutral)
Speed: real-time or faster.
Features:
- auto-scan entire system
- EXIF extraction
- people clustering
- location-based grouping
- duplicate removal
- object tags via vision model
- timeline view
- semantic search: "Show photos where I'm wearing a red hoodie with friends at night"
Uses:
- CLIP embeddings
- FAISS photos.index
- SQLite photo metadata
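Duplicate removal can be sketched with a perceptual average-hash: threshold each pixel of an 8×8 grayscale thumbnail against its mean, then compare hashes by Hamming distance. Illustrative only — the actual pipeline uses the CLIP embeddings listed above:

```rust
// Average-hash: one bit per pixel of an 8x8 grayscale thumbnail,
// set when the pixel is brighter than the thumbnail's mean.
fn average_hash(pixels: &[u8; 64]) -> u64 {
    let mean = pixels.iter().map(|&p| p as u32).sum::<u32>() / 64;
    pixels.iter().enumerate().fold(0u64, |h, (i, &p)| {
        if p as u32 > mean { h | (1u64 << i) } else { h }
    })
}

// Near-duplicates have a small Hamming distance between hashes.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

fn main() {
    let mut a = [0u8; 64];
    for i in 32..64 {
        a[i] = 255; // half dark, half bright
    }
    let mut b = a;
    b[0] = 255; // flip a single pixel
    println!("hamming distance: {}", hamming(average_hash(&a), average_hash(&b)));
}
```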
Supports:
- DOCX
- TXT
- PPTX
- Markdown
- Images
- Audio transcription
- Code files
Extracts:
- text content
- embeddings
- key metadata
- summaries
- timeline clusters
You can say things like:
- "Turn off my PC at 11."
- "When a new email arrives from my professor, notify me."
- "Download all PDFs from this site."
- "Sort all of my desktop files."
Backend uses:
- OS APIs
- Node bindings inside Tauri
- Rust automation drivers
- A plugin-based skill system
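A rule like "when a new email arrives from my professor, notify me" can be modeled as a trigger/filter/action record that the skill system evaluates against incoming events. A sketch — field names are assumptions:

```rust
// Hypothetical automation rule: an event type to listen for, a
// substring filter on the event payload, and a skill to invoke.
struct Rule {
    trigger_event: String,
    filter: String,
    action: String,
}

// Returns the actions of every rule matching this event.
fn match_rules<'a>(rules: &'a [Rule], event: &str, payload: &str) -> Vec<&'a str> {
    rules
        .iter()
        .filter(|r| r.trigger_event == event && payload.contains(&r.filter))
        .map(|r| r.action.as_str())
        .collect()
}

fn main() {
    let rules = vec![Rule {
        trigger_event: "email_received".into(),
        filter: "professor".into(),
        action: "notify".into(),
    }];
    println!("{:?}", match_rules(&rules, "email_received", "from: professor@uni.edu"));
}
```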
Models:
- sd-turbo.gguf (fast)
- sdxl-lightning.gguf (optional)
Templates for React, Python, JS, etc. are stored in /templates/.
Exposed Options:
- choose LLM model
- choose voice model
- GPU/CPU toggle
- resource/priority mode
- background permissions
- task scheduling
- privacy controls
- memory wipe
- vector reindex
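These options map naturally onto a settings struct with conservative defaults. An illustrative sketch — field names and defaults are assumptions; only the model filenames match those shipped in models/:

```rust
// Hypothetical settings struct backing the Settings UI.
#[derive(Debug, Clone)]
struct Settings {
    llm_model: String,
    voice_model: String,
    use_gpu: bool,
    low_resource_mode: bool,
    background_tasks_allowed: bool,
}

impl Default for Settings {
    // Conservative defaults: CPU-only, low-resource, no background work.
    fn default() -> Self {
        Settings {
            llm_model: "llama-3-8b.gguf".into(),
            voice_model: "piper-multilingual.onnx".into(),
            use_gpu: false,
            low_resource_mode: true,
            background_tasks_allowed: false,
        }
    }
}

fn main() {
    println!("{:?}", Settings::default());
}
```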
Bundler: Tauri → NSIS → final .exe
Included in build:
- Rust runtime
- Tauri frontend
- AI engines (llama.cpp, whisper.cpp, sd.cpp)
- All GGUF models
- SQLite DB
- FAISS indexes
- Voice models
- Resource folder
Single .exe output size: 1.8–3.5 GB, depending on model choices.
AstraOS/
│
├── app.exe
├── README.md
├── models/
│ ├── llm/
│ │ └── llama-3-8b.gguf
│ ├── vision/
│ │ └── llava-1.6.gguf
│ ├── stt/
│ │ └── whisper-medium.gguf
│ ├── tts/
│ │ └── piper-multilingual.onnx
│ ├── embeddings/
│ │ ├── bge-small.gguf
│ │ └── clip-ViT-B-32.gguf
│ └── sd/
│ └── sd-turbo.gguf
│
├── db/
│ └── app.db
├── vector/
│ ├── memory.index
│ ├── files.index
│ ├── photos.index
│ └── speech.index
│
├── cache/
├── logs/
├── plugins/
└── templates/
Privacy & Security:
- No internet calls (unless user enables web scraping)
- All data stored locally
- User-controlled memory wipe
- Password-protected profile
- Hardware-bound encryption option
Performance:
- lazy model loading
- tensor caching
- quantized GGUF
- streaming inference
- async Rust runtime
- CPU/GPU configurable load
- auto-sleep mode
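Lazy model loading can be implemented with `std::sync::OnceLock`: nothing is read from disk until a model is first requested, and every later call reuses the cached instance. A sketch — the loader body is a stand-in for the real llama.cpp binding:

```rust
use std::sync::OnceLock;

struct Model {
    name: String,
}

// Stand-in loader: the real runtime would mmap a GGUF file here.
fn load_model(name: &str) -> Model {
    Model { name: name.to_string() }
}

static LLM: OnceLock<Model> = OnceLock::new();

// First call loads the model; subsequent calls return the cached one.
fn llm() -> &'static Model {
    LLM.get_or_init(|| load_model("llama-3-8b.gguf"))
}

fn main() {
    println!("loaded: {}", llm().name);
}
```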
Planned Features:
- mobile companion app
- face recognition & tagging
- full browser automation
- plugin marketplace
- smart scheduler
- multimodal memory graphs
- multi-agent architecture