
🧠 Ollama Client — Chat with Local LLMs in Your Browser

Ollama Client is a powerful, privacy-first Chrome extension that lets you chat with locally hosted LLMs using Ollama — no cloud, no tracking. It’s lightweight, open source, and designed for fast, offline-friendly AI conversations.


Supported browsers: Chrome, Chromium, Brave, Edge, Opera, Firefox

🚀 Get Started — Install Now


❤️ Upvote Us on Product Hunt!

🌐 Explore More


✨ Features

🤖 Model Management

  • 🔌 Local Ollama Integration – Connect to a local Ollama server (no API keys required)
  • 🌐 LAN/Local Network Support – Connect to Ollama servers on your local network using IP addresses (e.g., http://192.168.x.x:11434)
  • 🔄 Model Switcher – Switch between models in real time with a beautiful UI
  • 🔍 Model Search & Pull – Search and pull models directly from Ollama.com in the UI (with progress indicator)
  • 🗑️ Model Deletion – Clean up unused models with confirmation dialogs
  • 🧳 Load/Unload Models – Manage Ollama memory footprint efficiently
  • 📦 Model Version Display – View and compare model versions easily
  • 🎛️ Advanced Parameter Tuning – Per-model configuration: temperature, top_k, top_p, repeat penalty, stop sequences, system prompts
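
These model features map onto Ollama's standard REST endpoints. As a rough illustration only (not the extension's actual code), listing installed models is a GET to /api/tags, and a pull is a POST to /api/pull:

// Sketch in TypeScript: list local models and pull a new one.
// /api/tags and /api/pull are Ollama's documented endpoints;
// the helper names here are illustrative.
const OLLAMA_URL = "http://localhost:11434"

async function listModels(): Promise<string[]> {
  const res = await fetch(`${OLLAMA_URL}/api/tags`)
  const { models } = await res.json()
  return models.map((m: { name: string }) => m.name)
}

async function pullModel(name: string): Promise<void> {
  // With stream: false, Ollama responds once when the pull completes;
  // the extension's progress indicator implies the streaming variant instead.
  await fetch(`${OLLAMA_URL}/api/pull`, {
    method: "POST",
    body: JSON.stringify({ model: name, stream: false })
  })
}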

💬 Chat & Conversations

  • 💬 Beautiful Chat UI – Modern, polished interface built with Shadcn UI
  • 🗂️ Multi-Chat Sessions – Create, manage, and switch between multiple chat sessions
  • 📤 Export Chat Sessions – Export single or all chat sessions as PDF or JSON
  • 📥 Import Chat Sessions – Import single or multiple chat sessions from JSON files
  • 📋 Copy & Regenerate – Quickly rerun or copy AI responses
  • ⚡ Streaming Responses – Real-time streaming with typing indicators
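
Streaming and the stop button correspond to reading Ollama's newline-delimited JSON stream and aborting the fetch. A minimal sketch, assuming the documented /api/chat endpoint (this is not the extension's internal code):

// Sketch: stream tokens from /api/chat; abort the controller to implement "Stop".
const controller = new AbortController()

async function streamChat(
  model: string,
  prompt: string,
  onToken: (t: string) => void
): Promise<void> {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    signal: controller.signal,
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
      stream: true
    })
  })
  const reader = res.body!.getReader()
  const decoder = new TextDecoder()
  let buf = ""
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    buf += decoder.decode(value, { stream: true })
    const lines = buf.split("\n")
    buf = lines.pop() ?? "" // keep any partial JSON line for the next read
    for (const line of lines) {
      if (!line.trim()) continue
      const chunk = JSON.parse(line)
      if (chunk.message?.content) onToken(chunk.message.content)
      if (chunk.done) return
    }
  }
}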

🧠 Embeddings & Semantic Search (Beta v0.3.0)

  • 🔍 Semantic Chat Search – Search chat history by meaning, not just keywords
  • 📊 Vector Database – IndexedDB-based vector storage with optimized cosine similarity
  • 🎯 Smart Chunking – 3 strategies: fixed, semantic, hybrid (configurable)
  • 🚀 Optimized Search – Pre-normalized vectors, caching, early termination
  • 🔧 Configurable – Chunk size, overlap, similarity threshold, search limits
  • 📁 Context-Aware – Search across all chats or within current session
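
"Pre-normalized vectors" and "early termination" above reduce to simple vector math: once stored embeddings are unit-length, cosine similarity is a plain dot product, and low-scoring results can be dropped before sorting. A hedged sketch; names and the threshold value are illustrative, not the extension's API:

// Normalize once at storage time so search needs only dot products.
function normalize(v: Float32Array): Float32Array {
  let norm = 0
  for (let i = 0; i < v.length; i++) norm += v[i] * v[i]
  norm = Math.sqrt(norm) || 1
  const out = new Float32Array(v.length)
  for (let i = 0; i < v.length; i++) out[i] = v[i] / norm
  return out
}

// For unit vectors, cosine(a, b) === dot(a, b).
function dot(a: Float32Array, b: Float32Array): number {
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i]
  return sum
}

function topK(query: Float32Array, stored: Float32Array[], k = 5, threshold = 0.4) {
  return stored
    .map((vec, index) => ({ index, score: dot(query, vec) }))
    .filter((r) => r.score >= threshold) // drop low-similarity hits early
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
}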

📎 File Upload & Processing (Beta v0.3.0+)

  • 📄 Text Files – Support for .txt, .md, and other text-based files
  • 📁 PDF Support – Extract and process text from PDF documents
  • 📘 DOCX Support – Extract text from Word documents
  • 📊 CSV Support – Parse CSV, TSV, PSV with custom delimiters and column extraction (Beta v0.5.0)
  • 🌐 HTML Support – Convert HTML to Markdown for clean text extraction, with 50+ language support (Beta v0.5.0)
  • ⚙️ Auto-Embedding – Automatic embedding generation for uploaded files
  • 📊 Progress Tracking – Real-time progress indicators during processing
  • 🎛️ Configurable Limits – User-defined max file size in settings
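
The CSV/TSV/PSV handling is built on d3-dsv (per the roadmap notes later in this README). A minimal sketch of delimiter-aware parsing with column extraction; the helper names are hypothetical:

// Sketch: delimiter-aware parsing with d3-dsv.
import { dsvFormat } from "d3-dsv"

// Parse text with a custom delimiter (comma, tab, pipe, semicolon, ...).
function parseDelimited(text: string, delimiter: string) {
  return dsvFormat(delimiter).parse(text) // array of row objects keyed by header
}

// Extract a single column, e.g. to embed only one field of a CSV.
function extractColumn(text: string, delimiter: string, column: string): string[] {
  return dsvFormat(delimiter).parse(text).map((row) => row[column] ?? "")
}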

🌐 Webpage Integration

  • 🧠 Enhanced Content Extraction – Advanced extraction with multiple scroll strategies (none, instant, gradual, smart)
  • 🔄 Lazy Loading Support – Automatically waits for dynamic content to load
  • 📄 Site-Specific Overrides – Configure extraction settings per domain (scroll strategies, delays, timeouts)
  • 🎯 Defuddle Integration – Smart content extraction with Defuddle as the primary engine
  • 📖 Mozilla Readability – Fallback extraction via Mozilla Readability when Defuddle comes up short
  • 🎬 YouTube Transcripts – Automated YouTube transcript extraction
  • 📊 Extraction Metrics – View scroll steps, mutations detected, and content length
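
One plausible reading of the Defuddle/Readability items above, sketched in TypeScript. The Defuddle call is a hypothetical stand-in; only @mozilla/readability's documented API is used as-is:

import { Readability } from "@mozilla/readability"

// Hypothetical stand-in for the primary Defuddle-based extractor.
const tryDefuddle = (doc: Document): string | null => null

function extractPageText(doc: Document): string {
  const primary = tryDefuddle(doc)
  if (primary) return primary
  // Readability mutates its input, so parse a clone of the document.
  const article = new Readability(doc.cloneNode(true) as Document).parse()
  return article?.textContent ?? doc.body.innerText
}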

⚙️ Customization & Settings

  • 🎨 Professional UI – Modern design system with glassmorphism effects, gradients, and smooth animations
  • 🌓 Dark Mode – Beautiful dark theme with smooth transitions
  • 📝 Prompt Templates – Create, manage, and use custom prompt templates (Ctrl+/)
  • 🔊 Advanced Text-to-Speech – Searchable voice selector with adjustable speech rate & pitch
  • 🌍 Internationalization (i18n) – Full multi-language support with 9 languages: English, Hindi, Spanish, French, German, Italian, Chinese (Simplified), Japanese, Russian
  • 🎚️ Cross-Browser Compatibility – Works with Chrome, Brave, Edge, Opera, Firefox
  • 🧪 Voice Testing – Test voices before using them

🔒 Privacy & Performance

  • 🛡️ 100% Local and Private – All storage and inference happen on your device
  • 🧯 Declarative Net Request (DNR) – Automatic CORS handling
  • 💾 IndexedDB Storage – Efficient local storage for chat sessions
  • ⚡ Performance Optimized – Lazy loading, debounced operations, optimized re-renders
  • 🔄 State Management – Clean Zustand-based state management
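
For context on the DNR item above: Chrome's declarativeNetRequest API lets an extension rewrite request headers declaratively, without a blocking webRequest listener. A sketch of the shape such a rule can take; the values are illustrative, not the extension's actual ruleset:

// Sketch: a declarativeNetRequest rule that presents requests to the
// local Ollama server with an Origin it will accept.
const corsRule = {
  id: 1,
  priority: 1,
  action: {
    type: "modifyHeaders",
    requestHeaders: [
      { header: "Origin", operation: "set", value: "http://localhost:11434" }
    ]
  },
  condition: {
    urlFilter: "http://localhost:11434/*",
    resourceTypes: ["xmlhttprequest"]
  }
}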

🧩 Tech Stack

Frontend: Plasmo (extension framework), React, TypeScript, Shadcn UI, Zustand

Backend & APIs: Ollama REST API, Chrome extension APIs (side panel, Declarative Net Request)

Content Processing: pdf.js, mammoth.js, d3-dsv, Turndown, Defuddle, Mozilla Readability

Developer Tools: pnpm, Plasmo dev server


🛠️ Quick Setup

✅ 1. Install the Extension

👉 Chrome Web Store

✅ 2. Install Ollama on Your Machine

brew install ollama  # macOS
# or visit https://ollama.com for Windows/Linux installers

ollama serve         # starts at http://localhost:11434
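
Once the server is running, you can sanity-check it from any JavaScript runtime with fetch (Node 18+ or a browser console). Expect HTTP 200 and a (possibly empty) models array:

// Quick connectivity check against the default Ollama endpoint.
const res = await fetch("http://localhost:11434/api/tags")
console.log(res.status, (await res.json()).models)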

💡 Quick Setup Script (Cross-platform):

For easier setup with LAN access and Firefox CORS support:

# Cross-platform bash script (macOS/Linux/Windows with Git Bash)
./tools/ollama-env.sh firefox   # Firefox with CORS + LAN access
./tools/ollama-env.sh chrome    # Chrome with LAN access

📄 Script file: tools/ollama-env.sh

This script automatically:

  • Configures Ollama for LAN access (0.0.0.0)
  • Sets up CORS for Firefox extensions (if needed)
  • Shows your local IP address for network access
  • Detects your OS (macOS, Linux, Windows) automatically
  • Stops any running Ollama instances before starting

If you don't have the script file, you can download it directly or see the full setup guide: Ollama Setup Guide

More info: https://ollama.com

✅ 3. Pull a Model

ollama pull gemma3:1b

Other options: mistral, llama3:8b, codellama, etc.

⚙️ 4. Configure the Extension

  • Click the Ollama Client icon

  • Open ⚙️ Settings

  • Set your:

    • Base URL: http://localhost:11434 (default) or your local network IP (e.g., http://192.168.1.100:11434)
    • Default model (e.g. gemma:2b)
    • Theme & appearance
    • Model parameters
    • Prompt templates

💡 Tip: You can use Ollama on a local network server by entering its IP address (e.g., http://192.168.x.x:11434) in the Base URL field. Make sure Ollama is configured with OLLAMA_HOST=0.0.0.0 for LAN access.

Advanced parameters like system prompts and stop sequences are available per model.
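
These settings correspond to Ollama's documented request options. As a hedged example (values are illustrative, not the extension's defaults), a per-model configuration could translate to a payload like this:

// Example payload shape: field names follow Ollama's documented options.
const chatRequest = {
  model: "gemma:2b",
  messages: [
    { role: "system", content: "You are concise." },
    { role: "user", content: "Summarize this page." }
  ],
  options: {
    temperature: 0.7,
    top_k: 40,
    top_p: 0.9,
    repeat_penalty: 1.1,
    stop: ["</answer>"] // stop sequences
  }
}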


🛠️ Local Development Setup

Want to contribute or customize? You can run and modify the Ollama Client extension locally using Plasmo.

⚙️ Prerequisites

  • Node.js (v18 or newer recommended)
  • pnpm (recommended) or npm
  • Ollama installed locally

📦 1. Clone the Repo

git clone https://github.com/Shishir435/ollama-client.git
cd ollama-client

📥 2. Install Dependencies

Using pnpm (recommended):

pnpm install

Or with npm:

npm install

🧪 3. Run the Extension (Dev Mode)

Start development mode with hot reload:

pnpm dev

Or with npm:

npm run dev

This launches the Plasmo dev server and gives instructions for loading the unpacked extension in Chrome:

  • Open chrome://extensions
  • Enable Developer mode
  • Click Load unpacked
  • Select the dist/ folder generated by Plasmo

🛠 4. Build for Production

pnpm build

⛓️ 5. Package for Production

pnpm package

🧪 6. Run, Build, and Package for Firefox (Experimental)

Setup Ollama for Firefox:

Firefox requires manual CORS configuration. Use the helper script:

# Cross-platform bash script (macOS/Linux/Windows with Git Bash)
./tools/ollama-env.sh firefox

This configures OLLAMA_ORIGINS for Firefox extension support.

Build and run:

pnpm dev --target=firefox
pnpm build --target=firefox
pnpm package --target=firefox

Or with npm:

npm run dev -- --target=firefox

Load as a temporary extension.


📁 Code Structure

src/
├── background/        # Background service worker & API handlers
├── sidepanel/         # Main chat UI
├── options/           # Settings page
├── features/          # Feature modules
│   ├── chat/          # Chat components, hooks, semantic search
│   ├── model/         # Model management & settings
│   ├── sessions/      # Chat session management
│   ├── prompt/        # Prompt templates
│   └── tabs/          # Browser tab integration
├── lib/               # Shared utilities
│   └── embeddings/    # Vector embeddings & semantic search
├── components/        # Shared UI components (Shadcn)
└── hooks/             # Shared React hooks

Architecture: Feature-based organization with separation of concerns (components, hooks, stores). Zustand for global state, React hooks for local state.
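
For flavor, here is a minimal hypothetical store in the Zustand style described above (not the extension's actual store):

// Hypothetical global store following the Zustand pattern.
import { create } from "zustand"

interface ModelState {
  selectedModel: string
  setSelectedModel: (name: string) => void
}

export const useModelStore = create<ModelState>((set) => ({
  selectedModel: "gemma:2b",
  setSelectedModel: (name) => set({ selectedModel: name })
}))

// In a component: const model = useModelStore((s) => s.selectedModel)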


✅ Tips

  • Change manifest settings in package.json
  • PRs welcome! Check issues for open tasks

💡 Recommended Models by Device

System Specs                    | Suggested Models
💻 8GB RAM (no GPU)             | gemma:2b, mistral:7b-q4
💻 16GB RAM (no GPU)            | gemma:3b-q4, mistral
🎮 16GB+ with GPU (6GB VRAM)    | llama3:8b-q4, gemma:3b
🔥 RTX 3090+ or Apple M3 Max    | llama3:70b, mixtral

📦 Prefer quantized models (q4_0, q5_1, etc.) for better performance.

Explore: Ollama Model Library


🧪 Firefox Support

Ollama Client is a Chrome Manifest V3 extension. To use in Firefox:

  1. Go to about:debugging
  2. Click "Load Temporary Add-on"
  3. Select the manifest.json from the extension folder
  4. Manually allow CORS access (see setup guide)

🐛 Known Issues

  • "Stop Pull" during model downloads may glitch
  • Large chat histories in IndexedDB can impact performance

What’s Next (Roadmap)

Here’s what’s coming next in Ollama Client, grouped by priority:

High Priority

  • Migrate state management to Zustand for cleaner logic and global state control
  • Add Export / Import Chat History (JSON, txt or PDF format)
  • Add Reset App Data button ("Reset All") under Options → Reset (clears IndexedDB + localStorage)
  • Enhanced Content Extraction – Phase 1 implementation with lazy loading support, site-specific overrides, and Defuddle integration
  • Advanced Text-to-Speech – Searchable voice selector with rate/pitch controls and cross-browser compatibility
  • Automated YouTube Transcript Extraction – Automatic button clicking for transcript access
  • GitHub Content Extraction – Special handling for repository and profile pages

Embeddings & Semantic Search

  • Implement Ollama Embedding Models (request shape sketched after this list):
    • Integration with Ollama embedding models (e.g., nomic-embed-text, mxbai-embed-large)
    • Generate embeddings for chat messages and store in IndexedDB
    • Semantic search over chat history (global and per-chat)
    • Auto-embedding toggle and backfill functionality
  • Vector Search Optimization (Phase 1 - Completed):
    • Brute-force cosine similarity with optimized computation
    • Pre-normalize embeddings on storage
    • Use Float32Array for better memory locality
    • Implement early termination for low similarity scores
    • Add search result caching (configurable TTL & max size)
    • Non-blocking computation (async chunking with yields)
  • Semantic Chat Search UI (Beta v0.3.0 - Completed):
    • Search dialog with debounced input
    • Search scope toggle (all chats / current chat)
    • Grouped results by session
    • Similarity scores with % match display
    • Click to navigate and highlight message
    • Real-time loading indicators
  • Advanced Vector Search (Phase 2 - Completed):
    • Optimized vector indexing for faster searches
    • Service Worker-compatible implementation
    • Hybrid search strategy (indexed + brute-force fallback)
    • Incremental index updates
    • Expected performance: 5-10x faster than Phase 1 for datasets >1000 vectors
    • WASM upgrade path documented in docs/HNSW_WASM_UPGRADE.md (optional for >50K vectors)
  • Enable Local RAG over chats, PDFs, and uploaded files
  • Browser Search Feature:
    • Contextual search within webpage content
    • Semantic search over extracted content
    • Search result highlighting
    • Search history
  • Optional Web Search Enrichment:
    • Offline-first architecture
    • Opt-in Brave / DuckDuckGo API (user-provided key)
    • WASM fallback (e.g., tinysearch) when no key
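
As referenced above, the request shape for Ollama-hosted embedding models is small. A sketch; the model name comes from the list above, everything else is illustrative:

// Sketch: generate an embedding via Ollama's documented /api/embeddings endpoint.
async function embed(text: string): Promise<Float32Array> {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text })
  })
  const { embedding } = await res.json()
  return new Float32Array(embedding) // ready to pre-normalize and store in IndexedDB
}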

Note: Hybrid embeddings with client-side transformers (@xenova/transformers) have been tested and show degraded model response quality compared to direct text prompts. The focus will be on Ollama-hosted embedding models instead.

File Upload & Processing

  • Text File Support (Beta v0.3.0 - Completed):
    • Plain text-based formats: .txt, .md and more.
    • Direct UTF-8 reading
  • PDF Support (Beta v0.3.0 - Completed):
    • Full text extraction via pdf.js
    • Multi-page document support
  • DOCX Support (Beta v0.3.0 - Completed):
    • Extract text from Word documents via mammoth.js
    • Handle formatting and structure
  • Auto-Embedding (Beta v0.3.0 - Completed):
    • Automatic chunking with configurable strategies (fixed-size variant sketched after this list)
    • Background embedding generation via port messaging
    • Progress tracking with real-time updates
    • Batch processing for performance
  • File Upload Settings (Beta v0.3.0 - Completed):
    • Configurable max file size
    • Auto-embed toggle
    • Embedding batch size configuration
  • CSV Support (Beta v0.5.0 - Completed):
    • CSV parsing with d3-dsv
    • Custom delimiter support (comma, tab, pipe, semicolon)
    • Column extraction
    • TSV and PSV file support
  • HTML Support (Beta v0.5.0 - Completed):
    • HTML to Markdown conversion via Turndown
    • Structure and link preservation
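
As referenced above, the fixed-size chunking strategy reduces to a sliding window over the text. A sketch with illustrative defaults; the real chunk size and overlap are user-configurable in settings:

// Sketch: fixed-size chunking with overlap (one of the three strategies).
function chunkFixed(text: string, chunkSize = 512, overlap = 64): string[] {
  const chunks: string[] = []
  const step = Math.max(1, chunkSize - overlap) // advance less than a full chunk
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize))
    if (start + chunkSize >= text.length) break // final chunk reached the end
  }
  return chunks
}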

UX & Metrics Enhancements

  • Track Per-Session Token Usage and display in chat metadata (duration, token count)
  • Enable Semantic Chat Search / Filter once embeddings are in place
  • Add Export/Import buttons in the chat selector UI

🔗 Useful Links


📢 Spread the Word!

If you find Ollama Client helpful, please consider:

  • ⭐ Starring the repo
  • 📝 Leaving a review on the Chrome Web Store
  • 💬 Sharing on socials (tag #OllamaClient)

Built with ❤️ by @Shishir435
