Mobile RAG Engine

Build local, on-device RAG in Flutter with a Dart package.

Mobile RAG Engine is a Flutter package for local Retrieval-Augmented Generation (RAG): ingest local documents, chunk and embed them on-device, then run hybrid semantic + keyword search through a Dart API. No server, no API cost, no network round-trip for retrieval.

Use it when you need a Flutter local RAG engine for private notes, document Q&A, chat with PDF, offline assistants, or enterprise apps where user data must stay on the device.

Why this package?

No Rust Installation Required

You do NOT need to install Rust, Cargo, or Android NDK.

This package includes pre-compiled binaries for iOS, Android, and macOS. Just run flutter pub add mobile_rag_engine and build your app.

Performance

Feature	Pure Dart	Mobile RAG Engine (Rust)
Tokenization	Slow	HuggingFace `tokenizers` (Rust)
Vector Search	O(n)	HNSW Index — sub-linear retrieval
Memory Usage	High	Copy-minimized Rust core, `Float32List` zero-copy transport

Numbers vary by device and corpus. Run benchmark_service to measure on your own hardware, and see the 0.18.0 retrieval-hot-path notes in CHANGELOG.md for the measured deltas.
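To make the O(n) entry in the table concrete, here is a minimal sketch of the brute-force baseline that an HNSW index avoids: every query scores every stored vector. This is illustrative pure Dart, not code from this package; the function names cosine and bruteForceTopK are hypothetical.

```dart
import 'dart:math' as math;
import 'dart:typed_data';

/// Cosine similarity between two equal-length vectors.
double cosine(Float32List a, Float32List b) {
  var dot = 0.0, na = 0.0, nb = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na == 0 || nb == 0) return 0.0;
  return dot / (math.sqrt(na) * math.sqrt(nb));
}

/// Scores the query against every corpus vector (O(n) per query) and
/// returns the indices of the k most similar vectors, best first.
List<int> bruteForceTopK(Float32List query, List<Float32List> corpus, int k) {
  final scores = [for (final v in corpus) cosine(query, v)];
  final order = List<int>.generate(corpus.length, (i) => i)
    ..sort((x, y) => scores[y].compareTo(scores[x]));
  return order.take(k).toList();
}

void main() {
  final corpus = [
    Float32List.fromList([0.0, 1.0]),
    Float32List.fromList([1.0, 0.0]),
    Float32List.fromList([1.0, 1.0]),
  ];
  final query = Float32List.fromList([1.0, 0.0]);
  print(bruteForceTopK(query, corpus, 2)); // [1, 2]
}
```

The linear scan is fine for a few hundred chunks; the cost grows with every document added, which is the case the HNSW index is meant to handle.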

Supported and Verified Scope

Area	Current status	Evidence / boundary
Local Flutter RAG retrieval	Supported	Dart facade over a Rust core for ingest, chunking, embedding, SQLite storage, HNSW vector search, BM25 keyword search, RRF fusion, and context assembly
Offline / on-device operation	Supported	Models and user documents stay local after you bundle the ONNX model and tokenizer assets
Hybrid source retrieval	Verified on benchmark fixtures	80-source balanced profile run: `source_recall@10 = 1.000` for shipped `default_hybrid`
Passage/context retrieval	Verified on benchmark fixtures	80-query passage run: `passage_recall@10 = 0.925`, `answerable_context@10 = 0.938`; semantic passage misses remain the main improvement area
Text-layer PDF-to-RAG	Verified on sample scope	`sample_eng.pdf` and `sample_kor.pdf` profile run: 8/8 PDF-derived queries reached source, passage, and answerable context at top-10
Scanned/image-only PDFs	Detected, not OCR-processed	OCR-required PDFs are surfaced as extraction errors so your app can route to an OCR layer; OCR is not bundled in this package
Large, table-heavy, OCR-heavy PDFs	Still being validated	Do not treat the PDF smoke result as broad PDF robustness or mobile latency/memory proof

For the implementation-oriented guide, see Flutter Local RAG Engine Guide.

100% Offline & Private

Data never leaves the user's device. Perfect for privacy-focused apps (journals, secure chats, enterprise tools).

Features

End-to-End RAG Pipeline

One package, complete pipeline. From supported document formats (text-layer PDF, Markdown, plain text, beta DOCX) to LLM-ready context.

Key Features

Category	Features
Document Input	Text-layer PDF, Markdown, Plain Text, and beta DOCX support; file-path and UTF-8 ingest fast paths
Chunking	Plain-text paragraph/line chunking with heading-aware split and tokenizer hard guard; Markdown structure-aware chunking with header-path metadata
Search	HNSW vector + BM25 keyword hybrid search with RRF fusion; metadata-first search with explicit context/chunk hydration
Storage	SQLite persistence, HNSW Index persistence (fast startup), connection pooling, resumable indexing
Collections	Collection-scoped ingest/search/rebuild via `inCollection('id')`
Performance	Rust core, tokenization up to 10x faster than pure Dart (device-dependent), thread control, memory-optimized
Context	Engine-tokenizer exact context budget, adjacent chunk expansion, single source mode

Support boundaries: text-layer PDFs are production-ready within the verified scope listed above; large or table-heavy PDFs are still being validated. Scanned or image-only PDFs should be routed through an OCR layer before indexing. DOCX extraction is available for early adopters, but complex DOCX layouts such as tables, headers, and footnotes should be treated as beta.
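The Search row above mentions RRF (Reciprocal Rank Fusion) for combining HNSW and BM25 results. As a rough illustration of the general technique, the sketch below sums 1 / (k + rank) for each chunk across the ranked lists. It is not the package's Rust implementation: the function name rrfFuse, the chunk ids, and the constant k = 60 (a commonly used default) are all assumptions for the example.

```dart
/// Reciprocal Rank Fusion over several ranked lists of chunk ids.
/// score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
List<MapEntry<int, double>> rrfFuse(
  List<List<int>> rankedLists, {
  int k = 60, // assumed smoothing constant, not taken from this package
}) {
  final scores = <int, double>{};
  for (final list in rankedLists) {
    for (var i = 0; i < list.length; i++) {
      final id = list[i];
      scores[id] = (scores[id] ?? 0.0) + 1 / (k + i + 1);
    }
  }
  // Highest fused score first.
  return scores.entries.toList()
    ..sort((a, b) => b.value.compareTo(a.value));
}

void main() {
  final vectorHits = [3, 1, 2]; // hypothetical ids from vector search
  final keywordHits = [1, 4, 3]; // hypothetical ids from keyword search
  final fused = rrfFuse([vectorHits, keywordHits]);
  print(fused.map((e) => e.key).toList()); // [1, 3, 4, 2]
}
```

Because RRF only looks at ranks, it can merge vector and keyword results without normalizing their raw scores, which live on different scales.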

Requirements

Platform	Minimum Version
iOS	16.0+
Android	API 21+ (Android 5.0 Lollipop)
macOS	14.0+

ONNX Runtime is provided through flutter_onnxruntime. CocoaPods iOS builds require static framework linkage (use_frameworks! :linkage => :static), and Android release builds should keep ONNX Runtime classes in ProGuard/R8 rules.

Installation

1. Add the dependency

dependencies:
  mobile_rag_engine: ^0.19.0

2. Download Model Files

# Create assets folder
mkdir -p assets && cd assets

# Download all-MiniLM-L6-v2 model (INT8 quantized for ARM64, ~23MB)
curl -L -o model.onnx "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model_qint8_arm64.onnx"
curl -L -o tokenizer.json "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer.json"

Need multilingual (Korean, CJK, etc.)? See Model Setup Guide for BGE-M3 and other model options.
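The setup call in the Usage section loads assets/model.onnx and assets/tokenizer.json as Flutter assets, so the downloaded files most likely need to be declared in your app's pubspec.yaml. A minimal fragment, assuming the files sit in the assets/ folder created above:

```yaml
flutter:
  assets:
    - assets/model.onnx
    - assets/tokenizer.json
```

If you keep the files elsewhere, adjust both these entries and the paths you pass at initialization.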

Quick Index

Features

Adjacent Chunk Retrieval - Fetch surrounding context.
Index Management - Stats, persistence, and recovery.
Markdown Chunker - Structure-aware text splitting.
Multi-Collection - Isolate ingest/search/rebuild by category.
Prompt Compression - Reduce token usage.
Search by Source - Filter results by document.
Search Strategies - Tune ranking and retrieval.

Guides

Flutter Local RAG Engine Guide - Build local/on-device RAG in Flutter with Dart APIs.
Quick Start - Setup in 5 minutes.
Model Setup - Choosing and downloading models.
Release Build - Bundle size optimization for production.
Troubleshooting - Common fixes.
FAQ - Frequently asked questions.

Testing

Unit Testing - Mocking for isolated tests.

Usage

Minimal Setup

Initialize the engine once in your main() function. See the Quick Start Guide for the full parameter table.

await MobileRag.initialize(
  tokenizerAsset: 'assets/tokenizer.json',
  modelAsset: 'assets/model.onnx',
  deferIndexWarmup: true,
);

// Before first search, wait for BM25/HNSW warmup if you deferred it:
if (!MobileRag.instance.isIndexReady) {
  await MobileRag.instance.warmupFuture;
}

Adding Documents and Searching

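import 'dart:io'; // File, used for the UTF-8 fast path below

import 'package:flutter/material.dart';
import 'package:mobile_rag_engine/mobile_rag_engine.dart';

class MySearchScreen extends StatelessWidget {
  const MySearchScreen({super.key});

  Future<void> _search() async {
    // 1. Add Documents (auto-chunked & embedded). Indexing is auto-managed
    //    (debounced ~500ms); only call rebuildIndex() if you need it now.
    await MobileRag.instance.addDocument(
      'Flutter is a UI toolkit for building apps.',
    );

    // File / UTF-8 fast paths are useful for large local documents.
    await MobileRag.instance.addDocumentFromFile('/path/to/manual.pdf');
    final noteBytes = await File('/path/to/notes.md').readAsBytes();
    await MobileRag.instance.addDocumentUtf8(noteBytes, name: 'notes.md');

    // 2. Search with LLM-ready context
    final result = await MobileRag.instance.search(
      'What is Flutter?',
      tokenBudget: 2000,
    );
    print(result.context.text); // Ready to send to LLM
  }

  // Minimal UI so the widget compiles; wire _search to your own controls.
  @override
  Widget build(BuildContext context) {
    return ElevatedButton(onPressed: _search, child: const Text('Search'));
  }
}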
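To give a feel for what a tokenBudget implies, the sketch below greedily packs the highest-scoring chunks into a context string until the budget is used up. It is only an illustration of the idea: ScoredChunk, approxTokens, and packContext are hypothetical names, and the whitespace word count is a crude stand-in for the engine's real tokenizer, which this README says is used for exact budgeting.

```dart
/// Hypothetical chunk record, for illustration only.
class ScoredChunk {
  const ScoredChunk(this.text, this.score);
  final String text;
  final double score;
}

/// Crude stand-in for a real tokenizer: counts whitespace-separated words.
int approxTokens(String text) {
  final trimmed = text.trim();
  return trimmed.isEmpty ? 0 : trimmed.split(RegExp(r'\s+')).length;
}

/// Keeps the highest-scoring chunks that still fit within [tokenBudget],
/// skipping any chunk that would overflow it.
String packContext(List<ScoredChunk> chunks, int tokenBudget) {
  final ranked = [...chunks]..sort((a, b) => b.score.compareTo(a.score));
  final kept = <String>[];
  var used = 0;
  for (final chunk in ranked) {
    final cost = approxTokens(chunk.text);
    if (used + cost > tokenBudget) continue;
    kept.add(chunk.text);
    used += cost;
  }
  return kept.join('\n\n');
}

void main() {
  final context = packContext(const [
    ScoredChunk('a b c', 0.9),
    ScoredChunk('d e f g', 0.8),
    ScoredChunk('h', 0.7),
  ], 4);
  print(context); // "a b c", a blank line, then "h"
}
```

A smaller budget keeps prompts cheap for small on-device LLMs, at the price of dropping lower-ranked chunks.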

Handling File Picker Fallback

addDocumentFromFile is the fastest path because the Rust core reads and chunks the file directly. Some platform pickers (cloud-backed pickers, content URIs without a stable local path, etc.) return data that is not exposed as a real filesystem path. In those cases, fall back to UTF-8 or parsed-text ingestion:

try {
  await MobileRag.instance.addDocumentFromFile(path, name: fileName);
} on RagError {
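  // Fast path failed. If your picker hands you bytes directly (no readable
  // filesystem path), use those bytes here instead of reading from `path`.
  final bytes = await File(path).readAsBytes();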
  final lower = fileName.toLowerCase();
  if (lower.endsWith('.txt') ||
      lower.endsWith('.md') ||
      lower.endsWith('.markdown')) {
    await MobileRag.instance.addDocumentUtf8(bytes, name: fileName);
  } else {
    try {
      final text = await DocumentParser.parse(bytes);
      await MobileRag.instance.addDocument(text, name: fileName);
    } catch (error) {
      if (DocumentParser.isOcrRequiredPdfExtractionError(error)) {
        throw UnsupportedError(
          DocumentParser.userMessageForExtractionError(error),
        );
      }
      rethrow;
    }
  }
}

Metadata-First Search

Use searchMeta when you want lightweight search metadata first, then explicitly assemble context or hydrate only the chunks you need.

final meta = await MobileRag.instance.searchMeta(
  'What is Flutter?',
  topK: 10,
);

try {
  final context = await MobileRag.instance.assembleContext(
    searchHandle: meta.handle,
    tokenBudget: 2000,
  );

  final chunkIds = meta.hits.map((hit) => hit.chunkId.toInt()).toList();
  final chunks = await MobileRag.instance.hydrateChunks(
    searchHandle: meta.handle,
    chunkIds: chunkIds,
  );
  final excerpts = await MobileRag.instance.getChunkExcerpts(
    searchHandle: meta.handle,
    chunkIds: chunkIds,
    maxBytes: 256,
  );

  print(context.text);
  print('hydrated=${chunks.length}, excerpts=${excerpts.length}');
} finally {
  await meta.handle.dispose();
}

Multi-Collection (v1)

Use collection scopes when you want independent rebuild boundaries per category.

final business = MobileRag.instance.inCollection('business');
final travel = MobileRag.instance.inCollection('travel');

await business.addDocument('Quarterly planning memo...');
await travel.addDocument('Kyoto itinerary...');

if (!travel.isIndexReady) {
  await travel.warmupFuture;
}
final travelHits = await travel.searchHybrid('itinerary');
print(travelHits.length);

If you do not specify a collection, the engine uses the default __default__ collection for backward compatibility.

Advanced Usage: For fine-grained control, use the high-level metadata lane (searchMeta, assembleContext, hydrateChunks, getChunkExcerpts) and the public API reference. Most apps should stay on the MobileRag facade.

Sample App

Check out the example application using this package. This desktop app demonstrates full RAG pipeline integration with an LLM (Gemma 2B) running locally on-device.

mobile-ondevice-rag-desktop

Contributing

Bug reports, feature requests, and PRs are all welcome!

License

This project is licensed under the MIT License.
