This lightweight library provides a single C++ and Java API over several LLM inference frameworks. The project uses CMake presets to support native x86, macOS, and aarch64 builds, as well as cross-compilation for Android and linux-aarch64.
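As a rough illustration of the preset-based build setup, a `CMakePresets.json` fragment might look like the sketch below. The preset names, binary directories, and toolchain-file path are illustrative assumptions, not the project's actual presets.

```json
{
  "version": 6,
  "configurePresets": [
    {
      "name": "native",
      "description": "Native build for the host (x86, macOS, or aarch64)",
      "binaryDir": "${sourceDir}/build/native"
    },
    {
      "name": "android-aarch64",
      "description": "Cross-compilation for Android (hypothetical toolchain file)",
      "binaryDir": "${sourceDir}/build/android-aarch64",
      "toolchainFile": "scripts/cmake/android-aarch64.toolchain.cmake"
    }
  ]
}
```

With a layout like this, `cmake --list-presets` shows the available presets, and `cmake --preset native` followed by `cmake --build build/native` configures and builds one of them.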
```mermaid
graph TD
Prompt["Input Prompt (text or text+image)"]
App["Application (C++ API or JNI)"]
LLMRunner["LLM-Runner API"]
Backend["Selected Backend: llama.cpp | onnxruntime-genai | Mediapipe | MNN"]
Inference["Inference with model + config"]
KleidiAI["Arm® KleidiAI™ Acceleration - default on Arm"]
Output["Generated tokens / text"]
Prompt --> App
App --> LLMRunner
LLMRunner --> Backend
Backend --> Inference
Inference --> KleidiAI
KleidiAI --> Output
```
Typical Flow:
- A prompt (text, or text+image for multimodal backends) is provided to the LLM-Runner library.
- The wrapper selects the configured backend.
- The backend performs inference using the loaded model and configuration.
- If enabled, Arm® KleidiAI™ kernels accelerate key operations on supported Arm CPUs.
- The backend returns generated tokens / text.
- Applications receive the result either through:
- the C++ API, or
- the JNI interface on Android™.
| Source Folder | Purpose |
|---|---|
| `src/cpp/` | Core C++ wrapper implementing the LLM-Runner abstraction layer and backend integration. |
| `src/java/` | Java/JNI bindings. |
| `scripts/py/` | Python utilities for downloading models and test resources and performing data-preparation tasks. |
| `scripts/cmake/` | Toolchains and CMake helper scripts for cross-compilation and platform configuration. |
| `model_configuration_files/` | Model configuration files used by the build system and runtime. |
| `resources_downloaded/` | Default directory where models and example assets are downloaded. |
| `test/` | C++/Java unit tests and supporting test resources. |