Awesome Local AI

This is an awesome collection of open source, local AI tools and solutions.

Your contributions are always welcome!

Inference Engine

Repository	Description	Supported model formats	CPU/GPU Support	UI	Language	Platform Type
llama.cpp	- Inference of LLaMA model in pure C/C++	GGML/GGUF	Both	❌	C/C++	Text-Gen
ollama	- CLI and local server. Uses llama.cpp	Both	Both	❌	Go	Text-Gen
koboldcpp	- A simple one-file way to run various GGML models with KoboldAI's UI	GGML	Both	✅	C/C++	Text-Gen
LoLLMS	- Lord of Large Language Models Web User Interface.	Nearly ALL	Both	✅	Python	Text-Gen
ExLlama	- A more memory-efficient rewrite of the HF transformers implementation of Llama	AutoGPTQ/GPTQ	GPU	✅	Python/C++	Text-Gen
vLLM	- vLLM is a fast and easy-to-use library for LLM inference and serving.	GGML/GGUF	Both	❌	Python	Text-Gen
CTransformers	- Python bindings for the Transformer models implemented in C/C++ using GGML library	GGML/GPTQ	Both	❌	C/C++	Text-Gen
llama-cpp-python	- Python bindings for llama.cpp	GGUF	Both	❌	Python	Text-Gen
llama2.rs	- A fast llama2 decoder in pure Rust	GPTQ	CPU	❌	Rust	Text-Gen
ExLlamaV2	- A fast inference library for running LLMs locally on modern consumer-class GPUs	GPTQ/EXL2	GPU	❌	Python/C++	Text-Gen

Inference UI

Jan - Self-hosted, local, AI Inference Platform that scales from personal use to production deployments for a team.
oobabooga - A Gradio web UI for Large Language Models
LM Studio - Discover, download, and run local LLMs.
LocalAI - LocalAI is a drop-in replacement REST API that’s compatible with OpenAI API specifications for local inferencing.
FireworksAI - Experience the world's fastest LLM inference platform. Deploy your own at no additional cost.
Faraday - Chat with AI characters offline. Runs locally, zero configuration.
GPT4All - A free-to-use, locally running, privacy-aware chatbot
LLMFarm - Run LLaMA and other large language models offline on iOS and macOS using the GGML library.
LlamaChat - LlamaChat allows you to chat with LLaMA, Alpaca and GPT4All models, all running locally on your Mac.
LLM as a Chatbot Service - LLM as a Chatbot Service
FuLLMetalAi - Fullmetal.Ai is a distributed network of self-hosted Large Language Models (LLMs)
Automatic1111 - Stable Diffusion web UI
ComfyUI - A powerful and modular stable diffusion GUI with a graph/nodes interface.
petals - Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

Platforms / full solutions

H2OAI - H2OGPT: the fastest, most accurate AI cloud platform
BentoML - BentoML is a framework for building reliable, scalable, and cost-efficient AI applications.

Developer tools

Pinecone - Long-Term Memory for AI
PoplarML - PoplarML enables the deployment of production-ready, scalable ML systems with minimal engineering effort.
Datature - The All-in-One Platform to Build and Deploy Vision AI
One AI - Making generative AI business-ready
Gooey.AI - Create Your Own No Code AI Workflows
Mixo.io - AI website builder
Safurai - AI Code Assistant that saves you time in changing, optimizing, and searching code.
GitFluence - The AI-driven solution that helps you quickly find the right command. Get started with Git Command Generator today and save time.
Haystack - A framework for building NLP applications (e.g. agents, semantic search, question-answering) with language models.
LangChain - A framework for developing applications powered by language models.
gpt4all - A chatbot trained on a massive collection of clean assistant data including code, stories and dialogue.
LMQL - LMQL is a query language for large language models.
LlamaIndex - A data framework for building LLM applications over external data.
Phoenix - Open-source tool for ML observability that runs in your notebook environment, by Arize. Monitor and fine tune LLM, CV and tabular models.
trypromptly - Create AI Apps & Chatbots in Minutes
BentoML - BentoML is the platform for software engineers to build AI products.

Agents

SuperAGI - Open-source AGI infrastructure
Auto-GPT - An experimental open-source attempt to make GPT-4 fully autonomous.
BabyAGI - Baby AGI is an autonomous AI agent developed using Python that operates through OpenAI and Pinecone APIs.
AgentGPT - Assemble, configure, and deploy autonomous AI Agents in your browser.
HyperWrite - HyperWrite helps you work smarter, faster, and with ease.
AI Agents - AI agents that power up your productivity
AgentRunner.ai - Leverage the power of GPT-4 to create and train fully autonomous AI agents.
GPT Engineer - Specify what you want it to build, the AI asks for clarification, and then builds it.
GPT Prompt Engineer - Automated prompt engineering. It generates, tests, and ranks prompts to find the best ones.
MetaGPT - The Multi-Agent Framework: Given one line requirement, return PRD, design, tasks, repo.

Training

FastChat - An open platform for training, serving, and evaluating large language models.
DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
BMTrain - Efficient Training for Big Models.
Alpa - Alpa is a system for training and serving large-scale neural networks.
Megatron-LM - Ongoing research training transformer models at scale

LLM Leaderboard

Open LLM Leaderboard - Aims to track, rank and evaluate LLMs and chatbots as they are released.
Chatbot Arena Leaderboard - A benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner.
AlpacaEval Leaderboard - An Automatic Evaluator for Instruction-following Language Models
LLM-Leaderboard-streamlit - A joint community effort to create one central leaderboard for LLMs.
lmsys.org - Benchmarking LLMs in the Wild with Elo Ratings

Research

Attention Is All You Need (2017): Presents the original transformer model. It helps with sequence-to-sequence tasks, such as machine translation. [paper]
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018): Helps with language modeling and prediction tasks. [paper]
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (2022): Mechanism to improve transformers. [paper]
Improving Language Understanding by Generative Pre-Training (2018): The original GPT paper, authored by OpenAI. [paper]
Cramming: Training a Language Model on a Single GPU in One Day (2022): Focuses on ways to increase performance using minimal computing power. [paper]
LaMDA: Language Models for Dialog Applications (2022): LaMDA is a family of Transformer-based neural language models by Google. [paper]
Training language models to follow instructions with human feedback (2022): Use human feedback to align LLMs. [paper]
TurboTransformers: An Efficient GPU Serving System For Transformer Models (PPoPP'21) [paper]
Fast Distributed Inference Serving for Large Language Models (arXiv'23) [paper]
An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs (arXiv'23) [paper]
Accelerating LLM Inference with Staged Speculative Decoding (arXiv'23) [paper]
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models (SC'20) [paper]
TensorGPT: Efficient Compression of the Embedding Layer in LLMs based on the Tensor-Train Decomposition (2023) [paper]
