Intel® AI for Enterprise Agent Toolkit — Open Source Agentic AI Platform on Kubernetes

Deploy a production-grade, open source agentic AI stack on Intel® Xeon® in hours — not months. A batteries-included Kubernetes platform with API gateway, intelligent inference routing, agent memory, sandboxed code execution, MCP tool integration, and full agent and system observability — all pre-integrated and ready to run.

Keywords: agentic AI platform · enterprise AI agents · open source AI agent stack · Kubernetes AI deployment · LLM inference on CPU · Intel Xeon AI · LiteLLM gateway · AI agent memory RAG · on-premise LLM deployment · private AI infrastructure · AI agent observability · MCP server Kubernetes · vLLM sglang Intel · Enterprise Agnet Toolkit

Why This Matters

AI agents are not just smarter chatbots — they are autonomous systems that plan, act, and iterate across multi-step tasks. Building a production-ready agentic AI platform from scratch requires solving a dozen infrastructure problems simultaneously: secure API access, LLM inference routing, agent memory and state management, sandboxed code execution, LLM observability, and enterprise governance.

Most organizations waste months stitching together LiteLLM, Redis, pgvector, Kubernetes, Langfuse, and agent sandboxes. This open source toolkit delivers them pre-integrated, pre-optimized for Intel® Xeon®, and deployable in a single automated workflow.

The core insight: GPUs accelerate reasoning. But the value of agentic AI comes from end-to-end task execution — and that control plane (state, tools, policy, coordination) runs efficiently on CPUs. This toolkit is designed around that reality, making it the ideal platform for cost-efficient, on-premise, and private AI deployments.

Business Value at a Glance

Problem	What the Toolkit Delivers
Long time-to-production	One-click Kubernetes deployment — base stack live in hours
Integration complexity	All layers pre-integrated: gateway, memory, sandbox, observability
Vendor lock-in risk	Provider Design Pattern — swap or extend any component without re-architecting
Governance & compliance gaps	Policy-driven API gateway, sandboxed execution, full audit trail
GPU cost dependency	CPU-efficient LLM inference by default on Xeon; GPU addable on demand
Observability blind spots	Every token, latency, and agent step captured and queryable
Adoption friction	Works with LangGraph, CrewAI, Flowise, Microsoft Agent Framework, and any OpenAI-compatible framework

Architecture

The toolkit is organized into seven pluggable provider blocks, each with a default implementation and a defined interface. Customers can use the defaults on day one and swap or extend individual providers as their needs evolve — without touching any other layer.

Provider Design Pattern

Each functional block in the toolkit follows a Provider Design Pattern: the block defines the capability contract (what it does), not the implementation (how it does it). The default provider is Intel-validated and optimized for Xeon out of the box. Any standard-compliant alternative can be substituted.

This means:

No big-bang replacements. Swap one provider at a time.
No lock-in. The toolkit never mandates a proprietary runtime.
Incremental adoption. Start with all defaults; introduce your own components over time.

Provider Blocks

Each functional block ships with an Intel-validated default. Because the toolkit is built on the Provider Design Pattern, any block can be swapped or extended independently — without touching the rest of the stack.

API Gateway & Access Control

Unified, secure entry point for all agents, tools, and applications. Routes every request through a policy-driven layer that enforces authentication, authorization, and rate limiting before anything reaches a model or tool. Supports any OpenAI-compatible backend — cloud, on-premise, or hybrid.

Default provider: LiteLLM

Intelligent Inference Routing

Automatically dispatches LLM inference workloads to the optimal compute target based on task intent. Reasoning and planning tasks are routed to GPU or cloud/MaaS endpoints; execution, encoding, and coordination tasks are routed to CPU on Xeon. Rules are configurable per application and model-agnostic — works with OpenAI, Anthropic Claude, Google Gemini, and any self-hosted model.

Default provider: LiteLLM Semantic Router · Intel Native Inferencing Service (vLLM, SGLang)

Actions & Tools

Gives agents the ability to take actions in the world — call APIs, run code, query databases, and invoke external services — through a governed and extensible tool layer. MCP (Model Context Protocol) server templates let teams register domain-specific tools without modifying the core stack.

Default provider: Intel MCP (Model Context Protocol) · ETL / Ingestion Services

Code Execution & Sandbox

Enables agents to safely execute code and run arbitrary actions in isolated environments, preventing runaway agents from affecting production systems. Each sandbox is a fully self-contained ephemeral Kubernetes pod with its own filesystem and process tree. Policy controls limit the blast radius of any single agent action.

Default provider: kubernetes-sigs/agent-sandbox

Memory, State & Context

Maintains agent memory across the full lifecycle — short-term working context within a session and long-term knowledge across sessions and users. Enables RAG (Retrieval-Augmented Generation), semantic search over past interactions, and stateful multi-turn workflows. Agents maintain context across tasks, sessions, and workflows.

Default provider: Redis + RediSearch (short-term) · PostgreSQL + pgvector (long-term)

Observability & Telemetry

Every token, latency measurement, and agent step is captured, stored, and queryable. Provides LLM-native tracing for debugging agent reasoning chains, out-of-the-box dashboards, and seamless integration with existing enterprise monitoring infrastructure — no custom instrumentation required.

Default provider: Langfuse · Prometheus · Grafana · Loki

Deployment & Orchestration

Manages the full lifecycle of the stack — provisioning, scaling, updates, and teardown — across single-node and multi-node environments. Supports both production Kubernetes clusters and lightweight Docker Compose for development and demos, using the same configuration.

Default provider: Kubernetes · Helm · Docker Compose

Infrastructure Core

All provider blocks run on a common infrastructure foundation:

Layer	Details
Operating System	Ubuntu 24.04 LTS / RHEL
CPU Configuration	Intel® Xeon® — NUMA tuning, firmware, drivers, AMX enabled
Compute & Runtime	Intel® Xeon® · Intel® AI Accelerators · Intel® Arc™
Supported Model APIs	OpenAI · Anthropic Claude · Google Gemini · SambaNova · local models via vLLM / SGLang

Validated Orchestration Frameworks

Framework	Style
LangGraph	State-machine, multi-step agent workflows
Flowise	Low-code visual pipeline builder
CrewAI	Role-based multi-agent collaboration
OpenClaw	Intel-optimized agentic framework

Example Business Use Cases (Not pre-bundled)

Use Case	Description
Coding Agent	Automated code generation, review, test, and fix loops — accelerates developer productivity
Deep Research Agent	Multi-step web and knowledge base research with citation; replaces hours of analyst work
Flowise Agents	Low-code visual agent pipeline construction for business teams
Customer Support Automation	RAG-augmented, multi-turn resolution with memory — reduces ticket handle time
Enterprise Data Analyst	Natural language to SQL/report across internal databases — self-serve analytics
IT Operations Agent	Infrastructure monitoring, alerting, and self-healing workflows — reduces MTTR

Deployment Model

Deployment Options

Option	Stack	Best For
Multi-node Kubernetes	Kubespray + Helm + KubeRay (for Parallel Tool Calling)	Production, HA, scale-out
Single-node Kubernetes	K8s + Helm	Pilot, staging, edge
Docker Compose	Docker Compose	Development, demos, single machine

How to Deploy

1. Clone the repository

git clone https://github.com/intel/enterprise-agent-toolkit.git
cd enterprise-agent-toolkit

2. Complete prerequisites

Hardware requirements, SSH key setup, and DNS/TLS configuration are covered in Prerequisites Guide.

3. Quick Start

The full step-by-step deployment guide — covering the base stack, semantic routing, Redis, pgvector, and Agent Sandbox — is in Quick Start Guide.

Estimated time to first running agent: hours, not months.

Backend Connection Reference

In-cluster URLs for all shared backends (Redis, Agent Sandbox, pgvector, KubeRay, Flowise). See Backend Connection Reference for the full reference.

Decommission / Reset and Redeploy from Scratch

To tear down the cluster and redeploy from scratch, see Decommission and Redeploy Guide.

Build vs. Build-from-Scratch

Dimension	Build from Scratch	Intel® Toolkit
Time to production	3–6 months	Hours to days
Vendor lock-in	None	None (Apache 2.0)
Component flexibility	Full (you build it)	Full (Provider Pattern)
GPU dependency	Your choice	Optional
Data residency	Full control	Full control
Enterprise governance	You build it	Built-in
Intel® Xeon® optimization	Manual	Native
Total cost of ownership	High	Low (open source + Xeon)

Frequently Asked Questions

What is the Intel® AI for Enterprise Agent Toolkit? An open source, Kubernetes-based platform that gives enterprises everything they need to deploy, run, and govern production AI agents on Intel® Xeon® — including LLM gateway, inference routing, agent memory, sandboxed execution, MCP tool integration, and full observability.

How is this different from LangChain, LangGraph, or CrewAI? LangGraph, CrewAI, and similar frameworks handle agent orchestration logic. This toolkit provides the production infrastructure underneath them — the API gateway, inference routing, memory store, sandbox, and observability layer. You use LangGraph or CrewAI on top of this toolkit.

Does it require a GPU? No. The toolkit is designed to run AI agent workloads efficiently on Intel® Xeon® CPUs using AMX acceleration. GPU or cloud LLM endpoints (OpenAI, Anthropic, Google) can be added to the same gateway for reasoning-heavy tasks.

Can I use it with OpenAI, Anthropic Claude, or Google Gemini? Yes. The LiteLLM API gateway is fully OpenAI-compatible and supports all major model providers — OpenAI, Anthropic Claude, Google Gemini, SambaNova, and any self-hosted model via vLLM or SGLang.

Is this suitable for on-premise or air-gapped deployments? Yes. The full stack runs on bare metal or private cloud Intel® Xeon® servers with no external dependencies required. Ideal for regulated industries requiring data residency and private AI deployments.

Can I replace individual components? Yes — this is the core design principle. The toolkit is built on the Provider Design Pattern: each block (gateway, memory, observability, etc.) has a default but can be swapped independently without touching other layers.

What agent frameworks does it support? LangGraph, CrewAI, Flowwise, OpenClaw, and any Python-based or OpenAI-compatible orchestration framework.

How do I get started? Clone the repo, run ./deploy-agentic-stack.sh, and follow the Quick Start Guide.

License: Apache 2.0 · Primary languages: Shell (88%), Jinja (10%), Go Template (2%)

Links

Intel®, Intel® Xeon®, and Intel® Arc™ are registered trademarks of Intel Corporation or its subsidiaries. Licensed under the Apache License, Version 2.0.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
core		core
docs		docs
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
deploy-agentic-stack.sh		deploy-agentic-stack.sh

Folders and files

Latest commit

History

Repository files navigation

Intel® AI for Enterprise Agent Toolkit — Open Source Agentic AI Platform on Kubernetes

Table of Contents

Why This Matters

Business Value at a Glance

Architecture

Provider Design Pattern

Provider Blocks

API Gateway & Access Control

Intelligent Inference Routing

Actions & Tools

Code Execution & Sandbox

Memory, State & Context

Observability & Telemetry

Deployment & Orchestration

Infrastructure Core

Validated Orchestration Frameworks

Example Business Use Cases (Not pre-bundled)

Deployment Model

Deployment Options

How to Deploy

1. Clone the repository

2. Complete prerequisites

3. Quick Start

Backend Connection Reference

Decommission / Reset and Redeploy from Scratch

Build vs. Build-from-Scratch

Frequently Asked Questions

Links

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages