This is the code repository for Context Engineering for Multi-Agent Systems, First Edition, published by Packt.
Last updated: November 18, 2025.
See the Changelog for updates, fixes, and upgrades.
LLM API update: Specific notebooks have been upgraded to use GPT-5.1 and the latest OpenAI library standards for improved performance and reduced reasoning latency where needed. This update also includes fixes so the Moderation API handles structured agent outputs robustly. For the affected notebooks and a full list of changes, please consult the Changelog.
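For reference, the snippet below sketches the kind of moderation call involved, assuming the current OpenAI Python SDK; the payload, field names, and flattening step are illustrative only, not the repository's actual code:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A structured agent output is flattened to text before moderation;
# the payload and field names here are hypothetical.
agent_output = {"action": "send_email", "body": "Meeting moved to 3 pm."}
result = client.moderations.create(
    model="omni-moderation-latest",
    input=json.dumps(agent_output),
)
print("flagged:", result.results[0].flagged)
```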
Move beyond prompting to build a Context Engine, a transparent architecture of context and reasoning
Denis Rothman

Generative AI is powerful, yet often unpredictable. This guide shows you how to turn that unpredictability into reliability by thinking beyond prompts and approaching AI like an architect. At its core is the Context Engine, a glass-box, multi-agent system you'll learn to design, strengthen, and apply across real-world scenarios.

Written by an AI guru and author of various cutting-edge AI books, this book takes you on a hands-on journey from the foundations of context design to building a fully operational Context Engine. Instead of relying on brittle prompts that give only simple instructions, you'll begin with semantic blueprints that map goals and roles with precision, then orchestrate specialized agents using the Model Context Protocol (MCP). As the engine evolves, you'll integrate memory and high-fidelity retrieval with citations, implement safeguards against data poisoning and prompt injection, and enforce moderation to keep outputs aligned with policy. You'll also harden the system into a resilient architecture, then see it pivot seamlessly across domains, from legal compliance to strategic marketing, proving its domain independence.

By the end of this book, you'll be equipped with the skills needed to engineer an adaptable, verifiable architecture you can repurpose across domains and deploy with confidence.
- Develop memory models to retain short-term and cross-session context
- Craft semantic blueprints and drive multi-agent orchestration with MCP
- Implement high-fidelity RAG pipelines with verifiable citations
- Apply safeguards against prompt injection and data poisoning
- Enforce moderation and policy-driven control in AI workflows
- Repurpose the Context Engine across legal, marketing, and beyond
- Deploy a scalable, observable Context Engine in production
Before running the code, ensure your development environment is properly set up. All hands-on chapters use reproducible Python-based environments, tested in Google Colab and VS Code.
A Note on Latency: The Context Engine built in this book and repository performs complex, multi-step reasoning, not simple, single-shot answers. The delay you observe in Colab is the "thinking" time, as the engine dynamically plans and executes a sequence of API calls (e.g., planning, then RAG, then generation). This is the same reason advanced platforms like Gemini or ChatGPT require a moment to "think" for complex requests, even though they benefit from significantly more powerful environments.
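To make this concrete, here is a schematic sketch of such a multi-call flow using the OpenAI Python SDK; the prompts, function names, and model choice are illustrative assumptions, not the engine's actual implementation:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm(prompt: str) -> str:
    # One round trip to the API; the model name is a placeholder.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def answer(query: str, retrieved_context: str) -> str:
    # Three sequential round trips (plan -> generate -> review) are why a
    # complex request takes visibly longer than a single-shot completion.
    plan = llm(f"Outline the steps needed to answer: {query}")
    draft = llm(f"Context:\n{retrieved_context}\n\nPlan:\n{plan}\n\nAnswer: {query}")
    return llm(f"Review and tighten this answer:\n{draft}")
```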
- Python: Version 3.10+
- Environment Options:
- Google Colab or
- Local Python environment with:
- `openai`
- `pinecone-client`
- `tiktoken`
- `tenacity`
- `fastapi`
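As a quick sanity check, the snippet below verifies that the listed packages import correctly; the pip command in the comment mirrors the package names above (note that `pinecone-client` is imported as `pinecone`):

```python
# Install first, e.g.: pip install openai pinecone-client tiktoken tenacity fastapi
import importlib

# pinecone-client is published under that name but imported as "pinecone".
for module in ["openai", "pinecone", "tiktoken", "tenacity", "fastapi"]:
    try:
        importlib.import_module(module)
        print(f"{module}: OK")
    except ImportError:
        print(f"{module}: missing -- install it before running the notebooks")
```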
Get up and running on cloud-based virtual machines via the Google Colab links provided for each notebook.
No local installation is required.
Before running the notebooks, you will need valid API keys for the underlying services:
- OpenAI: Sign up and generate a key at platform.openai.com.
- Pinecone: Sign up and generate a free API key at pinecone.io.
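One way to load these keys, sketched under the assumption that they are stored as `OPENAI_API_KEY` and `PINECONE_API_KEY` (in the Colab Secrets Manager, or as environment variables locally):

```python
import os

try:
    # Inside Colab, read the keys from the Secrets Manager.
    from google.colab import userdata
    os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")
    os.environ["PINECONE_API_KEY"] = userdata.get("PINECONE_API_KEY")
except ImportError:
    # Locally, the keys are expected as environment variables already.
    assert "OPENAI_API_KEY" in os.environ and "PINECONE_API_KEY" in os.environ
```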
Click the badges below to launch the notebooks directly in a pre-configured Google Colab VM. You will be asked to add your API keys to the Colab Secrets Manager upon launch.
| Chapter | Notebook | Launch |
|---|---|---|
| Chapter 4 | Context Engine | |
Create a GitHub or local workspace containing at least:
- `helpers.py`
- `agents.py`
- `registry.py`
- `engine.py`
- Notebook files for each chapter
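As a purely illustrative sketch of how such a workspace might be exposed as a service for the deployment chapters (the file name, route, and stub below are hypothetical; the real implementations are built chapter by chapter):

```python
# engine_api.py -- hypothetical FastAPI wrapper around the engine module.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    text: str

@app.post("/answer")
def answer(query: Query) -> dict:
    # In the actual repository this would call into engine.py; here we
    # return a stub so the sketch runs standalone.
    return {"answer": f"Engine response to: {query.text}"}

# Run with: uvicorn engine_api:app --reload
```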
- OpenAI: model access and moderation
- Pinecone: vector database storage and retrieval
- (Optional) Google Cloud or AWS: for deployment sections in Chapter 10
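To illustrate how the first two services interact, here is a minimal retrieval sketch assuming `pinecone-client` v3+, an already-created index (the name `context-engine` is hypothetical), and the `text-embedding-3-small` model:

```python
import os
from openai import OpenAI
from pinecone import Pinecone

oai = OpenAI()  # uses OPENAI_API_KEY
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("context-engine")  # assumes this index already exists

text = "Clause 12: data retention is limited to 90 days."
vector = oai.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding

# Store the chunk, then retrieve the closest matches with their metadata.
index.upsert(vectors=[{"id": "doc-1", "values": vector, "metadata": {"text": text}}])
hits = index.query(vector=vector, top_k=3, include_metadata=True)
for match in hits.matches:
    print(match.id, match.score, match.metadata["text"])
```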
| Requirement | Minimum | Recommended |
|---|---|---|
| CPU | Dual-core | Any modern multi-core |
| RAM | 8 GB | 16 GB or Google Colab Pro |
| GPU | None required | Optional, but helpful for embeddings and token-heavy operations |
Note: From Chapter 5 onward, modular components depend on earlier notebooks. Ensure your environment is configured correctly, as setup steps may not be repeated in later chapters.
- Local execution may incur token and API costs with large contexts.
- The Summarizer Agent (Chapter 6) helps reduce token usage; see the token-count sketch after this list.
- Familiarity with RAG workflows and MCP-based agent orchestration is recommended.
- Refer to Appendix: Context Engine Reference Guide for quick lookup of component structures and explanations.
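A quick way to gauge those savings in practice, using `tiktoken` (the `cl100k_base` encoding and the example strings are assumptions; the summary line stands in for the Summarizer Agent's output):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # pick the encoding matching your model

full_context = "The quarterly review covered revenue, hiring, and the product roadmap. " * 50
summary = "Quarterly review: revenue, hiring, product roadmap."  # stand-in summary

print("full context:", len(enc.encode(full_context)), "tokens")
print("summary:     ", len(enc.encode(summary)), "tokens")
```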
Denis Rothman graduated from Sorbonne University and Paris-Diderot University, where he designed one of the very first patented word2matrix embeddings and patented AI conversational agents. He began his career authoring one of the first AI cognitive Natural Language Processing (NLP) chatbots, applied as an automated language teacher for Moet et Chandon and other companies. He authored an AI resource optimizer for IBM and apparel producers, and then an Advanced Planning and Scheduling (APS) solution used worldwide.