Python: Shell tool with support for local and Docker#5664

Draft
alliscode wants to merge 9 commits into microsoft:main from alliscode:feat/shell-docker-executor
@alliscode
Member

This pull request introduces a new built-in tools package for the Microsoft Agent Framework, focusing on a cross-platform local shell tool (LocalShellTool) and its supporting infrastructure. It adds comprehensive documentation, licensing, and a Python package structure to support safe and extensible shell command execution, with future growth in mind.

alliscode and others added 7 commits May 5, 2026 14:50
… package

Introduces a safe, cross-OS local shell tool as the first citizen of a new
agent-framework-tools workspace package. Supports persistent (default) and
stateless modes across pwsh/powershell.exe/bash/sh, with policy denylist,
allowlist, approval gating, process-tree kill on timeout, output truncation,
and audit hooks. Integrates with existing provider get_shell_tool(func=...)
factories via FunctionTool kind='shell'.

See docs/decisions/0026-builtin-tools-local-shell.md for the full design.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Codifies what LocalShellTool does and does not defend against, and
delegates the security-relevant lifecycle primitive to a battle-tested
library instead of hand-rolled per-OS code.

Changes:

- Adopt psutil for cross-OS process-tree termination (executor + session).
  Replaces hand-rolled taskkill/killpg with one canonical implementation.
- Resolve taskkill.exe to absolute %SystemRoot%\System32 path so PATH
  poisoning cannot redirect us to an attacker-supplied binary.
- Reframe ShellPolicy docstring + ADR + README: denylist is a guardrail,
  not a security boundary.
- Require acknowledge_unsafe=True to set approval_mode='never_require',
  making the unsafe path explicitly opt-in with a self-documenting name.
- Add tests/test_security.py codifying named CVE-style cases. Defenses
  we DO claim are asserted; non-defenses (denylist bypasses via
  backslash insertion, variable expansion, interpreter escape, base64,
  alternative tools, PowerShell-native verbs) are documented as
  expected-to-pass tests so residual risk stays visible.
- Add Threat Model + Confidence Strategy sections to ADR 0026.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
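
The taskkill.exe bullet in the commit above can be illustrated with a minimal sketch. The function name `resolve_taskkill_path` and the fallback to `C:\Windows` are assumptions made for illustration, not the PR's code. `ntpath` is used so the example joins paths with Windows rules on any host OS.

```python
import ntpath
from typing import Mapping


def resolve_taskkill_path(env: Mapping[str, str]) -> str:
    """Build an absolute path to taskkill.exe under the System32 directory.

    Hypothetical helper: because the result is absolute, launching it does
    not consult PATH, so a poisoned PATH entry cannot substitute a binary.
    """
    # Assumption: fall back to the conventional install root when the
    # SystemRoot variable is missing or empty.
    system_root = env.get("SystemRoot") or r"C:\Windows"
    return ntpath.join(system_root, "System32", "taskkill.exe")
```

On Windows a caller would typically pass `os.environ`, whose keys are case-insensitive there; a plain dict, as in a test, is case-sensitive.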
Adds a container-backed shell executor as the recommended pattern for untrusted-input shell workflows. The container provides the security boundary (--network none, non-root user, --read-only, --cap-drop ALL, no-new-privileges, memory/pids limits, tmpfs /tmp), so approval gating is optional unlike LocalShellTool.
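
The hardening flags listed in this commit message map onto `docker run` options roughly as follows. This is a sketch only: the function name `build_run_argv_sketch`, its parameters, and the default values for user, memory, and pids limit are assumptions, and the PR's own argv builders may be shaped differently. The container's long-running command is deliberately left out because the commit message does not state it.

```python
from typing import List


def build_run_argv_sketch(
    image: str,
    name: str,
    *,
    binary: str = "docker",
    user: str = "65534:65534",  # assumed non-root uid:gid
    memory: str = "512m",       # assumed memory limit
    pids_limit: int = 256,      # assumed process-count limit
) -> List[str]:
    """Return an argv list for a locked-down, detached container."""
    return [
        binary, "run", "--detach", "--rm",
        "--name", name,
        "--network", "none",                    # no network access
        "--user", user,                         # non-root user
        "--read-only",                          # read-only root filesystem
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--memory", memory,
        "--pids-limit", str(pids_limit),
        "--tmpfs", "/tmp",                      # writable scratch space only
        image,
    ]
```

Building the command as a list rather than a shell string means no argument passes through a shell on the host.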

Also introduces a ShellExecutor Protocol so callers can plug in custom backends (Firecracker, SSH, WASI) without forking the framework.
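
A structural protocol lets a custom backend plug in without inheriting from a framework class. The sketch below shows the general idea with `typing.Protocol`; the `run` signature, the result type, and every name in it are hypothetical, since the PR text does not list the members of ShellExecutor.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class SketchResult:
    """Hypothetical result type; the PR's real result class may differ."""

    stdout: str
    exit_code: int


@runtime_checkable
class ShellExecutorSketch(Protocol):
    """Hypothetical stand-in for the PR's ShellExecutor protocol."""

    async def run(self, command: str) -> SketchResult: ...


class EchoExecutor:
    """Toy backend: satisfies the protocol by shape, with no base class."""

    async def run(self, command: str) -> SketchResult:
        # A real backend would dispatch to SSH, a microVM, and so on.
        return SketchResult(stdout=f"would run: {command}", exit_code=0)


async def run_with(executor: ShellExecutorSketch, command: str) -> SketchResult:
    """Caller code depends only on the protocol, not on a concrete class."""
    return await executor.run(command)
```

For example, `asyncio.run(run_with(EchoExecutor(), "ls"))` returns a `SketchResult` without `EchoExecutor` ever importing the protocol.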

Removes the planned HyperlightShellExecutor follow-up from ADR 0026: Hyperlight is a WASM code sandbox with no kernel/userland/shell binary, so a Hyperlight-backed shell is not viable. Docker is the realistic sandbox tier for shell.

Tests: 11 unit tests for argv builders + lifecycle (no Docker daemon required); 3 integration tests gated on is_docker_available().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Applies the applicable subset of bug fixes accumulated during the
.NET shell-tool PR review (microsoft#5604) to the
Python shell tool.

A1 - Quote workdir safely in _maybe_reanchor

  Previously _tool.py used double-quote interpolation when emitting
  the cd/Set-Location prefix, which expanded $VAR, $(), and backticks
  in the workdir path. A workdir containing shell metacharacters could
  trigger arbitrary command execution before the user command ran.

  Replaced with single-quote escaping helpers _quote_posix and
  _quote_powershell that emit literal-string forms safe for both
  hosts.
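
A minimal sketch of single-quote escaping for the two shell families follows. The function names carry a `_sketch` suffix to make clear they are illustrations, not the PR's `_quote_posix` and `_quote_powershell` implementations, which are not shown in this conversation.

```python
def quote_posix_sketch(value: str) -> str:
    """Wrap value in single quotes for a POSIX shell.

    Nothing is expanded inside single quotes, so $VAR, $() and backticks
    stay literal. An embedded quote is written as: close quote, escaped
    quote, reopen quote.
    """
    return "'" + value.replace("'", "'\\''") + "'"


def quote_powershell_sketch(value: str) -> str:
    """Wrap value in single quotes for PowerShell.

    Inside a PowerShell single-quoted string, a doubled quote is a
    literal quote and no variable expansion takes place.
    """
    return "'" + value.replace("'", "''") + "'"
```

One caveat worth checking in a real implementation: PowerShell is understood to accept some Unicode quotation marks as string delimiters as well, which this sketch does not handle.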

A5/A6 - Consolidate truncation to a single byte-aware helper

  Extracted a shared truncate_head_tail / truncate_text_head_tail
  helper in _truncate.py. The new implementation distributes odd
  caps so head receives floor(cap/2) and tail receives ceil(cap/2)
  bytes, matching the .NET round-9 fix and ensuring no input bytes
  are silently dropped on the boundary.

  _session.py previously truncated by Python str length while the
  caller passed _max_output_bytes - the unit mismatch is now gone:
  raw byte buffers go through truncate_head_tail and decoded text
  goes through truncate_text_head_tail.
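
The floor/ceil split described above can be sketched as follows. This is an illustration under assumptions: the function name, the tuple return shape, and the dropped-byte count are hypothetical, and the sketch works on raw bytes only, without the UTF-8 boundary handling a text variant would need.

```python
from typing import Tuple


def truncate_head_tail_sketch(data: bytes, cap: int) -> Tuple[bytes, bytes, int]:
    """Keep at most cap bytes: floor(cap/2) from the head, the rest from the tail.

    Returns (head, tail, dropped). When data already fits, it is returned
    whole as the head with an empty tail.
    """
    if cap < 0:
        raise ValueError("cap must be non-negative")
    if len(data) <= cap:
        return data, b"", 0
    head_len = cap // 2        # floor(cap / 2)
    tail_len = cap - head_len  # ceil(cap / 2), so the two always sum to cap
    head = data[:head_len]
    # Slice from an explicit start index: data[-0:] would return everything.
    tail = data[len(data) - tail_len:]
    return head, tail, len(data) - cap
```

Deriving the tail length as `cap - head_len` rather than a second `cap // 2` is what keeps an odd cap from losing one byte of budget.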

Unit tests added for the truncate and quote helpers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tool

The shell tool's docstrings and comments contained two patterns that
the .NET review pushed back on:

- Narrative framing about implementation history ("hard-won",
  "we sidestep", "design inspiration: ...", competitor framework
  name-drops in module docstrings).
- Overstated security guarantees ("battle-tested",
  "reasonable for untrusted input", "recommended executor for any
  agent that runs commands from untrusted input",
  "destructive commands are blocked", "safe local shell tool",
  "blocks shell injection").

Rewrites the affected docstrings and comments to describe what the
code does in neutral terms. Behaviour is unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Ports the .NET ShellEnvironmentProvider as a Python ContextProvider
so agents using LocalShellTool or DockerShellTool can be primed with
an accurate description of the shell they're talking to (family,
version, OS, working directory, and which CLIs are available).

The provider runs probes through any ShellExecutor, caches the
resulting snapshot, and on every before_run extends the session
instructions with a markdown block describing the shell idiom to
use. A failed first probe leaves the cache empty so the next call
retries (no permanent poisoning).
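
The caching behaviour described above (one shared probe for concurrent first callers, and a retry after a failed first probe) can be sketched with an `asyncio.Lock`. The class name `CachedProbeSketch` and its interface are hypothetical; the PR's provider may be structured differently.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class CachedProbeSketch:
    """Run an async probe at most once per success and cache its result."""

    def __init__(self, probe: Callable[[], Awaitable[dict]]) -> None:
        self._probe = probe
        self._snapshot: Optional[dict] = None
        self._lock: Optional[asyncio.Lock] = None

    async def get(self) -> dict:
        if self._lock is None:
            # Created lazily so the lock is made inside the running loop.
            self._lock = asyncio.Lock()
        async with self._lock:
            if self._snapshot is None:
                # If the probe raises, _snapshot stays None, so the next
                # caller retries instead of seeing a poisoned cache.
                self._snapshot = await self._probe()
            return self._snapshot
```

Concurrent first callers queue on the lock; by the time the second one acquires it, the snapshot is already populated and no second probe runs.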

Probe failures from a narrow set of expected error types
(ShellCommandError, ShellExecutionError, ShellTimeoutError, and
asyncio.TimeoutError from the per-probe timeout) are recorded as
None fields in the snapshot. Other exceptions propagate. Tool
names are validated against ^[A-Za-z0-9._-]+$ before being
interpolated into a probe command.
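
A minimal sketch of the tool-name check, using the character class quoted above. The helper name `is_safe_tool_name` is hypothetical. The sketch uses `re.fullmatch` because in Python's `re` module a trailing `$` also matches just before a final newline; whether the PR uses `fullmatch` is not stated here.

```python
import re

# Same character class as the pattern in the commit message; fullmatch
# supplies the anchoring at both ends.
_TOOL_NAME_RE = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_tool_name(name: str) -> bool:
    """Return True only if every character of name is in the allowed set."""
    return _TOOL_NAME_RE.fullmatch(name) is not None
```

Names such as `git` or `python3.12` pass, while anything containing spaces, semicolons, quotes, or newlines is rejected before it can reach a probe command.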

Includes 12 unit tests covering happy path, stderr fallback,
timeout handling, expected/unexpected exception paths, malicious
tool name rejection, case-insensitive deduplication, retry after
failure, concurrent first-callers sharing one probe, and the
default and custom formatter paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…anup

Add a README section introducing ShellEnvironmentProvider, soften two remaining overconfident security-boundary comments in _executor_base.py and the DockerShellTool class docstring, and add a sample (shell_with_environment_provider.py) that demonstrates the provider in stateless and persistent modes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 5, 2026 22:08
@moonbox3 added the documentation and python labels May 5, 2026
@github-actions github-actions Bot changed the title from "Shell tool with support for local and Docker" to "Python: Shell tool with support for local and Docker" May 5, 2026
The repository convention is to host samples under python/samples/ rather than inside the package directory. Move the two net-new shell samples (allow-list and environment-provider) to python/samples/02-agents/tools/ and drop the in-package samples/ directory; the existing top-level providers/openai/client_with_local_shell.py already covers the basic LocalShellTool walkthrough.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a new first-party Python workspace package, agent-framework-tools, introducing a cross-platform shell execution surface (LocalShellTool) plus a container-sandboxed variant (DockerShellTool) and a context provider (ShellEnvironmentProvider) to help models emit correct shell idioms and discover available CLIs.

Changes:

  • Add agent-framework-tools package (shell tools, policy/denylist, truncation, process-tree kill, persistent session protocol, environment context provider).
  • Add unit + integration-gated tests and runnable samples for local and Docker-backed shell execution.
  • Register the new package in the Python workspace (pyproject + uv lock) and update an existing OpenAI sample to use LocalShellTool.
| File | Description |
| --- | --- |
| `python/uv.lock` | Adds agent-framework-tools as a workspace member and locked editable package entry. |
| `python/pyproject.toml` | Registers agent-framework-tools as a workspace source dependency. |
| `python/samples/02-agents/providers/openai/client_with_local_shell.py` | Updates sample to use LocalShellTool instead of a hand-rolled subprocess tool. |
| `python/packages/tools/README.md` | Documents installation, modes, safety model, and tool/provider usage. |
| `python/packages/tools/LICENSE` | Adds MIT license for the new tools package. |
| `python/packages/tools/pyproject.toml` | Defines packaging metadata, deps (incl. psutil), and test/lint/tooling config. |
| `python/packages/tools/agent_framework_tools/__init__.py` | Adds package root and version discovery. |
| `python/packages/tools/agent_framework_tools/py.typed` | Marks the package as typed for type checkers. |
| `python/packages/tools/agent_framework_tools/shell/__init__.py` | Exposes the public shell-tool API surface. |
| `python/packages/tools/agent_framework_tools/shell/_types.py` | Introduces shared types and core exceptions for shell execution. |
| `python/packages/tools/agent_framework_tools/shell/_truncate.py` | Implements head/tail UTF-8 byte-budget truncation helpers. |
| `python/packages/tools/agent_framework_tools/shell/_policy.py` | Adds allow/deny policy model and default denylist patterns. |
| `python/packages/tools/agent_framework_tools/shell/_resolve.py` | Implements cross-platform shell argv resolution and PowerShell detection. |
| `python/packages/tools/agent_framework_tools/shell/_killtree.py` | Adds cross-OS process-tree termination (psutil + fallback). |
| `python/packages/tools/agent_framework_tools/shell/_executor.py` | Implements stateless execution via subprocess with timeout + truncation. |
| `python/packages/tools/agent_framework_tools/shell/_executor_base.py` | Defines a minimal ShellExecutor protocol for pluggable backends. |
| `python/packages/tools/agent_framework_tools/shell/_session.py` | Implements persistent shell session using sentinel framing and reader tasks. |
| `python/packages/tools/agent_framework_tools/shell/_tool.py` | Adds LocalShellTool facade + agent-framework FunctionTool wiring. |
| `python/packages/tools/agent_framework_tools/shell/_environment.py` | Adds ShellEnvironmentProvider to probe and inject shell environment guidance. |
| `python/packages/tools/agent_framework_tools/shell/_docker.py` | Adds DockerShellTool and argv builders for container-sandboxed execution. |
| `python/packages/tools/samples/__init__.py` | Adds samples package marker. |
| `python/packages/tools/samples/shell_openai_persistent.py` | Demonstrates OpenAI usage with an approval loop and persistent local shell. |
| `python/packages/tools/samples/shell_allowlist_stateless.py` | Demonstrates a strict allowlist + stateless mode configuration. |
| `python/packages/tools/samples/shell_with_environment_provider.py` | Demonstrates using ShellEnvironmentProvider with stateless vs persistent shells. |
| `python/packages/tools/tests/__init__.py` | Adds tests package marker. |
| `python/packages/tools/tests/test_shell_truncate_and_quote.py` | Tests truncation helpers and quoting helpers. |
| `python/packages/tools/tests/test_shell_environment_provider.py` | Tests probing, formatting, caching, and concurrency behavior of environment provider. |
| `python/packages/tools/tests/test_security.py` | Adds security regression tests documenting denylist behavior and residual risk. |
| `python/packages/tools/tests/test_policy.py` | Tests default policy behavior, allowlist behavior, and custom overrides. |
| `python/packages/tools/tests/test_local_shell_tool.py` | Tests local shell tool modes, timeouts, policy, persistence, and concurrency. |
| `python/packages/tools/tests/test_docker_shell_tool.py` | Tests Docker argv builders, basic tool behavior, and docker-availability-gated integration tests. |

Copilot's findings

  • Files reviewed: 27/29 changed files
  • Comments generated: 6

Comment on lines +275 to +277
if self._interactive_argv and "pwsh" in os.path.basename(self._interactive_argv[0]).lower():
    return f"Set-Location -LiteralPath {_quote_powershell(self._workdir)}\n{command}"
return f"cd -- {_quote_posix(self._workdir)}\n{command}"
Comment on lines +25 to +26
from agent_framework import ContextProvider, SupportsAgentRun
from agent_framework._sessions import AgentSession, SessionContext
Comment on lines +81 to +85
# Persistent reader state. The reader tasks append into these
# buffers; _run_locked scans forward from a per-call offset.
self._stdout_buf = bytearray()
self._stderr_buf = bytearray()
self._stdout_event = asyncio.Event()
Comment on lines +278 to +300
if self._container_started:
    if self._mode == "persistent" and self._session is not None:
        await self._session.start()
    return
await self._start_container()
self._container_started = True
if self._mode == "persistent":
    argv = build_exec_argv(
        binary=self._binary,
        container_name=self._container_name,
        interactive=True,
    )
    self._session = ShellSession(
        argv,
        workdir=None,  # workdir is set on the container itself
        env=None,
        max_output_bytes=self._max_output_bytes,
    )
    await self._session.start()

async def close(self) -> None:
    """Stop the inner shell session and tear down the container."""
    async with self._get_lifecycle_lock():
Comment on lines +169 to +171
assert getattr(fn, "additional_properties", {}).get("kind") == SHELL_TOOL_KIND_VALUE or \
    getattr(fn, "kind", None) == SHELL_TOOL_KIND_VALUE or \
    SHELL_TOOL_KIND_VALUE in str(getattr(fn, "_kind", ""))
Comment on lines +177 to +184
@pytest.mark.skipif(not is_docker_available(), reason="docker daemon unavailable")
async def test_docker_persistent_session_preserves_state():
    async with DockerShellTool(image="alpine:3", network="none") as shell:
        r1 = await shell.run("export AF_X=hello")
        assert r1.exit_code == 0
        r2 = await shell.run("echo $AF_X")
        assert r2.exit_code == 0
        assert "hello" in r2.stdout

@github-actions github-actions Bot left a comment

Automated Code Review

Reviewers: 3 | Confidence: 86% | Result: All clear

Reviewed: Security Reliability, Test Coverage, Design Approach


Automated review by alliscode's agents

…_model

Two new tests in test_local_shell_tool.py exercise the default confine_workdir=True behaviour on POSIX and PowerShell, asserting that 'cd' inside one persistent-mode call does not leak into the next. A new test_shell_result.py module provides direct unit coverage for every conditional branch of ShellResult.format_for_model (stdout, truncated, stderr, timed_out, exit_code) so regressions in the LLM-facing format are caught immediately.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Labels

documentation, python

3 participants