Python: Shell tool with support for local and Docker by alliscode · Pull Request #5664 · microsoft/agent-framework

alliscode · 2026-05-05T22:08:57Z

This pull request introduces a new built-in tools package for the Microsoft Agent Framework, focusing on a cross-platform local shell tool (LocalShellTool) and its supporting infrastructure. It adds comprehensive documentation, licensing, and a Python package structure to support safe and extensible shell command execution, with future growth in mind.

… package

Introduces a safe, cross-OS local shell tool as the first citizen of a new agent-framework-tools workspace package. Supports persistent (default) and stateless modes across pwsh/powershell.exe/bash/sh, with policy denylist, allowlist, approval gating, process-tree kill on timeout, output truncation, and audit hooks. Integrates with existing provider get_shell_tool(func=...) factories via FunctionTool kind='shell'.

See docs/decisions/0026-builtin-tools-local-shell.md for the full design.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Codifies what LocalShellTool does and does not defend against, and delegates the security-relevant lifecycle primitive to a battle-tested library instead of hand-rolled per-OS code.

Changes:
- Adopt psutil for cross-OS process-tree termination (executor + session). Replaces hand-rolled taskkill/killpg with one canonical implementation.
- Resolve taskkill.exe to absolute %SystemRoot%\System32 path so PATH poisoning cannot redirect us to an attacker-supplied binary.
- Reframe ShellPolicy docstring + ADR + README: denylist is a guardrail, not a security boundary.
- Require acknowledge_unsafe=True to set approval_mode='never_require', making the unsafe path explicitly opt-in with a self-documenting name.
- Add tests/test_security.py codifying named CVE-style cases. Defenses we DO claim are asserted; non-defenses (denylist bypasses via backslash insertion, variable expansion, interpreter escape, base64, alternative tools, PowerShell-native verbs) are documented as expected-to-pass tests so residual risk stays visible.
- Add Threat Model + Confidence Strategy sections to ADR 0026.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
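To illustrate why the denylist is described as a guardrail rather than a security boundary, here is a minimal sketch of the backslash-insertion bypass. The pattern and the `is_denied` helper are hypothetical and are not taken from the package's default denylist.

```python
import re

# Hypothetical denylist entry, for illustration only.
DENY_PATTERNS = [re.compile(r"\brm\s+-rf\s+/")]


def is_denied(command: str) -> bool:
    """Return True if any denylist pattern matches the raw command text."""
    return any(pattern.search(command) for pattern in DENY_PATTERNS)


# The literal spelling is caught by the pattern.
blocked = is_denied("rm -rf /")

# A backslash inside the command name defeats the text match, even though
# a POSIX shell strips the backslash and still resolves the word to `rm`.
bypassed = is_denied("r\\m -rf /")
```

Because the check operates on command text and not on what the shell eventually executes, a test for the second case is expected to pass with the command not blocked, which keeps the residual risk visible.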

Adds a container-backed shell executor as the recommended pattern for untrusted-input shell workflows. The container provides the security boundary (--network none, non-root user, --read-only, --cap-drop ALL, no-new-privileges, memory/pids limits, tmpfs /tmp), so approval gating is optional unlike LocalShellTool.

Also introduces a ShellExecutor Protocol so callers can plug in custom backends (Firecracker, SSH, WASI) without forking the framework.

Removes the planned HyperlightShellExecutor follow-up from ADR 0026: Hyperlight is a WASM code sandbox with no kernel/userland/shell binary, so a Hyperlight-backed shell is not viable. Docker is the realistic sandbox tier for shell.

Tests: 11 unit tests for argv builders + lifecycle (no Docker daemon required); 3 integration tests gated on is_docker_available().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
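The container restrictions listed above map onto standard `docker run` flags. The following is a rough sketch of an argv builder under that assumption; the function name, parameters, default values, and the keep-alive command are hypothetical and do not reflect the package's actual builders.

```python
def build_run_argv_sketch(
    image: str,
    container_name: str,
    *,
    binary: str = "docker",
    memory: str = "512m",        # assumed default, not from the PR
    pids_limit: int = 128,       # assumed default, not from the PR
    user: str = "65534:65534",   # a conventional non-root uid:gid (assumption)
) -> list[str]:
    """Build a `docker run` argv for a detached, locked-down container."""
    return [
        binary, "run", "--detach", "--rm",
        "--name", container_name,
        "--network", "none",                     # no network access
        "--user", user,                          # non-root user
        "--read-only",                           # read-only root filesystem
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block privilege escalation
        "--memory", memory,
        "--pids-limit", str(pids_limit),
        "--tmpfs", "/tmp",                       # writable scratch space only
        image,
        "sleep", "3600",                         # placeholder keep-alive command
    ]


argv = build_run_argv_sketch("alpine:3", "af-shell-demo")
```

Building the command as a list rather than a single string means it can be unit tested without a Docker daemon, which matches the PR's split between argv-builder tests and daemon-gated integration tests.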

Applies the applicable subset of bug fixes accumulated during the .NET shell-tool PR review (microsoft#5604) to the Python shell tool.

A1 - Quote workdir safely in _maybe_reanchor

Previously _tool.py used double-quote interpolation when emitting the cd/Set-Location prefix, which expanded $VAR, $(), and backticks in the workdir path. A workdir containing shell metacharacters could trigger arbitrary command execution before the user command ran. Replaced with single-quote escaping helpers _quote_posix and _quote_powershell that emit literal-string forms safe for both hosts.

A5/A6 - Consolidate truncation to a single byte-aware helper

Extracted a shared truncate_head_tail / truncate_text_head_tail helper in _truncate.py. The new implementation distributes odd caps so head receives floor(cap/2) and tail receives ceil(cap/2) bytes, matching the .NET round-9 fix and ensuring no input bytes are silently dropped on the boundary. _session.py previously truncated by Python str length while the caller passed _max_output_bytes - the unit mismatch is now gone: raw byte buffers go through truncate_head_tail and decoded text goes through truncate_text_head_tail.

Unit tests added for the truncate and quote helpers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
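The head/tail split described under A5/A6 can be sketched as follows. This is a simplified stand-in and not the code in `_truncate.py`: the function name and return shape are assumptions, and any truncation marker the real helper may insert between head and tail is omitted.

```python
def truncate_head_tail_sketch(data: bytes, cap: int) -> tuple[bytes, bool]:
    """Keep the first floor(cap/2) and last ceil(cap/2) bytes of oversized data.

    Returns the (possibly shortened) bytes and a flag saying whether
    truncation happened.
    """
    if cap < 0:
        raise ValueError("cap must be non-negative")
    if len(data) <= cap:
        return data, False
    head_len = cap // 2          # floor(cap / 2)
    tail_len = cap - head_len    # ceil(cap / 2), so head + tail == cap exactly
    head = data[:head_len]
    # Index from the front so tail_len == 0 yields b"" (data[-0:] would not).
    tail = data[len(data) - tail_len:]
    return head + tail, True


kept, was_truncated = truncate_head_tail_sketch(b"abcdefghij", 5)
```

Deriving the tail length as `cap - head_len` rather than a second `cap // 2` is what keeps an odd cap from losing one byte of budget.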

…tool

The shell tool's docstrings and comments contained two patterns that the .NET review pushed back on:
- Narrative framing about implementation history ("hard-won", "we sidestep", "design inspiration: ...", competitor framework name-drops in module docstrings).
- Overstated security guarantees ("battle-tested", "reasonable for untrusted input", "recommended executor for any agent that runs commands from untrusted input", "destructive commands are blocked", "safe local shell tool", "blocks shell injection").

Rewrites the affected docstrings and comments to describe what the code does in neutral terms. Behaviour is unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Ports the .NET ShellEnvironmentProvider as a Python ContextProvider so agents using LocalShellTool or DockerShellTool can be primed with an accurate description of the shell they're talking to (family, version, OS, working directory, and which CLIs are available).

The provider runs probes through any ShellExecutor, caches the resulting snapshot, and on every before_run extends the session instructions with a markdown block describing the shell idiom to use. A failed first probe leaves the cache empty so the next call retries (no permanent poisoning).

Probe failures from a narrow set of expected error types (ShellCommandError, ShellExecutionError, ShellTimeoutError, and asyncio.TimeoutError from the per-probe timeout) are recorded as None fields in the snapshot. Other exceptions propagate. Tool names are validated against ^[A-Za-z0-9._-]+$ before being interpolated into a probe command.

Includes 12 unit tests covering happy path, stderr fallback, timeout handling, expected/unexpected exception paths, malicious tool name rejection, case-insensitive deduplication, retry after failure, concurrent first-callers sharing one probe, and the default and custom formatter paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
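A minimal sketch of the tool-name validation and case-insensitive deduplication mentioned above, assuming a helper named `normalize_tool_names` (hypothetical, not the provider's actual API). It uses `re.fullmatch` on the same character class, because in Python a pattern ending in `$` used with `re.match` also accepts a single trailing newline.

```python
import re

# Same character class as the commit message; fullmatch supplies the anchoring.
_TOOL_NAME_RE = re.compile(r"[A-Za-z0-9._-]+")


def normalize_tool_names(names: list[str]) -> list[str]:
    """Validate tool names and drop case-insensitive duplicates, keeping order."""
    seen: set[str] = set()
    result: list[str] = []
    for name in names:
        if _TOOL_NAME_RE.fullmatch(name) is None:
            # Reject anything that could inject shell syntax into a probe command.
            raise ValueError(f"invalid tool name: {name!r}")
        key = name.lower()
        if key not in seen:
            seen.add(key)
            result.append(name)
    return result


tools = normalize_tool_names(["git", "Git", "python3", "docker-compose"])
```

Rejecting the whole request on the first bad name, rather than silently skipping it, is a design choice made for this sketch; the PR text does not say which behaviour the provider uses.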

…anup

Add a README section introducing ShellEnvironmentProvider, soften two remaining overconfident security-boundary comments in _executor_base.py and the DockerShellTool class docstring, and add a sample (shell_with_environment_provider.py) that demonstrates the provider in stateless and persistent modes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The repository convention is to host samples under python/samples/ rather than inside the package directory. Move the two net-new shell samples (allow-list and environment-provider) to python/samples/02-agents/tools/ and drop the in-package samples/ directory; the existing top-level providers/openai/client_with_local_shell.py already covers the basic LocalShellTool walkthrough.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR adds a new first-party Python workspace package, agent-framework-tools, introducing a cross-platform shell execution surface (LocalShellTool) plus a container-sandboxed variant (DockerShellTool) and a context provider (ShellEnvironmentProvider) to help models emit correct shell idioms and discover available CLIs.

Changes:

- Add agent-framework-tools package (shell tools, policy/denylist, truncation, process-tree kill, persistent session protocol, environment context provider).
- Add unit + integration-gated tests and runnable samples for local and Docker-backed shell execution.
- Register the new package in the Python workspace (pyproject + uv lock) and update an existing OpenAI sample to use LocalShellTool.

Show a summary per file

| File | Description |
| --- | --- |
| `python/uv.lock` | Adds `agent-framework-tools` as a workspace member and locked editable package entry. |
| `python/pyproject.toml` | Registers `agent-framework-tools` as a workspace source dependency. |
| `python/samples/02-agents/providers/openai/client_with_local_shell.py` | Updates sample to use `LocalShellTool` instead of a hand-rolled subprocess tool. |
| `python/packages/tools/README.md` | Documents installation, modes, safety model, and tool/provider usage. |
| `python/packages/tools/LICENSE` | Adds MIT license for the new tools package. |
| `python/packages/tools/pyproject.toml` | Defines packaging metadata, deps (incl. psutil), and test/lint/tooling config. |
| `python/packages/tools/agent_framework_tools/__init__.py` | Adds package root and version discovery. |
| `python/packages/tools/agent_framework_tools/py.typed` | Marks the package as typed for type checkers. |
| `python/packages/tools/agent_framework_tools/shell/__init__.py` | Exposes the public shell-tool API surface. |
| `python/packages/tools/agent_framework_tools/shell/_types.py` | Introduces shared types and core exceptions for shell execution. |
| `python/packages/tools/agent_framework_tools/shell/_truncate.py` | Implements head/tail UTF-8 byte-budget truncation helpers. |
| `python/packages/tools/agent_framework_tools/shell/_policy.py` | Adds allow/deny policy model and default denylist patterns. |
| `python/packages/tools/agent_framework_tools/shell/_resolve.py` | Implements cross-platform shell argv resolution and PowerShell detection. |
| `python/packages/tools/agent_framework_tools/shell/_killtree.py` | Adds cross-OS process-tree termination (psutil + fallback). |
| `python/packages/tools/agent_framework_tools/shell/_executor.py` | Implements stateless execution via subprocess with timeout + truncation. |
| `python/packages/tools/agent_framework_tools/shell/_executor_base.py` | Defines a minimal `ShellExecutor` protocol for pluggable backends. |
| `python/packages/tools/agent_framework_tools/shell/_session.py` | Implements persistent shell session using sentinel framing and reader tasks. |
| `python/packages/tools/agent_framework_tools/shell/_tool.py` | Adds `LocalShellTool` facade + agent-framework `FunctionTool` wiring. |
| `python/packages/tools/agent_framework_tools/shell/_environment.py` | Adds `ShellEnvironmentProvider` to probe and inject shell environment guidance. |
| `python/packages/tools/agent_framework_tools/shell/_docker.py` | Adds `DockerShellTool` and argv builders for container-sandboxed execution. |
| `python/packages/tools/samples/__init__.py` | Adds samples package marker. |
| `python/packages/tools/samples/shell_openai_persistent.py` | Demonstrates OpenAI usage with an approval loop and persistent local shell. |
| `python/packages/tools/samples/shell_allowlist_stateless.py` | Demonstrates a strict allowlist + stateless mode configuration. |
| `python/packages/tools/samples/shell_with_environment_provider.py` | Demonstrates using `ShellEnvironmentProvider` with stateless vs persistent shells. |
| `python/packages/tools/tests/__init__.py` | Adds tests package marker. |
| `python/packages/tools/tests/test_shell_truncate_and_quote.py` | Tests truncation helpers and quoting helpers. |
| `python/packages/tools/tests/test_shell_environment_provider.py` | Tests probing, formatting, caching, and concurrency behavior of environment provider. |
| `python/packages/tools/tests/test_security.py` | Adds security regression tests documenting denylist behavior and residual risk. |
| `python/packages/tools/tests/test_policy.py` | Tests default policy behavior, allowlist behavior, and custom overrides. |
| `python/packages/tools/tests/test_local_shell_tool.py` | Tests local shell tool modes, timeouts, policy, persistence, and concurrency. |
| `python/packages/tools/tests/test_docker_shell_tool.py` | Tests Docker argv builders, basic tool behavior, and docker-availability-gated integration tests. |

Copilot's findings

Files reviewed: 27/29 changed files
Comments generated: 6

+        if self._interactive_argv and "pwsh" in os.path.basename(self._interactive_argv[0]).lower():
+            return f"Set-Location -LiteralPath {_quote_powershell(self._workdir)}\n{command}"
+        return f"cd -- {_quote_posix(self._workdir)}\n{command}"
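The reanchor prefix above depends on the workdir being emitted as a literal string. A minimal sketch of single-quote escaping for both hosts follows; the `_sketch` function names are hypothetical stand-ins, and the bodies show the standard quoting rules of each shell and not necessarily the package's exact helpers.

```python
def quote_posix_sketch(value: str) -> str:
    """Single-quote a value for POSIX shells.

    Nothing is expanded inside single quotes, so the only character that
    needs handling is the single quote itself: close the quoted run, emit
    an escaped quote, and reopen.
    """
    return "'" + value.replace("'", "'\\''") + "'"


def quote_powershell_sketch(value: str) -> str:
    """Single-quote a value for PowerShell.

    PowerShell single-quoted strings are literal; an embedded single
    quote is written by doubling it.
    """
    return "'" + value.replace("'", "''") + "'"


# A workdir containing command substitution stays inert once quoted.
posix_prefix = f"cd -- {quote_posix_sketch('/tmp/$(id)')}"
pwsh_prefix = f"Set-Location -LiteralPath {quote_powershell_sketch('C:/it' + chr(39) + 's')}"
```

With double-quote interpolation the `$(id)` in the first example would run before the user command; inside single quotes it is passed through as plain text.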


+from agent_framework import ContextProvider, SupportsAgentRun
+from agent_framework._sessions import AgentSession, SessionContext


+        # Persistent reader state. The reader tasks append into these
+        # buffers; _run_locked scans forward from a per-call offset.
+        self._stdout_buf = bytearray()
+        self._stderr_buf = bytearray()
+        self._stdout_event = asyncio.Event()
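To make the "scan forward from a per-call offset" idea concrete, here is a rough sketch of sentinel framing over a shared buffer. The `extract_frame` helper and the sentinel value are hypothetical; the session's real framing format is not shown in this excerpt.

```python
from typing import Optional


def extract_frame(
    buf: bytearray, offset: int, sentinel: bytes
) -> Optional[tuple[bytes, int]]:
    """Return (payload, next_offset) once the sentinel appears at or after offset.

    Returns None when the sentinel has not arrived yet, in which case the
    caller would wait for the reader task to append more data and retry.
    """
    idx = buf.find(sentinel, offset)
    if idx == -1:
        return None
    payload = bytes(buf[offset:idx])
    return payload, idx + len(sentinel)


SENTINEL = b"<<DONE>>"  # illustrative marker only
buf = bytearray(b"out1\n<<DONE>>out2\n")

first = extract_frame(buf, 0, SENTINEL)            # complete frame
pending = extract_frame(buf, first[1], SENTINEL)   # second sentinel not yet seen
buf.extend(SENTINEL)                               # reader task appends more output
second = extract_frame(buf, first[1], SENTINEL)
```

Tracking an offset per call, instead of clearing the buffer, lets each command read only the output that arrived after it started.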


+            if self._container_started:
+                if self._mode == "persistent" and self._session is not None:
+                    await self._session.start()
+                return
+            await self._start_container()
+            self._container_started = True
+            if self._mode == "persistent":
+                argv = build_exec_argv(
+                    binary=self._binary,
+                    container_name=self._container_name,
+                    interactive=True,
+                )
+                self._session = ShellSession(
+                    argv,
+                    workdir=None,  # workdir is set on the container itself
+                    env=None,
+                    max_output_bytes=self._max_output_bytes,
+                )
+                await self._session.start()
+
+    async def close(self) -> None:
+        """Stop the inner shell session and tear down the container."""
+        async with self._get_lifecycle_lock():


+    assert getattr(fn, "additional_properties", {}).get("kind") == SHELL_TOOL_KIND_VALUE or \
+        getattr(fn, "kind", None) == SHELL_TOOL_KIND_VALUE or \
+        SHELL_TOOL_KIND_VALUE in str(getattr(fn, "_kind", ""))


+@pytest.mark.skipif(not is_docker_available(), reason="docker daemon unavailable")
+async def test_docker_persistent_session_preserves_state():
+    async with DockerShellTool(image="alpine:3", network="none") as shell:
+        r1 = await shell.run("export AF_X=hello")
+        assert r1.exit_code == 0
+        r2 = await shell.run("echo $AF_X")
+        assert r2.exit_code == 0
+        assert "hello" in r2.stdout


github-actions

Automated Code Review

Reviewers: 3 | Confidence: 86% | Result: All clear

Reviewed: Security Reliability, Test Coverage, Design Approach

Automated review by alliscode's agents

…_model

Two new tests in test_local_shell_tool.py exercise the default confine_workdir=True behaviour on POSIX and PowerShell, asserting that 'cd' inside one persistent-mode call does not leak into the next. A new test_shell_result.py module provides direct unit coverage for every conditional branch of ShellResult.format_for_model (stdout, truncated, stderr, timed_out, exit_code) so regressions in the LLM-facing format are caught immediately.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

alliscode and others added 7 commits May 5, 2026 14:50

Copilot AI review requested due to automatic review settings May 5, 2026 22:08

moonbox3 added the documentation and python labels May 5, 2026

github-actions Bot changed the title ~~Shell tool with support for local and Docker~~ Python: Shell tool with support for local and Docker May 5, 2026

Copilot started reviewing on behalf of alliscode May 5, 2026 22:12

Copilot AI reviewed May 5, 2026

github-actions Bot reviewed May 5, 2026

alliscode wants to merge 9 commits into microsoft:main from alliscode:feat/shell-docker-executor
