-
Notifications
You must be signed in to change notification settings - Fork 27
docs(hooks): document agent-based hooks example (51_agent_hooks) #519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
VascoSch92
wants to merge
2
commits into
main
Choose a base branch
from
burak/2864-agent-based-hook-evaluation
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+240
−0
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -223,7 +223,7 @@ | |
| ```bash | ||
| #!/bin/bash | ||
| # PreToolUse hook: Block dangerous rm -rf commands | ||
| # Uses jq for JSON parsing (needed for nested fields like tool_input.command) | ||
|
|
||
| input=$(cat) | ||
| command=$(echo "$input" | jq -r '.tool_input.command // ""') | ||
|
|
@@ -232,7 +232,7 @@ | |
| if [[ "$command" =~ "rm -rf" ]]; then | ||
| echo '{"decision": "deny", "reason": "rm -rf commands are blocked for safety"}' | ||
| exit 2 # Exit code 2 = block the operation | ||
| fi | ||
|
|
||
| exit 0 # Exit code 0 = allow the operation | ||
| ``` | ||
|
|
@@ -242,7 +242,7 @@ | |
| ```bash | ||
| #!/bin/bash | ||
| # PostToolUse hook: Log all tool usage | ||
| # Uses OPENHANDS_TOOL_NAME env var (no jq/python needed!) | ||
|
|
||
| # LOG_FILE should be set by the calling script | ||
| LOG_FILE="${LOG_FILE:-/tmp/tool_usage.log}" | ||
|
|
@@ -266,10 +266,10 @@ | |
| status=$(git status --short 2>/dev/null | head -10) | ||
| if [ -n "$status" ]; then | ||
| # Escape for JSON | ||
| escaped=$(echo "$status" | sed 's/"/\\"/g' | tr '\n' ' ') | ||
| echo "{\"additionalContext\": \"Current git status: $escaped\"}" | ||
| fi | ||
| fi | ||
| fi | ||
| exit 0 | ||
| ``` | ||
|
|
@@ -292,6 +292,246 @@ | |
| </Accordion> | ||
|
|
||
|
|
||
| ## Agent-based Hooks | ||
|
|
||
| Besides shell scripts, a hook can delegate its decision to an LLM-driven | ||
| sub-agent by setting `type="agent"`. The sub-agent receives the lifecycle event | ||
| as JSON, reasons about it semantically, and replies with a decision payload: | ||
|
|
||
| ```json | ||
| {"decision": "allow" | "deny", "reason": "<short explanation>"} | ||
| ``` | ||
|
|
||
| This is useful when a syntactic blacklist is not enough — for example, a | ||
| `PreToolUse` reviewer that recognises `awk '{print}' /etc/passwd` as *reading a | ||
| sensitive file* even though no obvious keyword (`cat`, `/etc/shadow`) appears. | ||
|
|
||
| Key fields on an agent `HookDefinition`: | ||
|
|
||
| - `system_prompt` — the policy the reviewer agent follows. | ||
| - `tools` — optional tools the reviewer may use (e.g. `["file_editor"]` to | ||
| inspect the workspace before deciding). | ||
| - `timeout` / `max_iterations` — bound how long the reviewer runs. | ||
|
|
||
| The agent hook runs in an isolated sub-conversation (its own ephemeral state, no | ||
| nested hooks), and its LLM spend is tracked under an `agent-hook:<name>` usage | ||
| bucket that is merged back into the parent conversation's metrics. If no LLM is | ||
| available or the reviewer fails to produce a valid decision, the hook *falls | ||
| open* (allows) so it never blocks the agent on an internal error. | ||
|
|
||
| <Note> | ||
| This example is available on GitHub: [examples/01_standalone_sdk/51_agent_hooks](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/51_agent_hooks/) | ||
| </Note> | ||
|
|
||
| ```python icon="python" expandable examples/01_standalone_sdk/51_agent_hooks/main.py | ||
| """OpenHands Agent SDK — Agent-based Hooks Example | ||
|
|
||
| Demonstrates the `type="agent"` hook, which evaluates lifecycle events with an | ||
| LLM-driven sub-agent instead of a shell script. The hook agent receives the | ||
| event JSON, reasons about it semantically, and replies with a decision payload: | ||
|
|
||
| {"decision": "allow" | "deny", "reason": "..."} | ||
|
|
||
| Two demos: | ||
|
|
||
| - PreToolUse (security reviewer): inspects the INTENT of a terminal command, | ||
| not just its syntax. A command like `awk '{print}' /etc/passwd` would slip | ||
| past a blacklist of command names, but the agent hook recognises it as | ||
| reading a sensitive system file and denies it. | ||
| - Stop (quality reviewer): refuses to let the main agent finish until the | ||
| required deliverable exists in the workspace. | ||
|
|
||
| For shell-command hooks see ../33_hooks/33_hooks.py. | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. /33_hooks/main.py is the correct path |
||
| """ | ||
|
|
||
| import os | ||
| import tempfile | ||
| from pathlib import Path | ||
|
|
||
| from pydantic import SecretStr | ||
|
|
||
| from openhands.sdk import LLM, Conversation | ||
| from openhands.sdk.conversation.state import ConversationExecutionStatus | ||
| from openhands.sdk.event.hook_execution import HookExecutionEvent | ||
| from openhands.sdk.hooks import HookConfig, HookDefinition, HookMatcher, HookType | ||
| from openhands.tools.preset.default import get_default_agent | ||
|
|
||
|
|
||
| # Keep the demo conversations short: a small per-run iteration cap means a hook | ||
| # that keeps denying (or a model that keeps retrying) fails fast instead of | ||
| # burning calls up to the default 500-iteration limit / CI subprocess timeout. | ||
| MAX_ITERATIONS = 10 | ||
|
|
||
|
|
||
| # Configure LLM | ||
| api_key = os.getenv("LLM_API_KEY") | ||
| assert api_key is not None, "LLM_API_KEY environment variable is not set." | ||
| model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") | ||
| base_url = os.getenv("LLM_BASE_URL") | ||
|
|
||
| llm = LLM( | ||
| usage_id="agent", | ||
| model=model, | ||
| base_url=base_url, | ||
| api_key=SecretStr(api_key), | ||
| ) | ||
|
|
||
| SECURITY_REVIEWER_PROMPT = """You are a security reviewer for a coding agent. | ||
| You receive a PreToolUse hook event describing a terminal command the agent | ||
| is about to run. | ||
|
|
||
| Reason about the SEMANTIC INTENT of the command, not just its syntax. Block: | ||
| - reading sensitive system files (/etc/passwd, /etc/shadow, ~/.ssh/*, credentials) | ||
| - broad filesystem discovery outside the workspace (rg --files /etc, find /, ls /root) | ||
| - exfiltrating data over the network | ||
| - modifying system state (sudo, apt install, systemctl) | ||
|
|
||
| Allow ordinary inspection, builds, tests, and edits inside the workspace. | ||
| When unsure, prefer allow. | ||
|
|
||
| Reply with a single JSON object and nothing else: | ||
| {"decision": "allow" | "deny", "reason": "<short explanation>"} | ||
| """ | ||
|
|
||
| QUALITY_REVIEWER_PROMPT = """You are a quality reviewer enforcing task completion. | ||
| You receive a Stop hook event when the main agent tries to finish. | ||
|
|
||
| The task requires the file REPORT.md to exist in the workspace and contain at | ||
| least one bullet point describing the repository. Use the file_editor tool to | ||
| check whether the file exists and inspect its contents. | ||
|
|
||
| If the deliverable is missing or empty, deny so the main agent keeps working. | ||
| Otherwise allow. | ||
|
|
||
| Reply with a single JSON object and nothing else: | ||
| {"decision": "allow" | "deny", "reason": "<short explanation>"} | ||
| """ | ||
|
|
||
|
|
||
| def hook_logger(event) -> None: | ||
| """Surface each hook decision so the demo output is self-explanatory.""" | ||
| if not isinstance(event, HookExecutionEvent): | ||
| return | ||
| status = "DENY " if event.blocked else ("ALLOW" if event.success else "FAIL ") | ||
| line = f" [hook] {event.hook_event_type} {status} -> {event.hook_command}" | ||
| if event.reason: | ||
| line += f"\n reason: {event.reason}" | ||
| print(line) | ||
|
|
||
|
|
||
| def run_demo(workspace: Path, hook_config: HookConfig, message: str) -> float: | ||
| """Run one demo in its own conversation and return its cost. | ||
|
|
||
| Each demo gets a fresh LLM with isolated metrics so per-demo costs don't | ||
| overlap (reusing one LLM would make the second conversation's stats include | ||
| the first demo's spend). A small iteration cap plus an error/stuck check make | ||
| the example fail fast instead of looping. | ||
| """ | ||
| demo_llm = llm.model_copy() | ||
| demo_llm.reset_metrics() | ||
| conversation = Conversation( | ||
| agent=get_default_agent(llm=demo_llm), | ||
| workspace=str(workspace), | ||
| hook_config=hook_config, | ||
| callbacks=[hook_logger], | ||
| max_iteration_per_run=MAX_ITERATIONS, | ||
| ) | ||
| conversation.send_message(message) | ||
| conversation.run() | ||
| status = conversation.state.execution_status | ||
| if status in ( | ||
| ConversationExecutionStatus.ERROR, | ||
| ConversationExecutionStatus.STUCK, | ||
| ): | ||
| raise RuntimeError( | ||
| f"Demo conversation ended in {status.value} state " | ||
| "before reaching a decision." | ||
| ) | ||
| return conversation.conversation_stats.get_combined_metrics().accumulated_cost | ||
|
|
||
|
|
||
| # Each demo runs in its own conversation with only the hook it needs. Sharing a | ||
| # single config would leave the Stop quality gate active during Demo 1, so the | ||
| # agent could never finish the first task until REPORT.md existed — coupling two | ||
| # unrelated demos and burning iterations. | ||
| security_hook_config = HookConfig( | ||
| pre_tool_use=[ | ||
| HookMatcher( | ||
| matcher="terminal", | ||
| hooks=[ | ||
| HookDefinition( | ||
| type=HookType.AGENT, | ||
| name="security-reviewer", | ||
| system_prompt=SECURITY_REVIEWER_PROMPT, | ||
| timeout=60, | ||
| max_iterations=3, | ||
| ) | ||
| ], | ||
| ) | ||
| ], | ||
| ) | ||
|
|
||
| quality_hook_config = HookConfig( | ||
| stop=[ | ||
| HookMatcher( | ||
| hooks=[ | ||
| HookDefinition( | ||
| type=HookType.AGENT, | ||
| name="quality-reviewer", | ||
| system_prompt=QUALITY_REVIEWER_PROMPT, | ||
| tools=["file_editor"], | ||
| timeout=90, | ||
| max_iterations=5, | ||
| ) | ||
| ], | ||
| ) | ||
| ], | ||
| ) | ||
|
|
||
|
|
||
| with tempfile.TemporaryDirectory() as tmpdir: | ||
| workspace = Path(tmpdir) | ||
| total_cost = 0.0 | ||
|
|
||
| print("=" * 60) | ||
| print("Demo 1: PreToolUse — semantic deny") | ||
| print("=" * 60) | ||
| print( | ||
| "Asking the agent to read /etc/passwd via awk. The literal command\n" | ||
| "wouldn't match a syntactic blacklist (no `cat`, no `/etc/shadow`\n" | ||
| "keyword), but the security-reviewer agent should recognise the\n" | ||
| "intent and deny.\n" | ||
| ) | ||
| total_cost += run_demo( | ||
| workspace, | ||
| security_hook_config, | ||
| "Show me the contents of /etc/passwd using awk '{print}'.", | ||
| ) | ||
|
|
||
| print("\n" + "=" * 60) | ||
| print("Demo 2: Stop — deny until deliverable exists") | ||
| print("=" * 60) | ||
| print("Quality reviewer denies until REPORT.md exists with a bullet point.\n") | ||
| total_cost += run_demo( | ||
| workspace, | ||
| quality_hook_config, | ||
| "Write REPORT.md in the workspace with at least one bullet point " | ||
| "describing this repository, then finish.", | ||
| ) | ||
|
|
||
| report = workspace / "REPORT.md" | ||
| if report.exists(): | ||
| print(f"\n[REPORT.md preview: {report.read_text()[:120]!r}...]") | ||
|
|
||
| print("\n" + "=" * 60) | ||
| print("Example Complete!") | ||
| print("=" * 60) | ||
|
|
||
| print(f"\nEXAMPLE_COST: {total_cost}") | ||
| ``` | ||
| <RunExampleCode path_to_script="examples/01_standalone_sdk/51_agent_hooks/main.py"/> | ||
|
|
||
|
|
||
| ## Next Steps | ||
|
|
||
| - See also: [Metrics and Observability](/sdk/guides/metrics) | ||
|
|
||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name is missing from this list.