fix: terminate infinite retry loop in RunSkillScriptTool on SCRIPT_NOT_FOUND by Raman369AI · Pull Request #5683 · google/adk-python

Raman369AI · 2026-05-13T03:47:28Z

Companion to #5651 — closes the same defect class for RunSkillScriptTool.

Summary

RunSkillScriptTool.run_async returns SCRIPT_NOT_FOUND as a structured soft-error string when a script path passed by the LLM does not exist in the skill's scripts/ directory. Because the response is a normal tool result (not an exception or terminal signal), the LLM treats it as transient and retries — and in practice it hallucinates a different plausible script path on each retry (e.g. scripts/setup.py, scripts/build.py, scripts/run.sh). Nothing in RunSkillScriptTool tracks failures across paths, so the loop continues until RunConfig.max_llm_calls (default 500) is exhausted.

This is the same defect mode as LoadSkillResourceTool (#5652 / #5651), in the same file, with the same fix shape. Filing as a separate PR for a clean history.

Why this matters

The four conditions that made the resource-loop reachable apply identically to scripts:

No script manifest at L2 — the load_skill response intentionally omits a list of available scripts (progressive-disclosure spec). The LLM infers script filenames from prose in SKILL.md, and inferred filenames like scripts/setup.py or scripts/build.sh are routinely wrong.
Soft error string — SCRIPT_NOT_FOUND looks transient to the model.
No terminal signal — nothing escalates after the first miss.
max_llm_calls=500 is the only backstop — a global reasoning cap, not a defense against a known failure mode on one tool.

What changed

Two layers in src/google/adk/tools/skill_toolset.py, plus tests:

1. Code: invocation-scoped failure counter in `RunSkillScriptTool.run_async`

A total SCRIPT_NOT_FOUND count (across all paths) is tracked on tool_context.state under:

temp:_adk_skill_script_not_found_count_<invocation_id>

The temp: prefix uses ADK's existing convention so the value is trimmed from the persisted event delta and never reaches durable session storage.
The <invocation_id> suffix ensures correctness on in-memory session backends, where temp: keys are not auto-cleared between invocations.

Behavior:

First SCRIPT_NOT_FOUND within an invocation (any path) → returns SCRIPT_NOT_FOUND (unchanged).
Any subsequent miss in the same invocation → returns the new SCRIPT_NOT_FOUND_FATAL error code with an explicit "do not retry — report the error and stop" message, including the failure number.

The guard is invocation-scoped, path-agnostic (catches hallucinated variants), and adds no overhead on the success path. The counter key is independent of the resource counter introduced in #5651 — a single miss on each tool does not immediately escalate.

2. Prompt: no-retry rule for `run_skill_script`

A new rule 5 in _DEFAULT_SKILL_SYSTEM_INSTRUCTION:

"If run_skill_script returns an error (for example SCRIPT_NOT_FOUND), do not retry the same script or guess a different script path. Report the error to the user and stop."

Why both layers

Defense-in-depth, identical reasoning to #5651: code-only termination produces confusing downstream behavior when the LLM has no semantic reason to stop; prompt-only termination relies on the LLM following the rule, which imperfect upstream prompts can override.

Considered and rejected

Alternative	Why not
Per-path retry guard	LLM hallucinates a different script path on every retry — a per-path list never escalates
Tighten or default-lower `max_llm_calls`	Caps overall reasoning budget; punishes legitimate long-running tasks
Symptomatic `after_tool_callback` workaround	Pushes the fix onto every `SkillToolset` user
Add an `available_scripts` manifest to L2 `load_skill`	Defeats the lazy-loading / token-saving design
Introduce a `list_skill_scripts` tool	Violates the L1→L2→L3 progressive disclosure contract
Share one counter with `load_skill_resource`	Two distinct tools, two distinct hallucination surfaces; separate counters keep the failure semantics aligned per tool

Test plan

New tests in tests/unittests/tools/test_skill_toolset.py:

test_execute_script_repeated_failure_escalates_to_fatal — second call on the same missing path within an invocation returns SCRIPT_NOT_FOUND_FATAL.
test_execute_script_different_path_also_escalates_to_fatal — second call on a different missing path also returns SCRIPT_NOT_FOUND_FATAL; proves the counter is path-agnostic and catches hallucinated variants.
test_execute_script_failures_isolated_per_invocation — failures from invocation A do not escalate the first attempt in invocation B even when sharing the same session state dict.
test_execute_script_counter_uses_temp_prefix — every key written by the guard starts with temp: so it is trimmed from the persisted event delta.
test_execute_script_script_not_found — tightened: first miss returns the unchanged SCRIPT_NOT_FOUND code and message.

Verification:

All 95 tests in test_skill_toolset.py pass (90 pre-existing + 5 added/tightened), 0 regressions.
pyink --check clean on both modified files.
isort --check-only clean on both modified files.
mypy src/google/adk/tools/skill_toolset.py reports the same 19 pre-existing errors as main; zero new errors introduced.

Backwards compatibility

The first-failure behavior is unchanged: same SCRIPT_NOT_FOUND error code, same error string. Existing callers see no difference.
The new SCRIPT_NOT_FOUND_FATAL code is purely additive.
The new state key uses the temp: prefix and is guaranteed not to leak into persisted session storage.
The new prompt rule is additive; the prior four rules are unchanged.

Relationship to #5651

This PR is logically independent and merges cleanly against current main. If #5651 lands first, the prompts will need a trivial merge (both PRs append a new rule to the same instruction string); the code paths are in different methods and do not conflict.

…T_FOUND Mirrors the LoadSkillResourceTool fix (google#5651): when the LLM hallucinates a script path that does not exist in the skill's scripts/ directory, RunSkillScriptTool returns SCRIPT_NOT_FOUND as a soft error and the model retries with a different plausible path each turn. Nothing escalates between attempts, so the loop terminates only when RunConfig.max_llm_calls (default 500) is exhausted. Adds an invocation-scoped failure counter under temp:_adk_skill_script_not_found_count_<invocation_id>. The first miss within an invocation still returns SCRIPT_NOT_FOUND (unchanged); any subsequent miss returns the new SCRIPT_NOT_FOUND_FATAL with an explicit "do not retry, report and stop" message and the failure count. The counter is path-agnostic, so it fires even when the model hallucinates a different script path on each retry. The temp: prefix keeps the key out of durable session storage; the invocation_id suffix isolates in-memory backends where temp: keys are not auto-cleared between invocations. The default system prompt also gains a no-retry rule for run_skill_script errors so the model has a semantic reason to stop, complementing the code-level guard. Tests cover: first-miss soft error preserved, 2nd-miss-same-path escalates, 2nd-miss-different-path escalates (path-agnostic), per-invocation isolation, counter key uses temp: prefix.

Raman369AI mentioned this pull request May 13, 2026

[Bug] RunSkillScriptTool retries SCRIPT_NOT_FOUND indefinitely; default max_llm_calls=500 is the only backstop #5684

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: terminate infinite retry loop in RunSkillScriptTool on SCRIPT_NOT_FOUND#5683

fix: terminate infinite retry loop in RunSkillScriptTool on SCRIPT_NOT_FOUND#5683
Raman369AI wants to merge 1 commit into
google:mainfrom
Raman369AI:fix/skill-script-not-found-infinite-loop

Raman369AI commented May 13, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Raman369AI commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why this matters

What changed

1. Code: invocation-scoped failure counter in RunSkillScriptTool.run_async

2. Prompt: no-retry rule for run_skill_script

Why both layers

Considered and rejected

Test plan

Backwards compatibility

Relationship to #5651

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Raman369AI commented May 13, 2026 •

edited

Loading

1. Code: invocation-scoped failure counter in `RunSkillScriptTool.run_async`

2. Prompt: no-retry rule for `run_skill_script`