fix: terminate infinite retry loop in RunSkillScriptTool on SCRIPT_NOT_FOUND#5683
Open
Raman369AI wants to merge 1 commit into
Open
fix: terminate infinite retry loop in RunSkillScriptTool on SCRIPT_NOT_FOUND#5683Raman369AI wants to merge 1 commit into
Raman369AI wants to merge 1 commit into
Conversation
…T_FOUND Mirrors the LoadSkillResourceTool fix (google#5651): when the LLM hallucinates a script path that does not exist in the skill's scripts/ directory, RunSkillScriptTool returns SCRIPT_NOT_FOUND as a soft error and the model retries with a different plausible path each turn. Nothing escalates between attempts, so the loop terminates only when RunConfig.max_llm_calls (default 500) is exhausted. Adds an invocation-scoped failure counter under temp:_adk_skill_script_not_found_count_<invocation_id>. The first miss within an invocation still returns SCRIPT_NOT_FOUND (unchanged); any subsequent miss returns the new SCRIPT_NOT_FOUND_FATAL with an explicit "do not retry, report and stop" message and the failure count. The counter is path-agnostic, so it fires even when the model hallucinates a different script path on each retry. The temp: prefix keeps the key out of durable session storage; the invocation_id suffix isolates in-memory backends where temp: keys are not auto-cleared between invocations. The default system prompt also gains a no-retry rule for run_skill_script errors so the model has a semantic reason to stop, complementing the code-level guard. Tests cover: first-miss soft error preserved, 2nd-miss-same-path escalates, 2nd-miss-different-path escalates (path-agnostic), per-invocation isolation, counter key uses temp: prefix.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #5684
Companion to #5651 — closes the same defect class for
RunSkillScriptTool.Summary
RunSkillScriptTool.run_asyncreturnsSCRIPT_NOT_FOUNDas a structured soft-error string when a script path passed by the LLM does not exist in the skill'sscripts/directory. Because the response is a normal tool result (not an exception or terminal signal), the LLM treats it as transient and retries — and in practice it hallucinates a different plausible script path on each retry (e.g.scripts/setup.py,scripts/build.py,scripts/run.sh). Nothing inRunSkillScriptTooltracks failures across paths, so the loop continues untilRunConfig.max_llm_calls(default 500) is exhausted.This is the same defect mode as
LoadSkillResourceTool(#5652 / #5651), in the same file, with the same fix shape. Filing as a separate PR for a clean history.Why this matters
The four conditions that made the resource-loop reachable apply identically to scripts:
load_skillresponse intentionally omits a list of available scripts (progressive-disclosure spec). The LLM infers script filenames from prose inSKILL.md, and inferred filenames likescripts/setup.pyorscripts/build.share routinely wrong.SCRIPT_NOT_FOUNDlooks transient to the model.max_llm_calls=500is the only backstop — a global reasoning cap, not a defense against a known failure mode on one tool.What changed
Two layers in
src/google/adk/tools/skill_toolset.py, plus tests:1. Code: invocation-scoped failure counter in
RunSkillScriptTool.run_asyncA total
SCRIPT_NOT_FOUNDcount (across all paths) is tracked ontool_context.stateunder:temp:prefix uses ADK's existing convention so the value is trimmed from the persisted event delta and never reaches durable session storage.<invocation_id>suffix ensures correctness on in-memory session backends, wheretemp:keys are not auto-cleared between invocations.Behavior:
SCRIPT_NOT_FOUNDwithin an invocation (any path) → returnsSCRIPT_NOT_FOUND(unchanged).SCRIPT_NOT_FOUND_FATALerror code with an explicit "do not retry — report the error and stop" message, including the failure number.The guard is invocation-scoped, path-agnostic (catches hallucinated variants), and adds no overhead on the success path. The counter key is independent of the resource counter introduced in #5651 — a single miss on each tool does not immediately escalate.
2. Prompt: no-retry rule for
run_skill_scriptA new rule 5 in
_DEFAULT_SKILL_SYSTEM_INSTRUCTION:Why both layers
Defense-in-depth, identical reasoning to #5651: code-only termination produces confusing downstream behavior when the LLM has no semantic reason to stop; prompt-only termination relies on the LLM following the rule, which imperfect upstream prompts can override.
Considered and rejected
max_llm_callsafter_tool_callbackworkaroundSkillToolsetuseravailable_scriptsmanifest to L2load_skilllist_skill_scriptstoolload_skill_resourceTest plan
New tests in
tests/unittests/tools/test_skill_toolset.py:test_execute_script_repeated_failure_escalates_to_fatal— second call on the same missing path within an invocation returnsSCRIPT_NOT_FOUND_FATAL.test_execute_script_different_path_also_escalates_to_fatal— second call on a different missing path also returnsSCRIPT_NOT_FOUND_FATAL; proves the counter is path-agnostic and catches hallucinated variants.test_execute_script_failures_isolated_per_invocation— failures from invocation A do not escalate the first attempt in invocation B even when sharing the same session state dict.test_execute_script_counter_uses_temp_prefix— every key written by the guard starts withtemp:so it is trimmed from the persisted event delta.test_execute_script_script_not_found— tightened: first miss returns the unchangedSCRIPT_NOT_FOUNDcode and message.Verification:
test_skill_toolset.pypass (90 pre-existing + 5 added/tightened), 0 regressions.pyink --checkclean on both modified files.isort --check-onlyclean on both modified files.mypy src/google/adk/tools/skill_toolset.pyreports the same 19 pre-existing errors asmain; zero new errors introduced.Backwards compatibility
SCRIPT_NOT_FOUNDerror code, same error string. Existing callers see no difference.SCRIPT_NOT_FOUND_FATALcode is purely additive.temp:prefix and is guaranteed not to leak into persisted session storage.Relationship to #5651
This PR is logically independent and merges cleanly against current
main. If #5651 lands first, the prompts will need a trivial merge (both PRs append a new rule to the same instruction string); the code paths are in different methods and do not conflict.