fix: handle malformed tool names with XML tag fragments #2819

Draft
juanmichelini wants to merge 2 commits into main from openhands/fix-malformed-tool-names

@juanmichelini
Collaborator

@juanmichelini juanmichelini commented Apr 13, 2026

Summary

When LLMs like qwen3-coder-next emit malformed tool names such as str_replace </parameter or str_replace</function>, extract the first valid identifier and map it to the correct tool.

This addresses issue #2818 which reported a 77.1% error rate for qwen3-coder-next, with the most common error (35.8%) being Tool 'str_replace </parameter' not found.

Changes

  • Add _try_fix_malformed_tool_name() function in openhands-sdk/openhands/sdk/agent/utils.py
  • Update normalize_tool_call() to use the new function before applying aliases
  • Add 6 regression tests for malformed tool name handling
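
The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `extract_tool_name`, the `known_tools` parameter, and the exact regular expression are assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption, not taken from the PR): a leading
# identifier, optional whitespace, then "<" starting a stray tag fragment.
_LEADING_IDENTIFIER = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*<")


def extract_tool_name(raw_name: str, known_tools: set) -> Optional[str]:
    """Return the leading identifier of a malformed name if it is a known tool."""
    if raw_name in known_tools:
        return raw_name  # already valid, nothing to repair
    match = _LEADING_IDENTIFIER.match(raw_name)
    if match is None:
        return None  # no identifier followed by a tag fragment
    candidate = match.group(1)
    # Only accept the repair when it points at a tool that actually exists.
    return candidate if candidate in known_tools else None


known = {"str_replace", "terminal"}
print(extract_tool_name("str_replace </parameter", known))  # str_replace
print(extract_tool_name("str_replace</function>", known))   # str_replace
print(extract_tool_name("not a tool", known))               # None
```

Requiring a trailing `<` keeps the repair narrow: names that are merely unknown are left to the normal "tool not found" path instead of being silently rewritten.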

Testing

  • All 20 tests in test_tool_call_compatibility.py pass
  • Pre-commit hooks pass

Fixes #2818


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant  Architectures  Base Image                                        Docs / Tags
java     amd64, arm64   eclipse-temurin:17-jdk                            Link
python   amd64, arm64   nikolaik/python-nodejs:python3.13-nodejs22-slim   Link
golang   amd64, arm64   golang:1.21-bookworm                              Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:ff8d7a6-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-ff8d7a6-python \
  ghcr.io/openhands/agent-server:ff8d7a6-python

All tags pushed for this build

ghcr.io/openhands/agent-server:ff8d7a6-golang-amd64
ghcr.io/openhands/agent-server:ff8d7a6-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:ff8d7a6-golang-arm64
ghcr.io/openhands/agent-server:ff8d7a6-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:ff8d7a6-java-amd64
ghcr.io/openhands/agent-server:ff8d7a6-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:ff8d7a6-java-arm64
ghcr.io/openhands/agent-server:ff8d7a6-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:ff8d7a6-python-amd64
ghcr.io/openhands/agent-server:ff8d7a6-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:ff8d7a6-python-arm64
ghcr.io/openhands/agent-server:ff8d7a6-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:ff8d7a6-golang
ghcr.io/openhands/agent-server:ff8d7a6-java
ghcr.io/openhands/agent-server:ff8d7a6-python

About Multi-Architecture Support

  • Each variant tag (e.g., ff8d7a6-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., ff8d7a6-python-amd64) are also available if needed

When LLMs like qwen3-coder-next emit malformed tool names such as
'str_replace </parameter' or 'str_replace</function>', extract the
first valid identifier and map it to the correct tool.

This reduces errors for models that emit XML/HTML tag fragments
appended to tool names.

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

github-actions bot commented Apr 13, 2026

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log

@github-actions
Contributor

github-actions bot commented Apr 13, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log

@github-actions
Contributor

github-actions bot commented Apr 13, 2026

Coverage

Coverage Report

File                                        Stmts   Miss  Cover  Missing
openhands-sdk/openhands/sdk/agent/utils.py    165     12    92%  105, 125–126, 199, 205, 257, 259, 261, 267, 277, 293, 347
TOTAL                                       22592   6483    71%

@juanmichelini
Collaborator Author

Still some errors but looking better
https://openhands-eval-monitor.vercel.app/?run=swebench%2Flitellm_proxy-openrouter-qwen-qwen3-coder-next%2F24368359316%2F&text=qwen+coder

Will try it on SWTbench, which is the crucial one

@juanmichelini
Collaborator Author

Errors are down, but the score on SWT is still low. Might be worth merging anyway

See run https://openhands-eval-monitor.vercel.app/?run=swebench%2Flitellm_proxy-openrouter-qwen-qwen3-coder-next%2F24368359316%2F&text=qwen+coder

@juanmichelini
Collaborator Author

@OpenHands /codereview-raosted

@openhands-ai

openhands-ai bot commented Apr 14, 2026

I'm on it! juanmichelini can track my progress at all-hands.dev

@openhands-ai

openhands-ai bot commented Apr 14, 2026

Summary

I performed a code review of PR #2819 which fixes handling of malformed tool names like "str_replace </parameter" or "str_replace</function>" that LLMs like qwen3-coder-next sometimes emit.

Review findings:

All checks passed - No issues found

  • Correctness: The implementation correctly extracts valid identifiers and maps them to tools via aliases
  • Testing: All 20 tests pass, including 6 new regression tests for malformed name handling
  • Security: Pure string manipulation, no injection risks
  • Complexity: Minimal, focused implementation

Changes reviewed:

  1. New _try_fix_malformed_tool_name() function using regex to extract the first valid identifier
  2. Updated normalize_tool_call() to apply the fix before aliases/terminal fallback
  3. 6 new tests covering malformed name patterns (XML tags, function tags, aliases, edge cases)
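
To illustrate why the repair has to run before alias lookup (item 2 above), here is a small sketch. The alias table, the registered tool names, and both function names are assumptions made for the example; they are not the SDK's actual tables or the PR's code.

```python
import re

# Illustrative assumptions only: the real alias map and tool registry differ.
TOOL_ALIASES = {"str_replace": "file_editor", "bash": "terminal"}
REGISTERED_TOOLS = {"file_editor", "terminal"}


def strip_tag_fragment(name: str) -> str:
    """Keep only the leading identifier; return the input unchanged if none."""
    match = re.match(r"\s*([A-Za-z_][A-Za-z0-9_]*)", name)
    return match.group(1) if match else name


def resolve_tool_name(raw_name: str) -> str:
    """Repair the name first, then apply aliases."""
    if raw_name in REGISTERED_TOOLS:
        return raw_name  # exact match, no normalization needed
    cleaned = strip_tag_fragment(raw_name)     # step 1: drop the tag fragment
    return TOOL_ALIASES.get(cleaned, cleaned)  # step 2: alias lookup


# Looking up the raw malformed name in the alias table finds nothing...
print(TOOL_ALIASES.get("str_replace </parameter"))   # None
# ...whereas repairing first lets the alias match.
print(resolve_tool_name("str_replace </parameter"))  # file_editor
print(resolve_tool_name("bash</function>"))          # terminal
```

If the two steps were swapped, the alias lookup would see the raw string with the tag fragment still attached and fall through to the "tool not found" error the PR is trying to avoid.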

Recommendation: APPROVE - The PR is low-risk and ready to merge.

Development

Successfully merging this pull request may close these issues.

qwen3-coder-next conversation error rate 77.1%
