refactor(sdk): extract LLM response classification and dispatch from Agent.step()#2743
refactor(sdk): extract LLM response classification and dispatch from Agent.step()#2743VascoSch92 wants to merge 4 commits intomainfrom
Conversation
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste - Textbook refactoring. Reduces 85 lines of branching to 10 lines of clean dispatch. Pure classifier is independently testable. Tests are real (not mocks). No technical issues found.
Note: Per repo policy, agent loop changes require human maintainer eval verification before merge (previously flagged and acknowledged).
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste - Textbook refactoring. Reduces 85 lines of branching to 10 lines of clean dispatch. Pure classifier is independently testable. Tests are real (not mocks). No technical issues found.
Note: Per repo policy, agent loop changes require human maintainer eval verification before merge (previously flagged and acknowledged).
|
[Automatic Post]: It has been a while since there was any activity on this PR. @VascoSch92, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
|
@OpenHands /codereview-roasted on this pr, post your feedback as a review with gh api and event. |
|
I'm on it! enyst can track my progress at all-hands.dev |
enyst
left a comment
There was a problem hiding this comment.
🟢 Taste Rating: Good taste
This is the right kind of refactor. The data flow is simpler now: classify once, dispatch once, keep the behavior in small handlers instead of burying it in one long branch pile. No new special cases, no API churn, and the tests exercise the real Agent.step() path instead of only asserting that mocks were called.
I reviewed the diff against main, checked the new classifier/dispatch split, and ran:
uv run pytest tests/sdk/agent/test_response_dispatch.pyResult: 19 passed.
Because this change touches Agent.step() and response/tool-dispatch control flow, I'm leaving a COMMENT review rather than an approval; per repo policy, a human maintainer should make the final call after lightweight eval/CI confidence.
VERDICT: ✅ Worth merging pending human eval-risk sign-off
KEY INSIGHT: The refactor improves the structure by making response classification the single decision point, while keeping behavior local to dedicated handlers and preserving the existing control flow.
AI disclosure: This review was generated by an AI assistant (OpenHands) on behalf of the user.
|
Final summary of new work since the last summary:
Checklist:
Conciseness:
Status:
|
|
@enyst I think we are good to merge or? :-) |
Summary
Extracts the monolithic LLM response handling block from Agent.step() into a pure classifier function and a dispatch mixin, following the existing CriticMixin pattern.
What changed
What did NOT change
All existing behavior is preserved exactly. No new features, no bug fixes — this is a pure structural refactoring.
Motivation
Checklist
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:95e3a6b-pythonRun
All tags pushed for this build
About Multi-Architecture Support
95e3a6b-python) is a multi-arch manifest supporting both amd64 and arm6495e3a6b-python-amd64) are also available if needed