-
Notifications
You must be signed in to change notification settings - Fork 124
Avoid materializing full event history in Agent.init_state #1840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ored-by: openhands <[email protected]>
all-hands-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR successfully addresses the performance issue, but introduces semantic changes that could weaken the defensive checks. See inline comments for specific concerns about assumptions and potential edge cases.
…ored-by: openhands <[email protected]>
|
Addressed reviewer feedback:
|
|
CI is green again. I updated
Ready for re-review. |
|
Update per discussion:
I also ran a multi-run example with CI is green. |
Manual spot-check (event ordering)I did a concrete spot-check by running a multi-run scenario using In both conversations I observed the prefix: |
…uthored-by: openhands <[email protected]>
|
Minor follow-up:
Also simplified the ordering assertion to only inspect the prefix (first 2 events), since that's the invariant we care about. |
…vior tests\n\nCo-authored-by: openhands <[email protected]>
Coverage Report •
|
||||||||||||||||||||
|
Cleanup per restore semantics:
CI is green. |
|
Updated Also clarified in comments that remote initial sync can reorder/insert events at the head, so the prefix scan is only for enforcing the SystemPromptEvent position invariant (index 0 or 1), not for making stronger assumptions about all early user interactions. CI is green. |
|
Note: latest cross-tests failure appears unrelated to init_state changes (REST API missing finish ActionEvent in test_events_not_lost_during_client_disconnection). Re-running CI should resolve if it's flaky. |
|
Updated init_state: removed unused has_any_llm_event (it was only used for debug logging, not behavior), and corrected comments—remote conversations do not reorder events; they may include an initial ConversationStateUpdateEvent, hence prefix scan = 3. CI currently has a flaky Blacksmith/buildx failure (java-arm64 token fetch). Retrying CI. |
|
Added unit tests covering
Also removed the stale TODO about testing in-place modification. CI cross-tests are still flaky (WS reconciliation test), but SDK tests cover the init_state behavior deterministically now. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary
Excellent performance optimization! The bounded scan approach is a significant improvement over materializing 30k+ events. Test coverage is comprehensive and the change from warning→debug for restore scenarios is appropriate.
Key findings:
- 🟠 Documentation inconsistency between docstring and implementation
- 🟡 Consider legacy conversation compatibility
- 🟡 Variable naming could be more descriptive
- ✅ Performance improvement is excellent
- ✅ Test coverage is thorough
| f"event_types={event_types}. " | ||
| f"This indicates an initialization order bug - init_state should be " | ||
| f"called before any user messages are added to the conversation." | ||
| raise AssertionError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Good: Using explicit raise AssertionError instead of assert statement is the right choice. The assert statement can be disabled with Python's -O flag, which would silently skip this critical validation.
| def on_event(e): | ||
| if isinstance(e, SystemPromptEvent): | ||
| emitted.append(e) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Good: Excellent test coverage of the init_state invariants. The tests clearly document the expected behavior for:
- Fresh conversations
- Restore/resume scenarios
- Remote prefix patterns
- Invalid ordering detection
This makes the behavioral contract explicit and testable.
|
@OpenHands Install yourself |
|
I'm on it! enyst can track my progress at all-hands.dev |
Co-authored-by: openhands <[email protected]>
Final summary (since the last update)What I changed
Checklist against your request
Verification
|
all-hands-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this is an excellent PR! The changes successfully avoid materializing the full event history while maintaining correctness. The documentation is thorough, and the test coverage is comprehensive. I have a few minor suggestions below.
- Move INIT_STATE_PREFIX_SCAN_WINDOW to module level for better discoverability - Add O(1) note for len() to clarify performance characteristics - Update log message to be more precise about prefix ordering Co-authored-by: openhands <[email protected]>
|
Addressed the AI reviewer suggestions in commit f2e5090:
The positive feedback items (✅) required no changes - thanks for the review! |
xingyaoww
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@OpenHands please fix the failing actions on PR #1840 at branch DO NOT CHANGE IMPLEMENTATION, just fix the test |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
SummaryI fixed the flaky Changes MadeThe test was failing intermittently in CI with: Root Causes Identified:
Fixes Applied to
Verification
The fix follows the user's instruction to only fix the test without changing implementation code. |
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
|
@OpenHands See the latest commit here and read all comments: it fixed an unrelated flaky test test_events_not_lost_during_client_disconnection. Make a new branch from main, and reuse that commit there. Open a new PR with only that test fix. Then, compare that new PR diff with another PR diff: I believe both PRs were trying to fix that test. WDYT about the two alternative fixes, which is better or WDYT about the problem? Post your analysis as a comment to PR 1832. (Yes, as a comment the older PR! Not this one.) Note that it will be rendered as markdown. You can use remote gh token. You are an agent, continue until you get it all done and investigate whatever you need. |
|
I'm on it! enyst can track my progress at all-hands.dev |
|
Checklist:
Summary of new changes since last update:
No further actions required unless you want me to mark the PR ready for review or adjust the PR description. |
|
@xingyaoww The test failure here was the WS issue, due to the lack of reconcile, that you are working to fix in 1832. The agent posted its analysis there, and I agree. I think maybe 1832 is the better fix. |
87fca61 to
d7a795c
Compare
|
Tested several possible event logs and it worked/failed as expected, merging. |
HUMAN: This PR proposes to
eventslist to verify the first few events (part of Proposal: don't use full events history in the OH ecosystem #1824)init_statetests(temporary or not, as wanted) the agent added the tests for conversation restore from another PR, so we can run them here on this. We can send those conversation restore tests back to the other PR, if we think they require more time to think about.NOTE: unlike others, this PR is not manually tested yet. The code is ready for review, but please don't merge until I or my cat agent give it a run.😅Summary
Avoids materializing full conversation history in
Agent.init_state().ConversationState.eventsis intentionally anEventsListBase(Sequence-like, may be file-backed). This PR removes thelist(state.events)full-history materialization (which can be 30k+ events) and replaces it with a small prefix scan.It also codifies and tests the expected early-event ordering used across SDK + agent-server restore/remote flows.
What changed
Agent.init_stateevent-ordering checks (bounded)len(state.events)(O(1) forEventLog) and scan onlystate.events[:3].SystemPromptEventas normal restore/resume and skip init (debug log, not warning).SystemPromptEventmust be within the first 3 events.MessageEventmust not appear beforeSystemPromptEventin that prefix.SystemPromptEventvia the providedon_eventcallback on fresh conversations.Tests
on_event[ConversationStateUpdateEvent, SystemPromptEvent, ...]Checklist
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:2044e1b-pythonRun
All tags pushed for this build
About Multi-Architecture Support
2044e1b-python) is a multi-arch manifest supporting both amd64 and arm642044e1b-python-amd64) are also available if needed