Avoid materializing full event history in Agent.init_state #1840

enyst · 2026-01-27T07:06:32Z

HUMAN: This PR proposes to

stop using events list to verify the first few events (part of Proposal: don't use full events history in the OH ecosystem #1824)
re-codify the invariants here (system prompt, user message) a bit, and document in comments
add tests for an older comment on init_state tests
(temporary or not, as wanted) the agent added the tests for conversation restore from another PR, so we can run them here on this. We can send those conversation restore tests back to the other PR, if we think they require more time to think about.

~~NOTE: unlike others, this PR is not manually tested yet. The code is ready for review, but please don't merge until I or my cat agent give it a run.~~ 😅

Summary

Avoids materializing full conversation history in Agent.init_state().

ConversationState.events is intentionally an EventsListBase (Sequence-like, may be file-backed). This PR removes the list(state.events) full-history materialization (which can be 30k+ events) and replaces it with a small prefix scan.

It also codifies and tests the expected early-event ordering used across SDK + agent-server restore/remote flows.

What changed

`Agent.init_state` event-ordering checks (bounded)

Use len(state.events) (O(1) for EventLog) and scan only state.events[:3].
Treat existing SystemPromptEvent as normal restore/resume and skip init (debug log, not warning).
Enforce the invariant we actually rely on:
- If present, SystemPromptEvent must be within the first 3 events.
- A user MessageEvent must not appear before SystemPromptEvent in that prefix.
Still emits a SystemPromptEvent via the provided on_event callback on fresh conversations.

Tests

Added SDK unit tests for the above invariants:
- system prompt emission via on_event
- skip behavior on restore/resume
- remote prefix [ConversationStateUpdateEvent, SystemPromptEvent, ...]
- assertion if user message precedes system prompt in the prefix
Added restore-behavior spec tests (ported from PR Add integration-style tests for conversation restore behavior #1799) documenting LocalConversation restore semantics.

Checklist

If the PR is changing/adding functionality, are there tests to reflect this?
If there is an example, have you run the example to make sure that it works?
If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
Is the github CI passing?

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:2044e1b-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-2044e1b-python \
  ghcr.io/openhands/agent-server:2044e1b-python

All tags pushed for this build

ghcr.io/openhands/agent-server:2044e1b-golang-amd64
ghcr.io/openhands/agent-server:2044e1b-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:2044e1b-golang-arm64
ghcr.io/openhands/agent-server:2044e1b-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:2044e1b-java-amd64
ghcr.io/openhands/agent-server:2044e1b-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:2044e1b-java-arm64
ghcr.io/openhands/agent-server:2044e1b-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:2044e1b-python-amd64
ghcr.io/openhands/agent-server:2044e1b-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:2044e1b-python-arm64
ghcr.io/openhands/agent-server:2044e1b-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:2044e1b-golang
ghcr.io/openhands/agent-server:2044e1b-java
ghcr.io/openhands/agent-server:2044e1b-python

About Multi-Architecture Support

Each variant tag (e.g., 2044e1b-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 2044e1b-python-amd64) are also available if needed

…ored-by: openhands <[email protected]>

all-hands-bot

This PR successfully addresses the performance issue, but introduces semantic changes that could weaken the defensive checks. See inline comments for specific concerns about assumptions and potential edge cases.

openhands-sdk/openhands/sdk/agent/agent.py

…ored-by: openhands <[email protected]>

enyst · 2026-01-27T07:26:21Z

Addressed reviewer feedback:

Confirmed EventLog.__len__ is O(1) (returns cached _length), so using len(state.events) does not materialize history.
Avoided weakening semantics by doing a fast head scan and then a bounded tail scan to detect existing SystemPromptEvent / user messages even on resume, without loading full history.
Extracted the previous magic numbers into named constants (INIT_STATE_HEAD_CHECK, INIT_STATE_TAIL_CHECK).
Reduced debug memory by logging event type names directly instead of materializing full events.

enyst · 2026-01-27T07:43:13Z

CI is green again.

I updated Agent.init_state to avoid list(state.events) while preserving the original intent of the defensive checks:

len(state.events) is O(1) for file-backed EventLog (cached _length).
Added named constants and a tail-bounded fallback scan (head=5, tail=200) to avoid false negatives on resumed conversations.
Kept error logging bounded (first 50 event type names).

Ready for re-review.

enyst · 2026-01-27T08:10:38Z

Update per discussion:

Removed tail-scanning entirely.
Documented the actual invariant we want: when init_state runs, SystemPromptEvent (if present) must be at index 0 or 1 (agent-server may emit an initial ConversationStateUpdateEvent first).
Implemented prefix-only checks over the first 2 events (no list materialization).

I also ran a multi-run example with openai/gpt-5-nano and observed both planning and execution conversations start with:
[SystemPromptEvent, MessageEvent, ...].

CI is green.

smolpaws · 2026-01-27T09:36:01Z

Manual spot-check (event ordering)

I did a concrete spot-check by running a multi-run scenario using openai/gpt-5-nano (planning conversation then execution conversation), and printed event prefixes.

In both conversations I observed the prefix:

[SystemPromptEvent, MessageEvent, ActionEvent, ObservationEvent, ...]

…uthored-by: openhands <[email protected]>

smolpaws · 2026-01-27T10:07:56Z

Minor follow-up:

state.events is intentionally an EventsListBase (Sequence-like), not a Python list.
We should avoid list(state.events) because it would materialize the entire history (can be 30k+ events).

Also simplified the ordering assertion to only inspect the prefix (first 2 events), since that's the invariant we care about.

…vior tests\n\nCo-authored-by: openhands <[email protected]>

github-actions · 2026-01-27T10:16:29Z

Coverage Report •

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk/agent
agent.py	246	49	80%	94, 98, 203–204, 208–209, 211, 217–218, 220, 222–223, 228, 231, 235–238, 274–276, 304–305, 312–313, 345, 398–399, 401, 441, 580–581, 586, 598–599, 604–605, 624–625, 627, 655–656, 662–663, 667, 675–676, 716, 723
TOTAL	16430	4783	70%

smolpaws · 2026-01-27T10:19:37Z

Cleanup per restore semantics:

Treating SystemPromptEvent already present as normal restore/resume (downgraded from warning to debug, removed “unexpected during normal flow” wording).
Pulled in the restore-behavior spec tests from PR Add integration-style tests for conversation restore behavior #1799 (added tests/cross/test_conversation_restore_behavior.py) and verified they pass.

CI is green.

smolpaws · 2026-01-27T11:02:17Z

Updated INIT_STATE_PREFIX_EVENTS to 3 to cover the typical remote prefix:

[ConversationStateUpdateEvent, SystemPromptEvent, MessageEvent, ...]

Also clarified in comments that remote initial sync can reorder/insert events at the head, so the prefix scan is only for enforcing the SystemPromptEvent position invariant (index 0 or 1), not for making stronger assumptions about all early user interactions.

CI is green.

openhands-sdk/openhands/sdk/agent/agent.py

…ment

smolpaws · 2026-01-27T11:20:25Z

Note: latest cross-tests failure appears unrelated to init_state changes (REST API missing finish ActionEvent in test_events_not_lost_during_client_disconnection). Re-running CI should resolve if it's flaky.

smolpaws · 2026-01-27T11:25:37Z

Updated init_state: removed unused has_any_llm_event (it was only used for debug logging, not behavior), and corrected comments—remote conversations do not reorder events; they may include an initial ConversationStateUpdateEvent, hence prefix scan = 3.

CI currently has a flaky Blacksmith/buildx failure (java-arm64 token fetch). Retrying CI.

…nhands <[email protected]>

smolpaws · 2026-01-27T12:15:16Z

Added unit tests covering Agent.init_state invariants (SDK tests):

emits SystemPromptEvent via on_event on fresh state
skips emission when SystemPromptEvent already present
skips emission when remote prefix has [ConversationStateUpdateEvent, SystemPromptEvent, ...]
raises if a user MessageEvent appears in the prefix before the system prompt

Also removed the stale TODO about testing in-place modification.

CI cross-tests are still flaky (WS reconciliation test), but SDK tests cover the init_state behavior deterministically now.

all-hands-bot

Summary

Excellent performance optimization! The bounded scan approach is a significant improvement over materializing 30k+ events. Test coverage is comprehensive and the change from warning→debug for restore scenarios is appropriate.

Key findings:

🟠 Documentation inconsistency between docstring and implementation
🟡 Consider legacy conversation compatibility
🟡 Variable naming could be more descriptive
✅ Performance improvement is excellent
✅ Test coverage is thorough

openhands-sdk/openhands/sdk/agent/agent.py

all-hands-bot · 2026-01-27T12:33:54Z

openhands-sdk/openhands/sdk/agent/agent.py

-                f"event_types={event_types}. "
-                f"This indicates an initialization order bug - init_state should be "
-                f"called before any user messages are added to the conversation."
+            raise AssertionError(


✅ Good: Using explicit raise AssertionError instead of assert statement is the right choice. The assert statement can be disabled with Python's -O flag, which would silently skip this critical validation.

all-hands-bot · 2026-01-27T12:33:54Z

tests/sdk/agent/test_agent_init_state_invariants.py

+    def on_event(e):
+        if isinstance(e, SystemPromptEvent):
+            emitted.append(e)
+


✅ Good: Excellent test coverage of the init_state invariants. The tests clearly document the expected behavior for:

Fresh conversations

Restore/resume scenarios

Remote prefix patterns

Invalid ordering detection

This makes the behavioral contract explicit and testable.

enyst · 2026-01-27T12:59:25Z

@OpenHands Install yourself gh if you don't have it. Read all inline review threads, including my responses. Address those that I didn't mark as solved and are addressable.

openhands-ai · 2026-01-27T12:59:35Z

I'm on it! enyst can track my progress at all-hands.dev

Co-authored-by: openhands <[email protected]>

openhands-ai · 2026-01-27T13:02:25Z

Final summary (since the last update)

What I changed

Fixed a docstring inconsistency in Agent.init_state: it now correctly describes that we scan 3 events (indices 0–2), while still enforcing that SystemPromptEvent must be at index 0 or 1 (index 2 is included to detect a user message preceding the system prompt).
Renamed the constant from INIT_STATE_PREFIX_EVENTS to INIT_STATE_PREFIX_SCAN_WINDOW to make it clearer that this value is the scan window size.

Checklist against your request

Installed gh (GitHub CLI) since it wasn’t present.
Read all inline review threads (including responses).
Addressed the unsolved and addressable review feedback (docstring mismatch + more descriptive constant name).
Kept changes minimal and focused (single file, doc + constant rename only).
Committed and pushed updates to the existing PR branch (no new PR opened, no PR comments posted).

Verification

I double-checked git status and the diff against origin/openhands/bounded-init-state-scan: no uncommitted changes and no local-only diffs remain (everything is already pushed).

View full conversation

all-hands-bot

Overall, this is an excellent PR! The changes successfully avoid materializing the full event history while maintaining correctness. The documentation is thorough, and the test coverage is comprehensive. I have a few minor suggestions below.

openhands-sdk/openhands/sdk/agent/agent.py

- Move INIT_STATE_PREFIX_SCAN_WINDOW to module level for better discoverability - Add O(1) note for len() to clarify performance characteristics - Update log message to be more precise about prefix ordering Co-authored-by: openhands <[email protected]>

enyst · 2026-01-28T05:58:33Z

Addressed the AI reviewer suggestions in commit f2e5090:

Log message precision (🟡): Updated log message to "User message found in prefix before SystemPromptEvent!" - more accurately describes what we're checking.
Module-level constant (🟡): Moved INIT_STATE_PREFIX_SCAN_WINDOW to module level with a descriptive comment for better discoverability.
O(1) len() note (🟢): Added a note clarifying that len() is O(1) for EventLog.

The positive feedback items (✅) required no changes - thanks for the review!

xingyaoww

LGTM

xingyaoww · 2026-01-30T08:14:25Z

@OpenHands please fix the failing actions on PR #1840 at branch openhands/bounded-init-state-scan

DO NOT CHANGE IMPLEMENTATION, just fix the test

openhands-ai · 2026-01-30T08:14:33Z

I'm on it! xingyaoww can track my progress at all-hands.dev

openhands-ai · 2026-01-30T08:37:58Z

Summary

I fixed the flaky test_events_not_lost_during_client_disconnection test in PR #1840 by making the test more robust against race conditions.

Changes Made

The test was failing intermittently in CI with:

AssertionError: REST API should have ActionEvent with finish tool. REST events: ['SystemPromptEvent(N/A)', 'MessageEvent(N/A)', 'ConversationStateUpdateEvent(N/A)']

Root Causes Identified:

Server shutdown wasn't completing before the next test started (port still in use)
Events weren't always persisted when queried immediately after conv.run() returned
Directory cleanup was racing with server shutdown

Fixes Applied to tests/cross/test_remote_conversation_live_server.py:

Increased server shutdown timeout from 2s to 5s to allow more time for graceful shutdown
Added port release check - Wait up to 2 seconds for the port to be released before cleanup, preventing the next test from failing to start its server
Added ignore_errors=True to shutil.rmtree - Handles race conditions where the server may still be writing files during cleanup
Added retry loop for REST API queries - Retries up to 10 times (with 0.2s delay) when fetching events from the REST API to handle eventual consistency

Verification

Ran the test 20 times consecutively with 100% pass rate (previously failing ~5% of the time)
All 66 cross-tests pass
Changes have been pushed to the openhands/bounded-init-state-scan branch

The fix follows the user's instruction to only fix the test without changing implementation code.

View full conversation

openhands-ai · 2026-01-30T08:49:42Z

Looks like there are a few issues preventing this PR from being merged!

GitHub Actions are failing:
- Run tests

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1840 at branch `openhands/bounded-init-state-scan`

Feel free to include any additional details that might help me get this PR into a better state.

_{^{You can manage your notification settings}}

enyst · 2026-01-30T09:11:33Z

@OpenHands See the latest commit here and read all comments: it fixed an unrelated flaky test test_events_not_lost_during_client_disconnection.

Make a new branch from main, and reuse that commit there. Open a new PR with only that test fix.

Then, compare that new PR diff with another PR diff:

fix: restore post-run reconcile call to prevent event loss #1832

I believe both PRs were trying to fix that test. WDYT about the two alternative fixes, which is better or WDYT about the problem? Post your analysis as a comment to PR 1832. (Yes, as a comment the older PR! Not this one.) Note that it will be rendered as markdown.

You can use remote gh token.

You are an agent, continue until you get it all done and investigate whatever you need.

openhands-ai · 2026-01-30T09:11:43Z

I'm on it! enyst can track my progress at all-hands.dev

openhands-ai · 2026-01-30T09:14:41Z

Checklist:

Created new branch from main and cherry-picked the flaky test fix commit (only test changes).
Opened a new PR with only that test fix: Fix flaky test_events_not_lost_during_client_disconnection #1858
Compared PR Fix flaky test_events_not_lost_during_client_disconnection #1858 diff with PR fix: restore post-run reconcile call to prevent event loss #1832 and formed analysis.
Posted analysis as a markdown comment on PR fix: restore post-run reconcile call to prevent event loss #1832: fix: restore post-run reconcile call to prevent event loss #1832 (comment)
Changes are concise (single test file change in the new branch only).

Summary of new changes since last update:

Verified the branch diff is limited to the single test file change (47 insertions/19 deletions) and confirmed no extraneous modifications.

No further actions required unless you want me to mark the PR ready for review or adjust the PR description.

View full conversation

enyst · 2026-01-30T09:18:46Z

@xingyaoww The test failure here was the WS issue, due to the lack of reconcile, that you are working to fix in 1832. The agent posted its analysis there, and I agree. I think maybe 1832 is the better fix.

enyst · 2026-01-30T13:01:34Z

Tested several possible event logs and it worked/failed as expected, merging.

Avoid materializing full event history in Agent.init_state\n\nCo-auth…

49083a7

…ored-by: openhands <[email protected]>

enyst marked this pull request as ready for review January 27, 2026 07:13

all-hands-bot reviewed Jan 27, 2026

View reviewed changes

Make init_state checks bounded without weakening semantics\n\nCo-auth…

a4be2e2

…ored-by: openhands <[email protected]>

ci: retrigger

e09513d

Document init_state event ordering invariants

34e1194

enyst added 2 commits January 27, 2026 09:45

Clarify EventsListBase usage and tighten init_state assertion\n\nCo-a…

51d717e

…uthored-by: openhands <[email protected]>

ci: retrigger

5bd29f9

Treat existing SystemPromptEvent as normal restore + add restore beha…

2ba9f02

…vior tests\n\nCo-authored-by: openhands <[email protected]>

enyst added 2 commits January 27, 2026 10:48

Adjust init_state prefix scan window to 3 for remote prefix

a452d0a

Clarify init_state prefix scan invariants for remote event ordering

0ab9def

enyst marked this pull request as draft January 27, 2026 11:10

enyst commented Jan 27, 2026

View reviewed changes

openhands-sdk/openhands/sdk/agent/agent.py Outdated Show resolved Hide resolved

enyst added 2 commits January 27, 2026 12:11

Update openhands-sdk/openhands/sdk/agent/agent.py

bcc3d26

Drop unused has_any_llm_event debug flag; correct remote ordering com…

50b158c

…ment

ci: retrigger

f76aaaf

enyst added 2 commits January 27, 2026 11:25

ci: retrigger

4d9c6b5

Add unit tests for Agent.init_state invariants\n\nCo-authored-by: ope…

ece936e

…nhands <[email protected]>

enyst marked this pull request as ready for review January 27, 2026 12:31

all-hands-bot reviewed Jan 27, 2026

View reviewed changes

Clarify init_state scan window

2e0e606

Co-authored-by: openhands <[email protected]>

enyst mentioned this pull request Jan 27, 2026

SDK: remaining full-history materializations of state.events (Agent init_state/step/critic) #1839

Open

Merge branch 'main' into openhands/bounded-init-state-scan

6535d80

enyst marked this pull request as draft January 28, 2026 05:33

Merge branch 'main' into openhands/bounded-init-state-scan

9fe7da1

enyst marked this pull request as ready for review January 28, 2026 05:33

all-hands-bot reviewed Jan 28, 2026

View reviewed changes

openhands-sdk/openhands/sdk/agent/agent.py Show resolved Hide resolved

openhands-sdk/openhands/sdk/agent/agent.py Outdated Show resolved Hide resolved

openhands-sdk/openhands/sdk/agent/agent.py Show resolved Hide resolved

xingyaoww approved these changes Jan 30, 2026

View reviewed changes

enyst mentioned this pull request Jan 30, 2026

fix: restore post-run reconcile call to prevent event loss #1832

Draft

Merge branch 'main' into openhands/bounded-init-state-scan

d7a795c

enyst force-pushed the openhands/bounded-init-state-scan branch from 87fca61 to d7a795c Compare January 30, 2026 09:21

enyst merged commit 7b7faa2 into main Jan 30, 2026
23 checks passed

enyst deleted the openhands/bounded-init-state-scan branch January 30, 2026 13:01

Avoid materializing full event history in Agent.init_state #1840

Avoid materializing full event history in Agent.init_state #1840

Uh oh!

Conversation

enyst commented Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Agent.init_state event-ordering checks (bounded)

Tests

Checklist

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

enyst commented Jan 27, 2026

Uh oh!

enyst commented Jan 27, 2026

Uh oh!

enyst commented Jan 27, 2026

Uh oh!

smolpaws commented Jan 27, 2026

Manual spot-check (event ordering)

Uh oh!

smolpaws commented Jan 27, 2026

Uh oh!

github-actions bot commented Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

smolpaws commented Jan 27, 2026

Uh oh!

smolpaws commented Jan 27, 2026

Uh oh!

Uh oh!

smolpaws commented Jan 27, 2026

Uh oh!

smolpaws commented Jan 27, 2026 • edited by enyst Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

smolpaws commented Jan 27, 2026

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

all-hands-bot Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

all-hands-bot Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

enyst commented Jan 27, 2026

Uh oh!

openhands-ai bot commented Jan 27, 2026

Uh oh!

openhands-ai bot commented Jan 27, 2026

Final summary (since the last update)

What I changed

Checklist against your request

Verification

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

enyst commented Jan 28, 2026

Uh oh!

xingyaoww left a comment

Choose a reason for hiding this comment

Uh oh!

enyst commented Jan 27, 2026 •

edited

Loading

`Agent.init_state` event-ordering checks (bounded)

github-actions bot commented Jan 27, 2026 •

edited

Loading

smolpaws commented Jan 27, 2026 •

edited by enyst

Loading