SDK: remaining full-history materializations of state.events (Agent init_state/step/critic) #1839

@smolpaws

Context

Following up on OpenHands/software-agent-sdk#1824 ("don't use full events history"). I audited the SDK codepaths for places that still materialize the full state.events history (which is file-backed / lazy in some configurations, but still expensive once list(...) is called).

This issue tracks remaining full-history materializations and proposes refactors to avoid them.

Why this matters

Even if events are file-backed, doing list(state.events) / reversed(list(state.events)):

  • reads/deserializes every event (often from disk)
  • allocates a full in-memory list
  • is O(N) per call, which becomes noticeable at 10k–30k+ events

Confirmed occurrences

1) Agent.init_state eagerly loads all events

File: openhands/sdk/agent/agent.py

# Agent.init_state
events = list(state.events)
...
logger.debug(... event_count={len(events)} ...)
...
has_system_prompt = any(isinstance(e, SystemPromptEvent) for e in events)
...

This is a defensive/logging check (references #1785), but it reads every event on each init_state call, so its cost grows linearly with history length.

Suggestion (from scratch):

  • Replace full materialization with a bounded scan of only what's needed.
  • For example:
    • Check the first ~1–5 events for a SystemPromptEvent.
    • Check the last ~N events for a user message / any LLM-convertible event.
    • Use itertools.islice(state.events, k) and reversed(state.events[-N:]) rather than list(state.events).

This keeps the check useful while preventing pathological costs.
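A minimal sketch of what the bounded check could look like. The event classes below are stand-ins so the snippet runs on its own, and HEAD_SCAN / TAIL_SCAN and both function names are hypothetical, not existing SDK names. It assumes state.events supports iteration and negative slicing, as the StuckDetector slice suggests.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Sequence


@dataclass
class Event:
    """Stand-in for the SDK's base event type."""
    source: str = "agent"


@dataclass
class SystemPromptEvent(Event):
    """Stand-in for the SDK's SystemPromptEvent."""


@dataclass
class MessageEvent(Event):
    """Stand-in for the SDK's MessageEvent."""


HEAD_SCAN = 5    # hypothetical bound on leading events to inspect
TAIL_SCAN = 200  # hypothetical bound on trailing events to inspect


def has_system_prompt(events: Sequence[Event], head: int = HEAD_SCAN) -> bool:
    # islice stops after `head` items, so at most `head` events are read.
    return any(isinstance(e, SystemPromptEvent) for e in islice(events, head))


def has_recent_user_message(events: Sequence[Event], tail: int = TAIL_SCAN) -> bool:
    # Guard: events[-0:] would return the whole sequence, not an empty one.
    if tail <= 0:
        return False
    # Only the last `tail` events are sliced out and scanned.
    return any(
        isinstance(e, MessageEvent) and e.source == "user"
        for e in reversed(events[-tail:])
    )
```

The trade-off is that a user message older than the tail window is not seen, which seems acceptable for a defensive/logging check but should be a conscious choice.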

2) Agent.step scans for the most recent user message by materializing all events

File: openhands/sdk/agent/agent.py

for event in reversed(list(state.events)):
    if isinstance(event, MessageEvent) and event.source == "user":
        ...
        break

This only needs the most recent user message and could be bounded.

Suggestion:

  • Replace the loop with for event in reversed(state.events[-N:]): using an N of roughly 200–1000 (configurable), or
  • Add a helper on ConversationState / events list abstraction: iter_recent_events(limit).
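One possible shape for the helper, as a sketch only. iter_recent_events is the name proposed above; last_user_message, the `text` field, and the default limit are hypothetical. It assumes len() and integer indexing on the events abstraction are cheap and do not load the whole history.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Sequence


@dataclass
class Event:
    """Stand-in for the SDK's base event type."""
    source: str = "agent"


@dataclass
class MessageEvent(Event):
    """Stand-in for the SDK's MessageEvent; `text` is illustrative only."""
    text: str = ""


def iter_recent_events(events: Sequence[Event], limit: int) -> Iterator[Event]:
    """Yield up to `limit` events, newest first, one index lookup at a time."""
    n = len(events)
    stop = max(n - limit, 0)
    for i in range(n - 1, stop - 1, -1):
        yield events[i]


def last_user_message(
    events: Sequence[Event], limit: int = 500
) -> Optional[MessageEvent]:
    """Return the newest user message within the last `limit` events, if any."""
    for event in iter_recent_events(events, limit):
        if isinstance(event, MessageEvent) and event.source == "user":
            return event  # stops early: older events are never read
    return None
```

Because the generator reads lazily, the common case (user message near the end) touches only a handful of events, and the limit caps the worst case.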

3) Critic evaluation builds full history list

File: openhands/sdk/agent/agent.py

events = list(conversation.state.events) + [event]
llm_convertible_events = [e for e in events if isinstance(e, LLMConvertibleEvent)]
critic_result = self.critic.evaluate(events=llm_convertible_events, ...)

The list(...) call alone is O(N) on every critic evaluation, and depending on critic behavior the evaluation itself can also become expensive as conversations grow.

Suggestion:

  • Define/introduce a critic API that can accept:
    • the current View (or last N LLM-convertible events)
    • and/or a summary/condensation
  • If critic truly needs long context, make it explicit and paginated (or use event-store queries) rather than list(...).
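As a rough illustration of the "last N LLM-convertible events" option: the helper name recent_llm_events and the window default are hypothetical, the classes are stand-ins, and the real critic.evaluate signature is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Event:
    """Stand-in for the SDK's base event type."""


@dataclass
class LLMConvertibleEvent(Event):
    """Stand-in for the SDK's LLMConvertibleEvent."""


def recent_llm_events(
    events: Sequence[Event], new_event: Event, window: int = 200
) -> List[LLMConvertibleEvent]:
    """Collect LLM-convertible events from the last `window` stored events
    plus the event that was just produced."""
    # Guard: events[-0:] would return the whole sequence, not an empty one.
    tail = list(events[-window:]) if window > 0 else []
    tail.append(new_event)
    return [e for e in tail if isinstance(e, LLMConvertibleEvent)]
```

The result could then be passed to the critic in place of the full filtered history. Note that the window counts stored events, not LLM-convertible ones, so the critic may see fewer than `window` items.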

Proposed refactor direction

A few options that would pay dividends broadly:

  1. Introduce bounded iterators on the events abstraction
    • iter_tail(n) / tail(n) / iter_reverse(limit=None)
    • Use them everywhere we currently call list(state.events).
  2. Make "full history" an explicit opt-in
    • e.g. state.events.materialize_all() so expensive operations are obvious.
  3. Plumb View deeper
    • For subsystems like stuck detection / critic / security analyzers, accept a View (or a Sequence that is already bounded).
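To make options 1 and 2 concrete, here is a hedged sketch of a facade over an indexable event log. BoundedEvents is a hypothetical name, the method names follow the ones floated above, and this is not how the SDK's event list is currently implemented.

```python
from typing import Any, Iterator, List, Optional, Sequence


class BoundedEvents:
    """Hypothetical facade over an indexable event log.

    It deliberately defines neither __iter__ nor __getitem__, so an
    accidental list(events) raises TypeError instead of silently
    loading the whole history.
    """

    def __init__(self, log: Sequence[Any]) -> None:
        self._log = log

    def __len__(self) -> int:
        return len(self._log)

    def tail(self, n: int) -> List[Any]:
        """Return the last n events in chronological order."""
        return list(self._log[-n:]) if n > 0 else []

    def iter_reverse(self, limit: Optional[int] = None) -> Iterator[Any]:
        """Yield events newest first, reading at most `limit` of them."""
        total = len(self._log)
        count = total if limit is None else min(max(limit, 0), total)
        for i in range(total - 1, total - 1 - count, -1):
            yield self._log[i]

    def materialize_all(self) -> List[Any]:
        """Explicit opt-in for the O(N) full-history load."""
        return list(self._log)
```

With this shape, any remaining full-history load has to go through materialize_all(), which makes it easy to find in review or with a simple grep.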

Acceptance criteria

  • Replace the above full-history materializations with bounded scans.
  • Add a regression test ensuring these codepaths do not call list(state.events) (or at least that they do not iterate over the entire EventLog for large N; a synthetic EventLog with a side-effectful iterator can enforce this).
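The regression test could be built around a counting test double along these lines. CountingEventLog, count_reads, and the two scan functions are hypothetical; in a real test the scan functions would be replaced by calls into the Agent codepaths listed above.

```python
from typing import Any, Callable, Iterator, List


class CountingEventLog:
    """Synthetic event log that records how many events were read."""

    def __init__(self, size: int) -> None:
        self._items: List[int] = list(range(size))
        self.reads = 0

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: Any) -> Any:
        if isinstance(index, slice):
            chunk = self._items[index]
            self.reads += len(chunk)  # a slice reads every event it returns
            return chunk
        self.reads += 1
        return self._items[index]

    def __iter__(self) -> Iterator[int]:
        for item in self._items:
            self.reads += 1  # side effect on every event yielded
            yield item


def count_reads(fn: Callable[[CountingEventLog], Any], size: int = 10_000) -> int:
    """Run `fn` against a fresh log and report how many events it touched."""
    log = CountingEventLog(size)
    fn(log)
    return log.reads


def bounded_scan(log: CountingEventLog) -> list:
    return list(reversed(log[-200:]))  # stand-in for a refactored codepath


def full_scan(log: CountingEventLog) -> list:
    return list(log)  # stand-in for the current behavior


def test_bounded_scan_reads_only_tail() -> None:
    assert count_reads(bounded_scan) <= 200


def test_full_scan_is_detected() -> None:
    assert count_reads(full_scan) == 10_000
```

Asserting on the read count rather than on the absence of a list(...) call keeps the test valid even if the implementation changes shape.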

Notes

  • StuckDetector.is_stuck() was previously reported as materializing the full list, but in current code it already uses a slice:
    events = list(self.state.events[-MAX_EVENTS_TO_SCAN_FOR_STUCK_DETECTION:])
    So this issue focuses on the remaining Agent paths.

Labels: enhancement
