
Session snapshots cause extreme memory usage (w/ repro script) #17226

@oomathias

Description


Problem

Some sessions make OpenCode use multiple GB of memory.

If a session touches large generated text files across turns, memory can blow up while the session is being summarized, even if the visible conversation stays small.

I don't mind snapshots using disk space. The problem is that the full contents of changed generated files are loaded into memory.
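To put a number on "multiple GB", you can sample the resident set size of the process while the session is opened or summarized. This is a sketch, not something OpenCode provides. peak_rss_kib is a hypothetical helper that relies only on ps, and finding the right PID (for example with pgrep) is up to you.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of OpenCode): sample a process's resident
# set size every <interval> seconds until it exits, then print the peak in KiB.
# `ps -o rss=` reports RSS in KiB on both macOS and Linux.
# Usage: peak_rss_kib <pid> [interval_seconds]
peak_rss_kib() {
  local pid="$1" interval="${2:-1}"
  local peak=0 rss state

  # `ps -p` fails once the PID no longer exists, which ends the loop.
  while rss="$(ps -o rss= -p "$pid" 2>/dev/null)"; do
    state="$(ps -o stat= -p "$pid" 2>/dev/null || true)"
    rss="${rss//[[:space:]]/}"
    # An exited-but-unreaped child shows up as a zombie (state Z); stop there.
    if [[ -z "$rss" || "$state" == *Z* ]]; then
      break
    fi
    if (( rss > peak )); then
      peak="$rss"
    fi
    sleep "$interval"
  done

  printf '%s\n' "$peak"
}
```

For example, `peak_rss_kib "$(pgrep -n opencode)" 0.5` prints the highest RSS seen for the newest matching process once it exits.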

Real condition where this happens

This happens in ordinary repositories that generate large text artifacts while the agent is working, for example:

  • build-system outputs (e.g. CMake configure logs, Ninja build files)
  • linker map files
  • generated logs / manifests / diagnostic outputs
  • other large machine-written text artifacts

If those files are rewritten across turns in the same session, a large enough artifact can turn one diff reconstruction into a multi-GB memory event.
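The rewrite pattern is easy to reproduce without an agent. The sketch below uses a hypothetical helper, write_artifact, that writes exactly N bytes of deterministic ASCII text whose content changes with the turn number, similar to what the repro prompt asks the model to generate. It is an illustration only, not part of OpenCode or of the repro script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of OpenCode): write exactly <size_bytes> bytes
# of deterministic ASCII text for a given turn, creating parent directories.
# Usage: write_artifact <path> <size_bytes> <turn>
write_artifact() {
  local path="$1" size="$2" turn="$3"
  mkdir -p "$(dirname "$path")"
  # `yes` repeats the line forever and `head -c` stops at the exact size.
  # Process substitution keeps yes's SIGPIPE exit out of `set -o pipefail`.
  head -c "$size" < <(yes "turn ${turn}: generated build output line") > "$path"
}
```

Calling `write_artifact build/build.ninja $((96 * 1024 * 1024)) "$turn"` once per turn gives a file of constant size whose contents differ between turns, so every snapshot sees it as changed.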

Root cause

The root cause is eager full-body snapshot diff generation in Snapshot.diffFull().

What happens:

  1. Session parts store snapshot hashes (step-start / step-finish).
  2. When session summarization runs, OpenCode rebuilds diff summaries from those snapshot hashes.
  3. Snapshot.diffFull(from, to) compares the two snapshot trees.
  4. For each changed file, it loads the full old body and full new body into memory as before and after strings.
  5. If one changed file is huge, that single diff can materialize hundreds of MB or more on each side.
  6. That is enough to turn one diff reconstruction into a multi-GB memory event.
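The cost of step 4 can be estimated without reading any file bodies. This sketch assumes the snapshot hashes are git tree-ish objects and that every changed file's full old and new contents are loaded. eager_diff_bytes is a hypothetical helper, and where OpenCode keeps its snapshot git data is not covered here, so it is written against an ordinary repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: estimate how many bytes an eager before/after diff would
# hold in memory between two tree-ish revisions, using only blob sizes.
# Per-file sizes go to stderr; the total goes to stdout.
# Usage: eager_diff_bytes <repo> <from> <to>
eager_diff_bytes() {
  local repo="$1" from="$2" to="$3"
  local total=0 path before after

  # -z keeps unusual file names intact; --no-renames reports both sides of a rename.
  while IFS= read -r -d '' path; do
    # `git cat-file -s` reads the size from the object database without the body;
    # a side where the file does not exist counts as 0 bytes.
    before="$(git -C "$repo" cat-file -s "$from:$path" 2>/dev/null || echo 0)"
    after="$(git -C "$repo" cat-file -s "$to:$path" 2>/dev/null || echo 0)"
    printf '%12d  %s\n' "$((before + after))" "$path" >&2
    total=$((total + before + after))
  done < <(git -C "$repo" diff --no-renames --name-only -z "$from" "$to")

  printf '%s\n' "$total"
}
```

Run against two trees (for example `eager_diff_bytes . "$from_tree" "$to_tree"`), it lists each changed file with its before+after size and prints the sum, which is a lower bound on what an eager reconstruction would allocate.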

The same eagerly loaded bodies are also persisted, so they show up as oversized data in stored session summaries under:

  • summary.diffs[*].before
  • summary.diffs[*].after

Steps to reproduce

Run the script below.

What the reproduction does:

  • creates a fresh temp git repo
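  • forces snapshot: true through OPENCODE_CONFIG_CONTENT (unless already set) so global config does not disable the repro
  • runs TURNS turns in the same session (default 1; set TURNS=3 or higher to rewrite the files across several turns)
  • rewrites large generated files each turn (BIG1_MIB=128, BIG2_MIB=96, SMALL_KIB=512 by default)
  • prints the generated session ID and the stored message/part sizes, including the summary.diffs payload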
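With the default sizes, the raw file bodies involved in one reconstruction can be worked out up front. This is plain arithmetic on the script's BIG1_MIB, BIG2_MIB and SMALL_KIB defaults. It counts bytes on disk and ignores any extra per-string overhead inside the runtime.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Default sizes from the repro script.
BIG1_MIB=128
BIG2_MIB=96
SMALL_KIB=512

# Bytes per snapshot side across the three generated files.
per_side=$(( (BIG1_MIB + BIG2_MIB) * 1024 * 1024 + SMALL_KIB * 1024 ))

# Once the files exist in both snapshots being compared, an eager diff
# holds both the old and the new body of each file.
both_sides=$(( 2 * per_side ))

printf 'per side:   %d bytes (%d KiB)\n' "$per_side" "$(( per_side / 1024 ))"
printf 'both sides: %d bytes (%d MiB)\n' "$both_sides" "$(( both_sides / 1024 / 1024 ))"
```

This prints 229888 KiB per side and 449 MiB for both sides. The second figure applies from turn 2 onward, when each rewritten file is present on both sides of the diff.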

Then open the generated session in the TUI:

opencode --session <session-id>

Repro script:

#!/usr/bin/env bash
set -euo pipefail

need() {
  command -v "$1" >/dev/null 2>&1 || {
    printf 'missing required command: %s\n' "$1" >&2
    exit 1
  }
}

need git
need opencode
need python3

ROOT="$(mktemp -d "${TMPDIR:-/tmp}/opencode-summary-repro.XXXXXX")"
ARTIFACTS="$ROOT-artifacts"
TITLE="summary-diff-repro-$(date +%s)"
PROMPT_FILE="$ARTIFACTS/prompt.txt"
OUTPUT_FILE="$ARTIFACTS/opencode-run.jsonl"
TURNS="${TURNS:-1}"
BIG1_MIB="${BIG1_MIB:-128}"
BIG2_MIB="${BIG2_MIB:-96}"
SMALL_KIB="${SMALL_KIB:-512}"

cleanup() {
  if [[ "${KEEP_TMP:-1}" != "1" ]]; then
    rm -rf "$ROOT" "$ARTIFACTS"
  fi
}
trap cleanup EXIT

printf 'workspace: %s\n' "$ROOT"
printf 'artifacts: %s\n' "$ARTIFACTS"

mkdir -p "$ARTIFACTS"

git -C "$ROOT" init -q
git -C "$ROOT" config user.name "bug-repro"
git -C "$ROOT" config user.email "bug-repro@example.com"

mkdir -p "$ROOT/src"
printf 'hello\n' > "$ROOT/src/main.txt"
git -C "$ROOT" add .
git -C "$ROOT" commit -qm "init"

write_prompt() {
  local turn="$1"
  cat > "$PROMPT_FILE" <<EOF
Use bash only.
Do not ask questions.
Do not use write, edit, or apply_patch.
Do not inspect the repo first.
Do not run ls, read, glob, grep, or verification commands.
Use exactly one bash tool call for the whole task.

In the current git repo, run shell commands that do all of the following:

1. Rewrite these files:
   - build/CMakeFiles/CMakeConfigureLog.yaml to exactly ${BIG1_MIB} MiB of deterministic ASCII text for turn ${turn}
   - build/build.ninja to exactly ${BIG2_MIB} MiB of deterministic ASCII text for turn ${turn}
   - build/.ninja_log to exactly ${SMALL_KIB} KiB of deterministic ASCII text for turn ${turn}
2. Append one line mentioning turn ${turn} to src/main.txt.
3. Print a short confirmation.

Use deterministic generated text, for example with python3.
Create parent directories inside that single bash command.
Do not delete the generated files.
Stop when finished.
EOF
}

run_turn() {
  local turn="$1"
  local session_id="${2:-}"
  local -a args=(--dir "$ROOT" --format json)

  write_prompt "$turn"
  printf '\nturn %s/%s...\n' "$turn" "$TURNS"

  # Continue the existing session, or start a new titled one on the first turn.
  if [[ -n "$session_id" ]]; then
    args+=(--session "$session_id")
  else
    args+=(--title "$TITLE")
  fi

  # Optional model override.
  if [[ -n "${OPENCODE_MODEL:-}" ]]; then
    args+=(--model "$OPENCODE_MODEL")
  fi

  opencode run "${args[@]}" "$(<"$PROMPT_FILE")" | tee -a "$OUTPUT_FILE"
}

lookup_session_id() {
  python3 - "$OUTPUT_FILE" <<'PY'
import json
import sys

for line in open(sys.argv[1], "r", encoding="utf-8"):
    line = line.strip()
    if not line:
        continue
    try:
        obj = json.loads(line)
    except Exception:
        continue
    session_id = obj.get("sessionID")
    if session_id:
        print(session_id)
        raise SystemExit(0)
PY
}

if [[ -z "${OPENCODE_PERMISSION:-}" ]]; then
  export OPENCODE_PERMISSION='{"bash":"allow","read":"allow","list":"allow","glob":"allow","grep":"allow","edit":"allow","task":"allow","todowrite":"allow","question":"deny"}'
fi

if [[ -z "${OPENCODE_CONFIG_CONTENT:-}" ]]; then
  export OPENCODE_CONFIG_CONTENT='{"$schema":"https://opencode.ai/config.json","snapshot":true}'
fi

: > "$OUTPUT_FILE"

printf 'running opencode...\n'
run_turn 1

DB_PATH="$(opencode db path)"
printf 'database: %s\n' "$DB_PATH"

SESSION_ID="$(lookup_session_id | tr -d '\n')"
if [[ -z "$SESSION_ID" ]]; then
  SESSION_ID="$(opencode db --format json "SELECT id FROM session WHERE title = '$TITLE' ORDER BY time_created DESC LIMIT 1;" | python3 -c 'import json,sys; rows=json.load(sys.stdin); print(rows[0]["id"] if rows else "")' | tr -d '\n')"
fi

if [[ -z "$SESSION_ID" ]]; then
  printf 'failed to determine session id from opencode output or database lookup\n' >&2
  exit 1
fi

if [[ "$TURNS" -gt 1 ]]; then
  turn=2
  while [[ "$turn" -le "$TURNS" ]]; do
    run_turn "$turn" "$SESSION_ID"
    turn="$((turn + 1))"
  done
fi

printf 'session: %s\n' "$SESSION_ID"

printf '\ncurrent session totals:\n'
opencode db --format json "
SELECT
  (SELECT SUM(length(data)) FROM message WHERE session_id = '$SESSION_ID') AS message_bytes,
  (SELECT SUM(length(data)) FROM part WHERE session_id = '$SESSION_ID') AS part_bytes,
  (SELECT COUNT(*) FROM message WHERE session_id = '$SESSION_ID') AS message_count,
  (SELECT COUNT(*) FROM part WHERE session_id = '$SESSION_ID') AS part_count;
"

printf '\nlargest message rows:\n'
opencode db --format json "
SELECT id, json_extract(data, '$.role') AS role, length(data) AS bytes
FROM message
WHERE session_id = '$SESSION_ID'
ORDER BY bytes DESC
LIMIT 10;
"

printf '\nlargest user messages:\n'
opencode db --format json "
SELECT id, length(data) AS bytes, json_array_length(json_extract(data, '$.summary.diffs')) AS diff_count
FROM message
WHERE session_id = '$SESSION_ID' AND json_extract(data, '$.role') = 'user'
ORDER BY bytes DESC
LIMIT 10;
"

printf '\nall user diff payload bytes:\n'
opencode db --format json "
WITH user_messages AS (
  SELECT id, data
  FROM message
  WHERE session_id = '$SESSION_ID' AND json_extract(data, '$.role') = 'user'
), sized AS (
  SELECT
    id,
    length(data) AS message_bytes,
    json_array_length(json_extract(data, '$.summary.diffs')) AS diff_count,
    (
      SELECT SUM(length(COALESCE(json_extract(value, '$.before'), '')) + length(COALESCE(json_extract(value, '$.after'), '')))
      FROM json_each(json_extract(user_messages.data, '$.summary.diffs'))
    ) AS diff_body_bytes
  FROM user_messages
)
SELECT *
FROM sized
ORDER BY message_bytes DESC;
"

printf '\ndiff payload inside largest user message:\n'
opencode db --format json "
WITH biggest AS (
  SELECT id, data
  FROM message
  WHERE session_id = '$SESSION_ID' AND json_extract(data, '$.role') = 'user'
  ORDER BY length(data) DESC
  LIMIT 1
)
SELECT
  id,
  length(data) AS bytes,
  json_array_length(json_extract(data, '$.summary.diffs')) AS diff_count,
  (
    SELECT SUM(length(COALESCE(json_extract(value, '$.before'), '')) + length(COALESCE(json_extract(value, '$.after'), '')))
    FROM json_each(json_extract(biggest.data, '$.summary.diffs'))
  ) AS diff_body_bytes
FROM biggest;
"

printf '\nper-file diff sizes from largest user message:\n'
opencode db --format json "
WITH biggest AS (
  SELECT data
  FROM message
  WHERE session_id = '$SESSION_ID' AND json_extract(data, '$.role') = 'user'
  ORDER BY length(data) DESC
  LIMIT 1
), diffs AS (
  SELECT
    json_extract(value, '$.file') AS file,
    length(COALESCE(json_extract(value, '$.before'), '')) + length(COALESCE(json_extract(value, '$.after'), '')) AS total_bytes
  FROM biggest, json_each(json_extract(biggest.data, '$.summary.diffs'))
)
SELECT file, total_bytes
FROM diffs
ORDER BY total_bytes DESC;
"

printf '\nmanual follow-up:\n'
printf '  open the session in tui: opencode --session %s\n' "$SESSION_ID"
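# cleanup() deletes these on exit when KEEP_TMP is set to anything other than 1.
if [[ "${KEEP_TMP:-1}" == "1" ]]; then
  printf '  workspace kept at: %s\n' "$ROOT"
  printf '  artifacts kept at: %s\n' "$ARTIFACTS"
fi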

Screenshot and/or share link

No response

Operating System

macOS 26

Terminal

Ghostty

Metadata

Labels

  • bug: Something isn't working
  • core: Anything pertaining to core functionality of the application (opencode server stuff)
  • perf: Indicates a performance issue or need for optimization
