Description
Problem
Some sessions make OpenCode use multiple GB of memory.
If a session touches large generated text files across turns, memory can blow up while the session is being summarized, even if the visible conversation stays small.
I don't mind snapshots using disk space. The problem is that the full contents of changed generated files are loaded into active memory.
Real condition where this happens
This happens in ordinary repositories that generate large text artifacts while the agent is working, for example:
- build systems
- linker map files
- generated logs / manifests / diagnostic outputs
- other large machine-written text artifacts
If those files are rewritten across turns in the same session, a large enough artifact can turn one diff reconstruction into a multi-GB memory event.
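For a sense of scale, here is a back-of-envelope count of the raw text a single diff has to hold when all three generated files from the reproduction script below change between two snapshots. It uses that script's default sizes (`BIG1_MIB=128`, `BIG2_MIB=96`, `SMALL_KIB=512`). It counts raw bytes only, so it is a lower bound. It does not include how the runtime represents the strings or any copies made while serializing `summary.diffs`.

```shell
#!/usr/bin/env bash
# Back-of-envelope size of the raw before/after text for ONE diff in which
# all three generated files from the repro changed. Defaults mirror the repro.
set -euo pipefail

BIG1_MIB="${BIG1_MIB:-128}"
BIG2_MIB="${BIG2_MIB:-96}"
SMALL_KIB="${SMALL_KIB:-512}"

# Bytes on one side of the diff (the "before" OR the "after" bodies).
per_side_bytes=$(( (BIG1_MIB + BIG2_MIB) * 1024 * 1024 + SMALL_KIB * 1024 ))
# Both sides are held at the same time as separate strings.
both_sides_bytes=$(( per_side_bytes * 2 ))

printf 'per side:   %d KiB\n' $(( per_side_bytes / 1024 ))
printf 'both sides: %d KiB\n' $(( both_sides_bytes / 1024 ))
```

With the defaults this prints 229888 KiB (about 224.5 MiB) per side and 459776 KiB (about 449 MiB) for both sides, for one diff in one turn.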
Root cause
The root cause is eager full-body snapshot diff generation in `Snapshot.diffFull()`.
What happens:
- Session parts store snapshot hashes (`step-start`/`step-finish`).
- When session summarization runs, OpenCode rebuilds diff summaries from those snapshot hashes.
- `Snapshot.diffFull(from, to)` compares the two snapshot trees.
- For each changed file, it loads the full old body and full new body into memory as `before` and `after` strings.
- If one changed file is huge, that single diff can materialize hundreds of MB or more on each side.
- That is enough to turn one diff reconstruction into a multi-GB memory event.
The eagerly loaded diff bodies also end up as oversized data in stored session summaries, under:
- `summary.diffs[*].before`
- `summary.diffs[*].after`
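One way to avoid materializing huge bodies is to check sizes first. Below is a hypothetical shell sketch of that idea, not OpenCode's implementation. For each file that differs between two snapshots, it asks git for the blob sizes with `git cat-file -s` and marks the file either `full` (small enough to load `before`/`after` bodies) or `stats-only`. The function names `blob_size` and `plan_diff_bodies`, the `MAX_BODY_BYTES` cap, and its 1 MiB default are made up for illustration. The sketch also assumes the two snapshot hashes are tree-ish values that plain `git` can read in the current repository.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not OpenCode's code: decide per changed file whether it
# is small enough to load full before/after bodies, using only blob sizes
# reported by git. Assumes FROM and TO are tree-ish values (commits or tree
# hashes) readable from the current git repository.
set -euo pipefail

# Hypothetical cap; files above it would get a stats-only diff entry.
MAX_BODY_BYTES="${MAX_BODY_BYTES:-1048576}"

# Print the size in bytes of <tree-ish>:<path>, or 0 if the path does not
# exist on that side (added or deleted file). Asks git for the object size
# instead of reading the body into the shell.
blob_size() {
  git cat-file -s "$1:$2" 2>/dev/null || echo 0
}

# Print "path<TAB>full|stats-only<TAB>before_bytes<TAB>after_bytes" for each
# file that differs between the two snapshots. -z keeps odd paths intact.
plan_diff_bodies() {
  local from="$1" to="$2" path before after mode
  git diff -z --name-only "$from" "$to" |
    while IFS= read -r -d '' path; do
      before="$(blob_size "$from" "$path")"
      after="$(blob_size "$to" "$path")"
      if (( before > MAX_BODY_BYTES || after > MAX_BODY_BYTES )); then
        mode="stats-only"
      else
        mode="full"
      fi
      printf '%s\t%s\t%s\t%s\n' "$path" "$mode" "$before" "$after"
    done
}
```

For a file marked `stats-only`, `git diff --numstat <from> <to> -- <path>` still reports added and deleted line counts. Git computes them in its own process, so OpenCode would not have to hold either body as a string.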
Steps to reproduce
Run the script below.
What the reproduction does:
- creates a fresh temp git repo
- runs `TURNS` turns in the same session (default 1; set e.g. `TURNS=3` so the files are rewritten across several turns)
- rewrites large generated files each turn
- forces `snapshot: true` so global config does not disable the repro
- then prints the generated session ID
Open that generated session:
opencode --session <session-id>
#!/usr/bin/env bash
set -euo pipefail
need() {
command -v "$1" >/dev/null 2>&1 || {
printf 'missing required command: %s\n' "$1" >&2
exit 1
}
}
need git
need opencode
need python3
ROOT="$(mktemp -d "${TMPDIR:-/tmp}/opencode-summary-repro.XXXXXX")"
ARTIFACTS="$ROOT-artifacts"
TITLE="summary-diff-repro-$(date +%s)"
PROMPT_FILE="$ARTIFACTS/prompt.txt"
OUTPUT_FILE="$ARTIFACTS/opencode-run.jsonl"
TURNS="${TURNS:-1}"
BIG1_MIB="${BIG1_MIB:-128}"
BIG2_MIB="${BIG2_MIB:-96}"
SMALL_KIB="${SMALL_KIB:-512}"
cleanup() {
if [[ "${KEEP_TMP:-1}" != "1" ]]; then
rm -rf "$ROOT" "$ARTIFACTS"
fi
}
trap cleanup EXIT
printf 'workspace: %s\n' "$ROOT"
printf 'artifacts: %s\n' "$ARTIFACTS"
mkdir -p "$ARTIFACTS"
git -C "$ROOT" init -q
git -C "$ROOT" config user.name "bug-repro"
git -C "$ROOT" config user.email "bug-repro@example.com"
mkdir -p "$ROOT/src"
printf 'hello\n' > "$ROOT/src/main.txt"
git -C "$ROOT" add .
git -C "$ROOT" commit -qm "init"
write_prompt() {
local turn="$1"
cat > "$PROMPT_FILE" <<EOF
Use bash only.
Do not ask questions.
Do not use write, edit, or apply_patch.
Do not inspect the repo first.
Do not run ls, read, glob, grep, or verification commands.
Use exactly one bash tool call for the whole task.
In the current git repo, run shell commands that do all of the following:
1. Rewrite these files:
- build/CMakeFiles/CMakeConfigureLog.yaml to exactly ${BIG1_MIB} MiB of deterministic ASCII text for turn ${turn}
- build/build.ninja to exactly ${BIG2_MIB} MiB of deterministic ASCII text for turn ${turn}
- build/.ninja_log to exactly ${SMALL_KIB} KiB of deterministic ASCII text for turn ${turn}
2. Append one line mentioning turn ${turn} to src/main.txt.
3. Print a short confirmation.
Use deterministic generated text, for example with python3.
Create parent directories inside that single bash command.
Do not delete the generated files.
Stop when finished.
EOF
}
run_turn() {
local turn="$1"
local session_id="${2:-}"
write_prompt "$turn"
printf '\nturn %s/%s...\n' "$turn" "$TURNS"
# Continue the given session, or start a new titled session on the first turn.
local args=(run --dir "$ROOT" --format json)
if [[ -n "$session_id" ]]; then
args+=(--session "$session_id")
else
args+=(--title "$TITLE")
fi
# Pin the model only when OPENCODE_MODEL is set.
if [[ -n "${OPENCODE_MODEL:-}" ]]; then
args+=(--model "$OPENCODE_MODEL")
fi
opencode "${args[@]}" "$(<"$PROMPT_FILE")" | tee -a "$OUTPUT_FILE"
}
lookup_session_id() {
python3 - "$OUTPUT_FILE" <<'PY'
import json
import sys
for line in open(sys.argv[1], "r", encoding="utf-8"):
line = line.strip()
if not line:
continue
try:
obj = json.loads(line)
except Exception:
continue
session_id = obj.get("sessionID")
if session_id:
print(session_id)
raise SystemExit(0)
PY
}
if [[ -z "${OPENCODE_PERMISSION:-}" ]]; then
export OPENCODE_PERMISSION='{"bash":"allow","read":"allow","list":"allow","glob":"allow","grep":"allow","edit":"allow","task":"allow","todowrite":"allow","question":"deny"}'
fi
if [[ -z "${OPENCODE_CONFIG_CONTENT:-}" ]]; then
export OPENCODE_CONFIG_CONTENT='{"$schema":"https://opencode.ai/config.json","snapshot":true}'
fi
: > "$OUTPUT_FILE"
printf 'running opencode...\n'
run_turn 1
DB_PATH="$(opencode db path)"
printf 'database: %s\n' "$DB_PATH"
SESSION_ID="$(lookup_session_id | tr -d '\n')"
if [[ -z "$SESSION_ID" ]]; then
SESSION_ID="$(opencode db --format json "SELECT id FROM session WHERE title = '$TITLE' ORDER BY time_created DESC LIMIT 1;" | python3 -c 'import json,sys; rows=json.load(sys.stdin); print(rows[0]["id"] if rows else "")' | tr -d '\n')"
fi
if [[ -z "$SESSION_ID" ]]; then
printf 'failed to determine session id from opencode output or database lookup\n' >&2
exit 1
fi
if [[ "$TURNS" -gt 1 ]]; then
turn=2
while [[ "$turn" -le "$TURNS" ]]; do
run_turn "$turn" "$SESSION_ID"
turn="$((turn + 1))"
done
fi
printf 'session: %s\n' "$SESSION_ID"
printf '\ncurrent session totals:\n'
opencode db --format json "
SELECT
(SELECT SUM(length(data)) FROM message WHERE session_id = '$SESSION_ID') AS message_bytes,
(SELECT SUM(length(data)) FROM part WHERE session_id = '$SESSION_ID') AS part_bytes,
(SELECT COUNT(*) FROM message WHERE session_id = '$SESSION_ID') AS message_count,
(SELECT COUNT(*) FROM part WHERE session_id = '$SESSION_ID') AS part_count;
"
printf '\nlargest message rows:\n'
opencode db --format json "
SELECT id, json_extract(data, '$.role') AS role, length(data) AS bytes
FROM message
WHERE session_id = '$SESSION_ID'
ORDER BY bytes DESC
LIMIT 10;
"
printf '\nlargest user messages:\n'
opencode db --format json "
SELECT id, length(data) AS bytes, json_array_length(json_extract(data, '$.summary.diffs')) AS diff_count
FROM message
WHERE session_id = '$SESSION_ID' AND json_extract(data, '$.role') = 'user'
ORDER BY bytes DESC
LIMIT 10;
"
printf '\nall user diff payload bytes:\n'
opencode db --format json "
WITH user_messages AS (
SELECT id, data
FROM message
WHERE session_id = '$SESSION_ID' AND json_extract(data, '$.role') = 'user'
), sized AS (
SELECT
id,
length(data) AS message_bytes,
json_array_length(json_extract(data, '$.summary.diffs')) AS diff_count,
(
SELECT SUM(length(COALESCE(json_extract(value, '$.before'), '')) + length(COALESCE(json_extract(value, '$.after'), '')))
FROM json_each(json_extract(user_messages.data, '$.summary.diffs'))
) AS diff_body_bytes
FROM user_messages
)
SELECT *
FROM sized
ORDER BY message_bytes DESC;
"
printf '\ndiff payload inside largest user message:\n'
opencode db --format json "
WITH biggest AS (
SELECT id, data
FROM message
WHERE session_id = '$SESSION_ID' AND json_extract(data, '$.role') = 'user'
ORDER BY length(data) DESC
LIMIT 1
)
SELECT
id,
length(data) AS bytes,
json_array_length(json_extract(data, '$.summary.diffs')) AS diff_count,
(
SELECT SUM(length(COALESCE(json_extract(value, '$.before'), '')) + length(COALESCE(json_extract(value, '$.after'), '')))
FROM json_each(json_extract(biggest.data, '$.summary.diffs'))
) AS diff_body_bytes
FROM biggest;
"
printf '\nper-file diff sizes from largest user message:\n'
opencode db --format json "
WITH biggest AS (
SELECT data
FROM message
WHERE session_id = '$SESSION_ID' AND json_extract(data, '$.role') = 'user'
ORDER BY length(data) DESC
LIMIT 1
), diffs AS (
SELECT
json_extract(value, '$.file') AS file,
length(COALESCE(json_extract(value, '$.before'), '')) + length(COALESCE(json_extract(value, '$.after'), '')) AS total_bytes
FROM biggest, json_each(json_extract(biggest.data, '$.summary.diffs'))
)
SELECT file, total_bytes
FROM diffs
ORDER BY total_bytes DESC;
"
printf '\nmanual follow-up:\n'
printf ' open the session in tui: opencode --session %s\n' "$SESSION_ID"
printf ' workspace kept at: %s\n' "$ROOT"
printf ' artifacts kept at: %s\n' "$ARTIFACTS"
Screenshot and/or share link
No response
Operating System
macOS 26
Terminal
Ghostty