fix: #3346 AdvancedSQLiteSession.delete_branch orphaned base rows#3380
fix: #3346 AdvancedSQLiteSession.delete_branch orphaned base rows#3380john-rocky wants to merge 1 commit into
Conversation
`delete_branch` removed `turn_usage` and `message_structure` rows for the branch but never touched the underlying messages table. When a branch was the only reference for a given message (typical for branch-only follow-ups added after `create_branch_from_turn`), the base row stayed behind: invisible to advanced reads, which join through `message_structure`, but still padding the on-disk size and clouding any direct audit of the messages table. After the structure rows are deleted, drop any base rows that no longer have a matching `message_structure` entry inside the same transaction by reusing the existing `_cleanup_orphaned_messages_sync` helper. Messages still referenced by another branch (e.g. shared turns) are left alone. The log line now reports the orphan count alongside the existing usage and structure counts. Fixes openai#3346.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 98393785e7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| # reference for. Without this step, branch-only rows become | ||
| # invisible to advanced reads (which join through | ||
| # `message_structure`) but linger in the base table. | ||
| orphans_deleted = self._cleanup_orphaned_messages_sync(conn) |
There was a problem hiding this comment.
Avoid single-statement cleanup for large branch deletes
When deleting a branch with more branch-only messages than SQLite's host-parameter limit, this new call feeds every orphan id into _cleanup_orphaned_messages_sync, which builds one DELETE ... WHERE id IN (?, ...) statement. On SQLite builds with lower variable limits this raises OperationalError: too many SQL variables, so large branches cannot be deleted; chunk the cleanup or use a set-based DELETE ... NOT EXISTS query instead.
Useful? React with 👍 / 👎.
Summary
AdvancedSQLiteSession.delete_branch()deletedturn_usageandmessage_structurerows for the branch, but never touched the underlying messages table. Any base row whose onlymessage_structurereference was on the deleted branch — the common case for follow-ups added aftercreate_branch_from_turn— stayed behind: invisible to advanced reads (which join throughmessage_structure) but still padding the on-disk size, which made branch-heavy workflows accumulate ghost rows.After the structure rows are deleted, drop any base rows that no longer have a matching
message_structureentry, inside the same transaction. The existing_cleanup_orphaned_messages_synchelper does exactly that, so the change is minimal and matches the cleanup pattern already used elsewhere in this module. Messages still referenced by another branch (e.g. shared turns from before the split) are left alone. The log line now also reports the orphan count so operators can spot unexpected cleanup volumes.Test plan
Added
test_delete_branch_removes_branch_only_base_messages. It reproduces the issue script: create a main turn, branch from it, append branch-only messages, thendelete_branch(..., force=True). After the call, the test asserts that:mainstructure rows remain,get_items(branch_id="main")returns the original main payload.Issue number
Closes #3346. Adjacent #3348 (silent partial save in
add_items) is fixed by a separate PR and not touched here.Checks