Skip to content

fix(router): resolve plan cache memory leaks on config reload#3023

Open
dkorittki wants to merge 6 commits into
mainfrom
dominik/eng-9822-fix-plancache-related-leaks
Open

fix(router): resolve plan cache memory leaks on config reload#3023
dkorittki wants to merge 6 commits into
mainfrom
dominik/eng-9822-fix-plancache-related-leaks

Conversation

@dkorittki

@dkorittki dkorittki commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Combines #3006, #3007, and #3009 with the following additional changes:

  • Do not wait for pending slow-cache writes on mux shutdown, cache is about to be closed anyways
  • Reset minKey and minDur fields when closing the slowplancache

#3008 currently not included here, needs some more work. Will be merged separetely.

Summary by CodeRabbit

  • Bug Fixes
    • Improved plan cache shutdown so cached entries are fully released and internal tracking is reset.
    • Prevented cache eviction handling from writing entries into fallback storage during router shutdown.
    • Cleaned up plan metadata handling to return only the data that is actually used.

Checklist

Open Source AI Manifesto

This project follows the principles of the Open Source AI Manifesto. Please ensure your contribution aligns with its principles.

arutkowski00 and others added 6 commits June 26, 2026 17:32
… values

After Close(), cached entries in the sync.Map could keep references to
stored values (including query plans holding schema AST pointers) until
the Cache struct itself was collected. Clear entries when shutting down
the cache so config reload can release the previous graph generation.
Ristretto Close() clears all plan cache entries and fires OnEvict for
each one, migrating them into slowplancache right before both caches
are torn down. Disable OnEvict during intentional mux shutdown and wait
for pending slow-cache writes first.
planWithMetaData stored schemaDocument (full router schema AST, ~200MB)
in every cached plan entry but never read it. Drop the field so plan
caches do not retain the old schema generation after config reload.
@dkorittki dkorittki requested a review from a team as a code owner June 26, 2026 15:46
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

The PR updates planner metadata, changes slow-plan-cache shutdown to drop stored entries and reset bookkeeping, and adds a shutdown gate around plan-cache eviction handling in the router.

Changes

Plan cache lifecycle and planner metadata

Layer / File(s) Summary
Planner metadata cleanup
router/core/operation_planner.go
planWithMetaData removes schemaDocument, and planOperation returns the remaining metadata fields.
Slow plan cache close cleanup
router/pkg/slowplancache/slow_plan_cache.go, router/pkg/slowplancache/slow_plan_cache_test.go
Cache.Close() deletes stored entries, resets cache bookkeeping, and the new test checks the cache is empty afterward.
Plan-cache eviction hook gating
router/core/graph_server.go
graphMux adds an atomic hook flag, enables it for warmup with in-memory fallback, wraps OnEvict with the flag check, and disables it before shutdown closes the plan cache.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • wundergraph/cosmo#2838: Also changes graphMux.Shutdown behavior in router/core/graph_server.go, which is directly related to the shutdown path updated here.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: fixing router plan cache memory leaks during config reload.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Comment @coderabbitai help to get the list of available commands.

@github-actions

Copy link
Copy Markdown

Router-nonroot image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-9869fb2e19cc772d564771f6dcaa050705d42792-nonroot

@github-actions

Copy link
Copy Markdown

Router image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-9869fb2e19cc772d564771f6dcaa050705d42792

@codecov

codecov Bot commented Jun 26, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.42%. Comparing base (f144989) to head (b4f4018).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #3023       +/-   ##
===========================================
+ Coverage   46.53%   61.42%   +14.88%     
===========================================
  Files        1120      259      -861     
  Lines      154966    30055   -124911     
  Branches    10060        0    -10060     
===========================================
- Hits        72107    18460    -53647     
+ Misses      81025    10107    -70918     
+ Partials     1834     1488      -346     
Files with missing lines Coverage Δ
router/core/graph_server.go 85.70% <100.00%> (+0.04%) ⬆️
router/core/operation_planner.go 71.11% <ø> (-0.32%) ⬇️
router/pkg/slowplancache/slow_plan_cache.go 99.05% <100.00%> (+0.05%) ⬆️

... and 863 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@dkorittki

Copy link
Copy Markdown
Contributor Author

@SkArchon can you take a look at this?

@SkArchon SkArchon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried it locally as well, lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants