Skip to content

fix(router): consolidated config-reload memory-leak fix (#3006–#3009 bundle)#3020

Draft
asoorm wants to merge 4 commits into
mainfrom
fix/reload-memory-leak-consolidated
Draft

fix(router): consolidated config-reload memory-leak fix (#3006–#3009 bundle)#3020
asoorm wants to merge 4 commits into
mainfrom
fix/reload-memory-leak-consolidated

Conversation

@asoorm

@asoorm asoorm commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

What

Consolidated bundle of the four root-cause fixes for memory retention on router config hot-reload, combined onto one branch so the leak can be validated end-to-end in a single checkout.

ℹ️ Not a replacement. These commits are the originals authored in #3006, #3007, #3008, #3009. This PR exists purely to validate them together (and against the regression test in #3019). Review/merge the individual PRs as the source of truth.

Commits (cherry-picked, in merge order)

Commit Original PR
clear slowplancache entries on Close #3006
stop pinning schema AST in cached plan entries #3007
call OnRouterConfigReload before CDN/manifest hot-reload #3008
skip plan cache OnEvict during graphMux shutdown #3009

Root cause

On reload, graphMux.Shutdown()planCache.Close() triggers ristretto's Clear(), which fires the plan cache OnEvict for every entry, demoting the whole cache into planFallbackCache. That fallback is held by the long-lived reloadPersistentState (core/graph_server.go:1609), and slowplancache.Close() didn't clear its map — so the old plan cache + schema AST survived every reload. The fix set stops the demote-on-shutdown, clears the fallback on close, releases stale refs in the hot-reload path, and drops the unused schema-AST field.

Validation

Why draft

Coordination/validation branch for #3006#3009 — not intended to merge on its own.

Summary by CodeRabbit

  • Bug Fixes

    • Improved shutdown behavior so cached plans are released cleanly and no longer trigger delayed cache work during reload or shutdown.
    • Reduced the risk of stale cache entries lingering after a router configuration refresh.
  • Tests

    • Added coverage to verify cache shutdown fully clears stored entries and resets cache state.

… values

After Close(), cached entries in the sync.Map could keep references to
stored values (including query plans holding schema AST pointers) until
the Cache struct itself was collected. Clear entries when shutting down
the cache so config reload can release the previous graph generation.
planWithMetaData stored schemaDocument (full router schema AST, ~200MB)
in every cached plan entry but never read it. Drop the field so plan
caches do not retain the old schema generation after config reload.
The supervisor restart path already extracts slow-plan cache entries
before replacing the graph server. Call the same hook at the start of
newServer() so CDN and manifest hot-reloads release stale plan cache
references while the old graphMux is still alive.
Ristretto Close() clears all plan cache entries and fires OnEvict for
each one, migrating them into slowplancache right before both caches
are torn down. Disable OnEvict during intentional mux shutdown and wait
for pending slow-cache writes first.
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2fd2f7a9-4cb4-470f-a2e5-ef56c609dd45

📥 Commits

Reviewing files that changed from the base of the PR and between 2226f3f and 38c42fc.

📒 Files selected for processing (5)
  • router/core/graph_server.go
  • router/core/operation_planner.go
  • router/core/router.go
  • router/pkg/slowplancache/slow_plan_cache.go
  • router/pkg/slowplancache/slow_plan_cache_test.go

Walkthrough

The router updates internal plan metadata, reload timing, eviction handling during shutdown, and slow plan cache teardown. A new test covers the cache close behavior.

Changes

Plan cache lifecycle updates

Layer / File(s) Summary
Internal plan metadata shape
router/core/operation_planner.go
planWithMetaData stores preparedPlan and operationDocument, and planOperation stops populating the removed schema-related field.
Config reload and plan-cache shutdown sequencing
router/core/router.go, router/core/graph_server.go
Router.newServer calls OnRouterConfigReload() before replacing the graph server, and graphMux gates plan-cache eviction writes during shutdown before waiting on planFallbackCache.
Slow plan cache teardown
router/pkg/slowplancache/slow_plan_cache.go, router/pkg/slowplancache/slow_plan_cache_test.go
Cache.Close() deletes all cached entries, resets size to 0, and the new test checks that close empties the cache state.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the bundled router config-reload memory-leak fixes and reflects the main change set.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"


Comment @coderabbitai help to get the list of available commands.

@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown

Router image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-9b52b259e567185dfdb5c1c0491b385c641b41d5

@codecov

codecov Bot commented Jun 26, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 16.66667% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.03%. Comparing base (5efb3c2) to head (38c42fc).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
router/core/graph_server.go 16.66% 4 Missing and 1 partial ⚠️
router/pkg/slowplancache/slow_plan_cache.go 0.00% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3020      +/-   ##
==========================================
- Coverage   60.27%   51.03%   -9.25%     
==========================================
  Files         477      243     -234     
  Lines       62471    29475   -32996     
  Branches     6131        0    -6131     
==========================================
- Hits        37656    15042   -22614     
+ Misses      24787    12903   -11884     
- Partials       28     1530    +1502     
Files with missing lines Coverage Δ
router/core/operation_planner.go 57.77% <ø> (ø)
router/core/router.go 63.98% <100.00%> (ø)
router/core/graph_server.go 76.25% <16.66%> (ø)
router/pkg/slowplancache/slow_plan_cache.go 5.76% <0.00%> (ø)

... and 716 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants