fix: iii reconnect route loss + Voyage env config + heap calc against actual ceiling#610
Conversation
… actual ceiling
Three independent fixes discovered while running agentmemory at the Orbit/Devotel
project. Submitting as one PR because they all live in the same dist hot-path and
the regression diagnosis links them.
1. iii reconnect → /agentmemory/* routes lost (src/index.ts)
- When iii engine restarts, it unregisters all worker-registered HTTP routes
during its startup cleanup pass. iii-sdk's reconnect logic on the AM worker
side re-establishes the WebSocket but does NOT re-send the registerFunction
/registerTrigger payloads.
- Symptom: AM worker stays alive, but GET /agentmemory/livez returns 404
indefinitely. The @agentmemory/mcp shim's livez probe then fails forever
and the shim falls into permanent local-fallback mode (7-tool subset),
making the daemon look broken when only the routes are gone.
- Fix: wrap the register-everything block in a re-runnable
registerAllFunctions() closure + hook sdk.on("connection_state") to replay
it when the connection transitions from reconnecting back to connected.
bm25Index is hoisted out of the closure so IndexPersistence + restoreFrom
code below the closure can still see it.
2. Voyage env-configurable model + dimensions (src/providers/embedding/voyage.ts)
- VoyageEmbeddingProvider currently hardcodes model:"voyage-code-3" in the
embedBatch POST body, so VOYAGE_EMBEDDING_MODEL env is silently ignored.
- Fix: read model from getEnvVar("VOYAGE_EMBEDDING_MODEL") falling back to
voyage-code-3 (preserves default). Add VOYAGE_OUTPUT_DIMENSION env that
overrides the readonly dimensions field; when non-1024, attach
output_dimension to the API payload so Voyage 3 large can emit 2048-dim
embeddings.
3. memory_heap_tight calc uses heap ceiling, not heapTotal (src/health/thresholds.ts)
- evaluateHealth's memPercent = heapUsed/heapTotal hits 95%+ even when V8 has
5× headroom under --max-old-space-size, because heapTotal is V8's lazy
current allocation, not the configured ceiling. Result: chronic spurious
memory_heap_tight notes in healthy daemons.
- Fix: read cap from process.execArgv → NODE_OPTIONS env →
v8.getHeapStatistics().heap_size_limit (in that priority), fall back to
heapTotal preserving prior behavior when no cap is set. Now memPercent
reflects actual headroom.
Tested on agentmemory@0.9.21 dist bundle. Equivalent source patches here. All
three fixes verified by killing iii while AM worker stays alive → livez
returns 200 within the shim's 30s livez TTL cycle, no manual restart needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Someone is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthroughThis PR updates memory health evaluation to detect effective V8 heap ceilings, refactors SDK registration into a re-runnable closure that reconnects on SDK state transitions, and makes embedding model and dimensions configurable via environment variables. ChangesMemory Health Threshold Updates
SDK Registration Lifecycle and Reconnection Handling
Voyage Embedding Provider Configuration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/providers/embedding/voyage.ts (2)
12-15: ⚡ Quick winConsider validating dimensions against supported values.
While the NaN check is critical, you may also want to validate that the parsed dimension is one of the supported values
{256, 512, 1024, 2048}mentioned in the removed comment. This would provide clearer error messages for misconfigured environments rather than deferring to API errors.♻️ Optional validation enhancement
readonly dimensions = (() => { const parsed = parseInt(getEnvVar("VOYAGE_OUTPUT_DIMENSION") || "1024", 10); if (isNaN(parsed) || parsed <= 0) { throw new Error("VOYAGE_OUTPUT_DIMENSION must be a positive integer"); } + const supported = [256, 512, 1024, 2048]; + if (!supported.includes(parsed)) { + throw new Error( + `VOYAGE_OUTPUT_DIMENSION must be one of ${supported.join(", ")}; got ${parsed}` + ); + } return parsed; })();🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/providers/embedding/voyage.ts` around lines 12 - 15, The parsed embedding dimension in the readonly property dimensions (using getEnvVar("VOYAGE_OUTPUT_DIMENSION")) should be validated against the supported set {256, 512, 1024, 2048}; after parsing the integer, check for NaN and then assert the value is one of those allowed values and throw or log a clear configuration error (including the invalid value and expected set) so misconfigurations are caught early rather than relying on downstream API errors.
9-10: ⚡ Quick winRemove WHAT comment in favor of clear naming.
The comment explains what the code does rather than why. The property names
modelanddimensionscombined with the env var names already make the behavior clear. As per coding guidelines, avoid comments explaining WHAT — use clear naming instead.♻️ Proposed removal
- // Respect VOYAGE_EMBEDDING_MODEL + VOYAGE_OUTPUT_DIMENSION env vars instead of hardcoding model + dims. - // Voyage 3 supports {256, 512, 1024, 2048} via output_dimension. Default behavior preserved when envs unset. readonly model = getEnvVar("VOYAGE_EMBEDDING_MODEL") || "voyage-code-3";🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/providers/embedding/voyage.ts` around lines 9 - 10, Remove the explanatory "what" comment above the Voyage embedding config and rely on the existing clear identifiers instead: delete the two-line comment that mentions VOYAGE_EMBEDDING_MODEL, VOYAGE_OUTPUT_DIMENSION, and supported dimensions; keep the property names model and dimensions and the existing env-var usage as-is (refer to the code in src/providers/embedding/voyage.ts that reads VOYAGE_EMBEDDING_MODEL and VOYAGE_OUTPUT_DIMENSION and assigns to model/dimensions).src/index.ts (1)
251-341: 💤 Low valueConsider extracting one-time boot logs outside the closure.
The
bootLogcalls insideregisterAllFunctions()(Claude bridge, knowledge graph, consolidation pipeline, auto-compress warnings, feature announcements, etc.) will repeat on every iii reconnection. While the "iii reconnected — re-registering" log at line 373 provides context, repeated configuration announcements may be confusing in logs.Consider moving one-time informational logs outside the closure, leaving only the registration calls inside.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/index.ts` around lines 251 - 341, The bootLog messages (calls to bootLog around Claude bridge, knowledge graph, consolidation pipeline, auto-compress, context injection, team memory, advanced retrieval/orchestration summaries, slots, and snapshots) are emitted inside the reconnect/registration closure and thus repeat on every iii reconnection; extract these one-time informational logs into a separate function (e.g., emitBootLogsOnce or bootLogStartupInfo) that reads the same flags/configs (loadClaudeBridgeConfig, isGraphExtractionEnabled, isConsolidationEnabled, isAutoCompressEnabled, isContextInjectionEnabled, loadTeamConfig, loadSnapshotConfig, isSlotsEnabled, isReflectEnabled) and call that function once at initial startup (or guard with a module-level boolean) while leaving all registerXxxFunction(...) calls inside the existing registerAllFunctions()/reconnection closure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/providers/embedding/voyage.ts`:
- Around line 12-15: The parsed dimensions field using
parseInt(getEnvVar("VOYAGE_OUTPUT_DIMENSION") || "1024", 10) can be NaN for
invalid env values; update the initialization of the readonly dimensions
property in the same module to validate the result (Number.isInteger and >0) and
either fall back to a safe default (1024) or throw a clear error; reference the
getEnvVar call, the VOYAGE_OUTPUT_DIMENSION env var, and the dimensions property
so you locate and replace the parseInt usage with parsing + validation logic and
ensure downstream code relying on provider.dimensions never receives NaN.
---
Nitpick comments:
In `@src/index.ts`:
- Around line 251-341: The bootLog messages (calls to bootLog around Claude
bridge, knowledge graph, consolidation pipeline, auto-compress, context
injection, team memory, advanced retrieval/orchestration summaries, slots, and
snapshots) are emitted inside the reconnect/registration closure and thus repeat
on every iii reconnection; extract these one-time informational logs into a
separate function (e.g., emitBootLogsOnce or bootLogStartupInfo) that reads the
same flags/configs (loadClaudeBridgeConfig, isGraphExtractionEnabled,
isConsolidationEnabled, isAutoCompressEnabled, isContextInjectionEnabled,
loadTeamConfig, loadSnapshotConfig, isSlotsEnabled, isReflectEnabled) and call
that function once at initial startup (or guard with a module-level boolean)
while leaving all registerXxxFunction(...) calls inside the existing
registerAllFunctions()/reconnection closure.
In `@src/providers/embedding/voyage.ts`:
- Around line 12-15: The parsed embedding dimension in the readonly property
dimensions (using getEnvVar("VOYAGE_OUTPUT_DIMENSION")) should be validated
against the supported set {256, 512, 1024, 2048}; after parsing the integer,
check for NaN and then assert the value is one of those allowed values and throw
or log a clear configuration error (including the invalid value and expected
set) so misconfigurations are caught early rather than relying on downstream API
errors.
- Around line 9-10: Remove the explanatory "what" comment above the Voyage
embedding config and rely on the existing clear identifiers instead: delete the
two-line comment that mentions VOYAGE_EMBEDDING_MODEL, VOYAGE_OUTPUT_DIMENSION,
and supported dimensions; keep the property names model and dimensions and the
existing env-var usage as-is (refer to the code in
src/providers/embedding/voyage.ts that reads VOYAGE_EMBEDDING_MODEL and
VOYAGE_OUTPUT_DIMENSION and assigns to model/dimensions).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f07a2da5-3c07-4911-be18-a7e319e2159e
📒 Files selected for processing (3)
src/health/thresholds.tssrc/index.tssrc/providers/embedding/voyage.ts
| readonly dimensions = parseInt( | ||
| getEnvVar("VOYAGE_OUTPUT_DIMENSION") || "1024", | ||
| 10, | ||
| ); |
There was a problem hiding this comment.
Validate parsed dimensions to prevent silent NaN.
If VOYAGE_OUTPUT_DIMENSION contains invalid input (e.g., "abc", "1024px"), parseInt returns NaN. This would set provider.dimensions = NaN, causing all dimension checks to fail (v.length !== NaN is always true), breaking embeddings silently.
🛡️ Proposed fix with validation
- readonly dimensions = parseInt(
- getEnvVar("VOYAGE_OUTPUT_DIMENSION") || "1024",
- 10,
- );
+ readonly dimensions = (() => {
+ const parsed = parseInt(getEnvVar("VOYAGE_OUTPUT_DIMENSION") || "1024", 10);
+ if (isNaN(parsed) || parsed <= 0) {
+ throw new Error("VOYAGE_OUTPUT_DIMENSION must be a positive integer");
+ }
+ return parsed;
+ })();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| readonly dimensions = parseInt( | |
| getEnvVar("VOYAGE_OUTPUT_DIMENSION") || "1024", | |
| 10, | |
| ); | |
| readonly dimensions = (() => { | |
| const parsed = parseInt(getEnvVar("VOYAGE_OUTPUT_DIMENSION") || "1024", 10); | |
| if (isNaN(parsed) || parsed <= 0) { | |
| throw new Error("VOYAGE_OUTPUT_DIMENSION must be a positive integer"); | |
| } | |
| return parsed; | |
| })(); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/providers/embedding/voyage.ts` around lines 12 - 15, The parsed
dimensions field using parseInt(getEnvVar("VOYAGE_OUTPUT_DIMENSION") || "1024",
10) can be NaN for invalid env values; update the initialization of the readonly
dimensions property in the same module to validate the result (Number.isInteger
and >0) and either fall back to a safe default (1024) or throw a clear error;
reference the getEnvVar call, the VOYAGE_OUTPUT_DIMENSION env var, and the
dimensions property so you locate and replace the parseInt usage with parsing +
validation logic and ensure downstream code relying on provider.dimensions never
receives NaN.
Summary
Three independent fixes discovered while running agentmemory at the Orbit/Devotel project. Submitting as one PR because they all live in the same dist hot-path and the regression diagnosis links them.
1. iii reconnect → /agentmemory/* routes lost (
src/index.ts)When iii engine restarts, it unregisters all worker-registered HTTP routes during its startup cleanup pass. iii-sdk's reconnect logic on the AM worker side re-establishes the WebSocket but does NOT re-send the
registerFunction/registerTriggerpayloads.Symptom: AM worker stays alive, but
GET /agentmemory/livezreturns 404 indefinitely. The@agentmemory/mcpshim's livez probe then fails forever and the shim falls into permanent local-fallback mode (7-tool subset), making the daemon look broken when only the routes are gone.Fix: wrap the register-everything block in a re-runnable
registerAllFunctions()closure + hooksdk.on("connection_state")to replay it when the connection transitions from reconnecting back to connected.bm25Indexis hoisted out of the closure soIndexPersistence+restoreFromcode below the closure can still see it.2. Voyage env-configurable model + dimensions (
src/providers/embedding/voyage.ts)VoyageEmbeddingProvidercurrently hardcodesmodel: "voyage-code-3"in theembedBatchPOST body, soVOYAGE_EMBEDDING_MODELenv is silently ignored.Fix: read model from
getEnvVar("VOYAGE_EMBEDDING_MODEL")falling back tovoyage-code-3(preserves default). AddVOYAGE_OUTPUT_DIMENSIONenv that overrides the readonly dimensions field; when non-1024, attachoutput_dimensionto the API payload so Voyage 3 large can emit 2048-dim embeddings.3. memory_heap_tight calc uses heap ceiling, not heapTotal (
src/health/thresholds.ts)evaluateHealth'smemPercent = heapUsed/heapTotalhits 95%+ even when V8 has 5× headroom under--max-old-space-size, because heapTotal is V8's lazy current allocation, not the configured ceiling. Result: chronic spuriousmemory_heap_tightnotes in healthy daemons.Fix: read cap from
process.execArgv→NODE_OPTIONSenv →v8.getHeapStatistics().heap_size_limit(in that priority), fall back to heapTotal preserving prior behavior when no cap is set. Now memPercent reflects actual headroom.Test plan
Discovered at
Devotel/Orbit CPaaS production daemon during an extended diagnostic session. Detailed root-cause analysis preserved in our internal audit docs:
Happy to add tests or split into 3 separate PRs if you'd prefer that — these were diagnosed together so submitting together felt natural.
🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
New Features