Skip to content

fix: iii reconnect route loss + Voyage env config + heap calc against actual ceiling#610

Open
dddFEDDDDDd wants to merge 1 commit into
rohitg00:mainfrom
dddFEDDDDDd:orbit/iii-reconnect-replay-voyage-env-heap-cap
Open

fix: iii reconnect route loss + Voyage env config + heap calc against actual ceiling#610
dddFEDDDDDd wants to merge 1 commit into
rohitg00:mainfrom
dddFEDDDDDd:orbit/iii-reconnect-replay-voyage-env-heap-cap

Conversation

@dddFEDDDDDd
Copy link
Copy Markdown

@dddFEDDDDDd dddFEDDDDDd commented May 22, 2026

Summary

Three independent fixes discovered while running agentmemory at the Orbit/Devotel project. Submitting as one PR because they all live in the same dist hot-path and the regression diagnosis links them.

1. iii reconnect → /agentmemory/* routes lost (src/index.ts)

When iii engine restarts, it unregisters all worker-registered HTTP routes during its startup cleanup pass. iii-sdk's reconnect logic on the AM worker side re-establishes the WebSocket but does NOT re-send the registerFunction/registerTrigger payloads.

Symptom: AM worker stays alive, but GET /agentmemory/livez returns 404 indefinitely. The @agentmemory/mcp shim's livez probe then fails forever and the shim falls into permanent local-fallback mode (7-tool subset), making the daemon look broken when only the routes are gone.

Fix: wrap the register-everything block in a re-runnable registerAllFunctions() closure + hook sdk.on("connection_state") to replay it when the connection transitions from reconnecting back to connected. bm25Index is hoisted out of the closure so IndexPersistence + restoreFrom code below the closure can still see it.

2. Voyage env-configurable model + dimensions (src/providers/embedding/voyage.ts)

VoyageEmbeddingProvider currently hardcodes model: "voyage-code-3" in the embedBatch POST body, so VOYAGE_EMBEDDING_MODEL env is silently ignored.

Fix: read model from getEnvVar("VOYAGE_EMBEDDING_MODEL") falling back to voyage-code-3 (preserves default). Add VOYAGE_OUTPUT_DIMENSION env that overrides the readonly dimensions field; when non-1024, attach output_dimension to the API payload so Voyage 3 large can emit 2048-dim embeddings.

3. memory_heap_tight calc uses heap ceiling, not heapTotal (src/health/thresholds.ts)

evaluateHealth's memPercent = heapUsed/heapTotal hits 95%+ even when V8 has 5× headroom under --max-old-space-size, because heapTotal is V8's lazy current allocation, not the configured ceiling. Result: chronic spurious memory_heap_tight notes in healthy daemons.

Fix: read cap from process.execArgvNODE_OPTIONS env → v8.getHeapStatistics().heap_size_limit (in that priority), fall back to heapTotal preserving prior behavior when no cap is set. Now memPercent reflects actual headroom.

Test plan

  • Patches applied to installed v0.9.21 dist bundle in production-like setup
  • Killed iii daemon while AM worker stays alive → livez returns 200 within shim's 30s livez TTL cycle, no manual AM restart needed
  • Set VOYAGE_EMBEDDING_MODEL=voyage-3-large + VOYAGE_OUTPUT_DIMENSION=2048 → 430 old 1024-dim vectors discarded via AGENTMEMORY_DROP_STALE_INDEX, new embeds at 2048-dim
  • heap calc: notes:[] empty at heapUsed=17% of 512MB cap (was firing at 95%+ against heapTotal pre-patch)
  • Upstream CI / lint
  • Add a regression test (would need to instrument iii-sdk reconnect in test harness)

Discovered at

Devotel/Orbit CPaaS production daemon during an extended diagnostic session. Detailed root-cause analysis preserved in our internal audit docs:

  • iii-reconnect symptom diagnosis
  • Voyage hardcoded-model discovery
  • Heap calc misleading-warning analysis

Happy to add tests or split into 3 separate PRs if you'd prefer that — these were diagnosed together so submitting together felt natural.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Memory-based alerts now compute pressure using effective V8 heap ceiling for improved accuracy.
  • New Features

    • Agent memory routes now survive service restarts via automatic re-registration on reconnection.
    • Embedding model and dimensions are now configurable via environment variables.

Review Change Stack

… actual ceiling

Three independent fixes discovered while running agentmemory at the Orbit/Devotel
project. Submitting as one PR because they all live in the same dist hot-path and
the regression diagnosis links them.

1. iii reconnect → /agentmemory/* routes lost (src/index.ts)
   - When iii engine restarts, it unregisters all worker-registered HTTP routes
     during its startup cleanup pass. iii-sdk's reconnect logic on the AM worker
     side re-establishes the WebSocket but does NOT re-send the registerFunction
     /registerTrigger payloads.
   - Symptom: AM worker stays alive, but GET /agentmemory/livez returns 404
     indefinitely. The @agentmemory/mcp shim's livez probe then fails forever
     and the shim falls into permanent local-fallback mode (7-tool subset),
     making the daemon look broken when only the routes are gone.
   - Fix: wrap the register-everything block in a re-runnable
     registerAllFunctions() closure + hook sdk.on("connection_state") to replay
     it when the connection transitions from reconnecting back to connected.
     bm25Index is hoisted out of the closure so IndexPersistence + restoreFrom
     code below the closure can still see it.

2. Voyage env-configurable model + dimensions (src/providers/embedding/voyage.ts)
   - VoyageEmbeddingProvider currently hardcodes model:"voyage-code-3" in the
     embedBatch POST body, so VOYAGE_EMBEDDING_MODEL env is silently ignored.
   - Fix: read model from getEnvVar("VOYAGE_EMBEDDING_MODEL") falling back to
     voyage-code-3 (preserves default). Add VOYAGE_OUTPUT_DIMENSION env that
     overrides the readonly dimensions field; when non-1024, attach
     output_dimension to the API payload so Voyage 3 large can emit 2048-dim
     embeddings.

3. memory_heap_tight calc uses heap ceiling, not heapTotal (src/health/thresholds.ts)
   - evaluateHealth's memPercent = heapUsed/heapTotal hits 95%+ even when V8 has
     5× headroom under --max-old-space-size, because heapTotal is V8's lazy
     current allocation, not the configured ceiling. Result: chronic spurious
     memory_heap_tight notes in healthy daemons.
   - Fix: read cap from process.execArgv → NODE_OPTIONS env →
     v8.getHeapStatistics().heap_size_limit (in that priority), fall back to
     heapTotal preserving prior behavior when no cap is set. Now memPercent
     reflects actual headroom.

Tested on agentmemory@0.9.21 dist bundle. Equivalent source patches here. All
three fixes verified by killing iii while AM worker stays alive → livez
returns 200 within the shim's 30s livez TTL cycle, no manual restart needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 22, 2026

Someone is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 22, 2026

📝 Walkthrough

Walkthrough

This PR updates memory health evaluation to detect effective V8 heap ceilings, refactors SDK registration into a re-runnable closure that reconnects on SDK state transitions, and makes embedding model and dimensions configurable via environment variables.

Changes

Memory Health Threshold Updates

Layer / File(s) Summary
V8 heap ceiling detection for memory pressure
src/health/thresholds.ts
evaluateHealth now imports v8.getHeapStatistics and replaces heapTotal-based memory calculation with fallback logic: read --max-old-space-size from process.execArgv, fall back to NODE_OPTIONS, then v8.getHeapStatistics().heap_size_limit. Return type annotation is reformatted to multi-line object syntax.

SDK Registration Lifecycle and Reconnection Handling

Layer / File(s) Summary
Registration closure and BM25 hoisting
src/index.ts
All function registrations and search wiring are extracted into a registerAllFunctions() closure that can be replayed on reconnection. The BM25 index is hoisted outside the closure to persist search state across SDK reconnections. Metrics initialization is wired with conditional meter accessor.
Reconnection listener for route persistence
src/index.ts
An sdk.on("connection_state") listener replays registerAllFunctions() when the SDK transitions from non-connected to "connected", guarded against repeated invocation and wrapped in try/catch with stderr logging, ensuring /agentmemory/* routes survive restarts.
Logging and SDK trigger payload formatting
src/index.ts
Boot logs (Provider, REST API, BM25 index, vector index), import destructuring, error logs, and SDK trigger calls are reformatted into multiline payload objects; distinct dimensions string and consolidation boot log are reflowed. Behavior is unchanged; consolidation and decay-sweep triggers now use explicit multiline payload objects.

Voyage Embedding Provider Configuration

Layer / File(s) Summary
Dynamic embedding model and dimensions configuration
src/providers/embedding/voyage.ts
VoyageEmbeddingProvider initializes model from VOYAGE_EMBEDDING_MODEL (default "voyage-code-3") and dimensions from VOYAGE_OUTPUT_DIMENSION (default 1024). The embedBatch request builder uses these dynamic properties and only includes output_dimension in the request when dimensions differ from 1024.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A rabbit hops through threads so new,
With heap ceilings measured true,
Registrations replay when code reconnects,
Embeddings now take directives they respect—
The agent remembers, resilient and bright! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the three main fixes in the changeset: iii reconnect route restoration, Voyage environment configuration, and heap calculation against actual ceiling.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
src/providers/embedding/voyage.ts (2)

12-15: ⚡ Quick win

Consider validating dimensions against supported values.

While the NaN check is critical, you may also want to validate that the parsed dimension is one of the supported values {256, 512, 1024, 2048} mentioned in the removed comment. This would provide clearer error messages for misconfigured environments rather than deferring to API errors.

♻️ Optional validation enhancement
   readonly dimensions = (() => {
     const parsed = parseInt(getEnvVar("VOYAGE_OUTPUT_DIMENSION") || "1024", 10);
     if (isNaN(parsed) || parsed <= 0) {
       throw new Error("VOYAGE_OUTPUT_DIMENSION must be a positive integer");
     }
+    const supported = [256, 512, 1024, 2048];
+    if (!supported.includes(parsed)) {
+      throw new Error(
+        `VOYAGE_OUTPUT_DIMENSION must be one of ${supported.join(", ")}; got ${parsed}`
+      );
+    }
     return parsed;
   })();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/providers/embedding/voyage.ts` around lines 12 - 15, The parsed embedding
dimension in the readonly property dimensions (using
getEnvVar("VOYAGE_OUTPUT_DIMENSION")) should be validated against the supported
set {256, 512, 1024, 2048}; after parsing the integer, check for NaN and then
assert the value is one of those allowed values and throw or log a clear
configuration error (including the invalid value and expected set) so
misconfigurations are caught early rather than relying on downstream API errors.

9-10: ⚡ Quick win

Remove WHAT comment in favor of clear naming.

The comment explains what the code does rather than why. The property names model and dimensions combined with the env var names already make the behavior clear. As per coding guidelines, avoid comments explaining WHAT — use clear naming instead.

♻️ Proposed removal
-  // Respect VOYAGE_EMBEDDING_MODEL + VOYAGE_OUTPUT_DIMENSION env vars instead of hardcoding model + dims.
-  // Voyage 3 supports {256, 512, 1024, 2048} via output_dimension. Default behavior preserved when envs unset.
   readonly model = getEnvVar("VOYAGE_EMBEDDING_MODEL") || "voyage-code-3";
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/providers/embedding/voyage.ts` around lines 9 - 10, Remove the
explanatory "what" comment above the Voyage embedding config and rely on the
existing clear identifiers instead: delete the two-line comment that mentions
VOYAGE_EMBEDDING_MODEL, VOYAGE_OUTPUT_DIMENSION, and supported dimensions; keep
the property names model and dimensions and the existing env-var usage as-is
(refer to the code in src/providers/embedding/voyage.ts that reads
VOYAGE_EMBEDDING_MODEL and VOYAGE_OUTPUT_DIMENSION and assigns to
model/dimensions).
src/index.ts (1)

251-341: 💤 Low value

Consider extracting one-time boot logs outside the closure.

The bootLog calls inside registerAllFunctions() (Claude bridge, knowledge graph, consolidation pipeline, auto-compress warnings, feature announcements, etc.) will repeat on every iii reconnection. While the "iii reconnected — re-registering" log at line 373 provides context, repeated configuration announcements may be confusing in logs.

Consider moving one-time informational logs outside the closure, leaving only the registration calls inside.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/index.ts` around lines 251 - 341, The bootLog messages (calls to bootLog
around Claude bridge, knowledge graph, consolidation pipeline, auto-compress,
context injection, team memory, advanced retrieval/orchestration summaries,
slots, and snapshots) are emitted inside the reconnect/registration closure and
thus repeat on every iii reconnection; extract these one-time informational logs
into a separate function (e.g., emitBootLogsOnce or bootLogStartupInfo) that
reads the same flags/configs (loadClaudeBridgeConfig, isGraphExtractionEnabled,
isConsolidationEnabled, isAutoCompressEnabled, isContextInjectionEnabled,
loadTeamConfig, loadSnapshotConfig, isSlotsEnabled, isReflectEnabled) and call
that function once at initial startup (or guard with a module-level boolean)
while leaving all registerXxxFunction(...) calls inside the existing
registerAllFunctions()/reconnection closure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/providers/embedding/voyage.ts`:
- Around line 12-15: The parsed dimensions field using
parseInt(getEnvVar("VOYAGE_OUTPUT_DIMENSION") || "1024", 10) can be NaN for
invalid env values; update the initialization of the readonly dimensions
property in the same module to validate the result (Number.isInteger and >0) and
either fall back to a safe default (1024) or throw a clear error; reference the
getEnvVar call, the VOYAGE_OUTPUT_DIMENSION env var, and the dimensions property
so you locate and replace the parseInt usage with parsing + validation logic and
ensure downstream code relying on provider.dimensions never receives NaN.

---

Nitpick comments:
In `@src/index.ts`:
- Around line 251-341: The bootLog messages (calls to bootLog around Claude
bridge, knowledge graph, consolidation pipeline, auto-compress, context
injection, team memory, advanced retrieval/orchestration summaries, slots, and
snapshots) are emitted inside the reconnect/registration closure and thus repeat
on every iii reconnection; extract these one-time informational logs into a
separate function (e.g., emitBootLogsOnce or bootLogStartupInfo) that reads the
same flags/configs (loadClaudeBridgeConfig, isGraphExtractionEnabled,
isConsolidationEnabled, isAutoCompressEnabled, isContextInjectionEnabled,
loadTeamConfig, loadSnapshotConfig, isSlotsEnabled, isReflectEnabled) and call
that function once at initial startup (or guard with a module-level boolean)
while leaving all registerXxxFunction(...) calls inside the existing
registerAllFunctions()/reconnection closure.

In `@src/providers/embedding/voyage.ts`:
- Around line 12-15: The parsed embedding dimension in the readonly property
dimensions (using getEnvVar("VOYAGE_OUTPUT_DIMENSION")) should be validated
against the supported set {256, 512, 1024, 2048}; after parsing the integer,
check for NaN and then assert the value is one of those allowed values and throw
or log a clear configuration error (including the invalid value and expected
set) so misconfigurations are caught early rather than relying on downstream API
errors.
- Around line 9-10: Remove the explanatory "what" comment above the Voyage
embedding config and rely on the existing clear identifiers instead: delete the
two-line comment that mentions VOYAGE_EMBEDDING_MODEL, VOYAGE_OUTPUT_DIMENSION,
and supported dimensions; keep the property names model and dimensions and the
existing env-var usage as-is (refer to the code in
src/providers/embedding/voyage.ts that reads VOYAGE_EMBEDDING_MODEL and
VOYAGE_OUTPUT_DIMENSION and assigns to model/dimensions).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f07a2da5-3c07-4911-be18-a7e319e2159e

📥 Commits

Reviewing files that changed from the base of the PR and between 3551241 and 1c2163f.

📒 Files selected for processing (3)
  • src/health/thresholds.ts
  • src/index.ts
  • src/providers/embedding/voyage.ts

Comment on lines +12 to +15
readonly dimensions = parseInt(
getEnvVar("VOYAGE_OUTPUT_DIMENSION") || "1024",
10,
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Validate parsed dimensions to prevent silent NaN.

If VOYAGE_OUTPUT_DIMENSION contains invalid input (e.g., "abc", "1024px"), parseInt returns NaN. This would set provider.dimensions = NaN, causing all dimension checks to fail (v.length !== NaN is always true), breaking embeddings silently.

🛡️ Proposed fix with validation
-  readonly dimensions = parseInt(
-    getEnvVar("VOYAGE_OUTPUT_DIMENSION") || "1024",
-    10,
-  );
+  readonly dimensions = (() => {
+    const parsed = parseInt(getEnvVar("VOYAGE_OUTPUT_DIMENSION") || "1024", 10);
+    if (isNaN(parsed) || parsed <= 0) {
+      throw new Error("VOYAGE_OUTPUT_DIMENSION must be a positive integer");
+    }
+    return parsed;
+  })();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
readonly dimensions = parseInt(
getEnvVar("VOYAGE_OUTPUT_DIMENSION") || "1024",
10,
);
readonly dimensions = (() => {
const parsed = parseInt(getEnvVar("VOYAGE_OUTPUT_DIMENSION") || "1024", 10);
if (isNaN(parsed) || parsed <= 0) {
throw new Error("VOYAGE_OUTPUT_DIMENSION must be a positive integer");
}
return parsed;
})();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/providers/embedding/voyage.ts` around lines 12 - 15, The parsed
dimensions field using parseInt(getEnvVar("VOYAGE_OUTPUT_DIMENSION") || "1024",
10) can be NaN for invalid env values; update the initialization of the readonly
dimensions property in the same module to validate the result (Number.isInteger
and >0) and either fall back to a safe default (1024) or throw a clear error;
reference the getEnvVar call, the VOYAGE_OUTPUT_DIMENSION env var, and the
dimensions property so you locate and replace the parseInt usage with parsing +
validation logic and ensure downstream code relying on provider.dimensions never
receives NaN.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant