Skip to content

fix(windows): make agentmemory actually start on Windows (watch path + IPv4 engine WS + configurable timeout)#634

Open
lloyd1515 wants to merge 2 commits into
rohitg00:mainfrom
lloyd1515:fix/windows-engine-ready-timeout
Open

fix(windows): make agentmemory actually start on Windows (watch path + IPv4 engine WS + configurable timeout)#634
lloyd1515 wants to merge 2 commits into
rohitg00:mainfrom
lloyd1515:fix/windows-engine-ready-timeout

Conversation

@lloyd1515
Copy link
Copy Markdown

@lloyd1515 lloyd1515 commented May 24, 2026

Problem

On Windows, agentmemory (v0.9.21) cannot complete startup. The wrapper prints "The engine process started but the REST API never responded" and exits, leaving an orphaned iii.exe listening on :3111 with zero agentmemory routes registered. Every subsequent REST/MCP call gets a 404 from the dead backend.

Root causes (two latent bugs, one papered-over symptom)

1. iii-exec watches a path that doesn't exist in shipped packages

iii-config.yaml and iii-config.docker.yaml configure the iii-exec worker to watch src/**/*.ts. The published npm package ships only dist/ — that glob resolves to zero files for every end user. On some platforms (Windows reproducibly), an empty watch glob causes iii-exec to abort startup before it ever spawns node dist/index.mjs, which is the process that actually registers the REST HTTP triggers.

Fix: watch dist/**/*.mjs so the watcher targets the shipped artifact.

2. Engine WS default resolves to IPv6 on Windows, but iii binds IPv4 only

src/config.ts:140 defaults engineUrl to ws://localhost:49134. On Windows, localhost prefers ::1 (IPv6), but iii-engine binds IPv4 only — so the node child enters an ECONNREFUSED reconnect loop indefinitely and never finishes registering its HTTP triggers, even when iii-exec did manage to spawn it.

Fix: default to ws://127.0.0.1:49134. III_ENGINE_URL overrides are unchanged.

3. (Defensive) Hardcoded 15 s readiness timeout

waitForEngine(15000) at src/cli.ts:1040 and :1387 is too tight: even after bugs 1 & 2 are fixed, cold-start of the Node child + native deps + WebSocket handshake + trigger registration can push past 15 s on slow disks / CI / first-run installs. Introduce AGENTMEMORY_READY_TIMEOUT_MS (default 60 000) and templatize the "did not become ready within Ns" message.

Verification

Tested on Windows 11 / Node 22.12 / iii v0.11.2:

  • Before: node dist/cli.mjs → "REST API never responded" after 15 s, every time. Confirmed via direct probe (curl http://127.0.0.1:3111/agentmemory/health → 404).
  • After this PR: node dist/cli.mjs
    curl -s http://127.0.0.1:3111/agentmemory/health
    {"status":"healthy","version":"0.9.21","viewerPort":3113, ...}
    
    Engine reaches healthy state in ~1.3 s. All 124 REST endpoints registered. MCP shim (@agentmemory/mcp) connects and tool calls succeed end-to-end against Codex CLI / Gemini CLI / Antigravity CLI / Claude Code.

Other checks:

  • npm run build
  • npx tsc --noEmit ✓ (no errors)
  • npm test → 1081 / 1096 passing. The 15 pre-existing failures are Windows-specific path-comparison issues in obsidian-export.test.ts etc.; none touch any file in this PR.
  • node dist/cli.mjs --help shows the new env var under Environment:.

Files changed

  • src/cli.ts — add AGENTMEMORY_READY_TIMEOUT_MS constant, replace both waitForEngine(15000) call sites, templatize the timeout message, document the env var in --help.
  • src/config.ts — default engineUrl to ws://127.0.0.1:49134 instead of ws://localhost:....
  • iii-config.yaml, iii-config.docker.yaml — watch dist/**/*.mjs instead of src/**/*.ts.

Why the (2) and (3) splits are worth keeping separate

(2) and (3) compound: even on a machine where IPv6 happens to fall through to IPv4 quickly, the hardcoded 15 s ceiling sometimes wasn't enough. Fixing only the timeout would have masked the real bugs (and Docker mode would still have been broken because AGENTMEMORY_USE_DOCKER=1 doesn't help when iii-exec itself can't find files to watch). Fixing only the watch path would have left Windows users in the reconnect loop. All three are needed for a clean Windows install.

Credit

Root cause of (1) and (2) identified in collaboration with another agent run by the same user, who patched these the same way in their local node_modules to get a working server before this PR was opened.

… 60s

The wrapper's hardcoded 15-second readiness window is too tight on
Windows. iii spawns the `iii-exec` worker (`node dist/index.mjs`),
which has to cold-start Node, import native deps, reconnect to the
engine over WebSocket, and register HTTP triggers before the REST
surface answers. On Windows that handshake regularly takes longer
than 15s, so `agentmemory` exits with "REST API never responded" and
orphans iii — leaving the user with a port-bound engine that has no
agentmemory routes registered.

This change:
- Introduces AGENTMEMORY_READY_TIMEOUT_MS (default 60000ms).
- Replaces both `waitForEngine(15000)` call sites with the constant.
- Templates the timeout into the "did not become ready within Ns"
  error so the message matches whatever the user configured.
- Documents the new env var in --help.

No behavior change for users whose engine starts in <15s; Windows
installs now succeed out of the box. Users on slow disks / CI can
raise the timeout further; users wanting fast-fail smoke tests can
lower it.
@vercel
Copy link
Copy Markdown

vercel Bot commented May 24, 2026

@lloyd1515 is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 24, 2026

📝 Walkthrough

Walkthrough

Adds a configurable engine readiness timeout via AGENTMEMORY_READY_TIMEOUT_MS (default 60s), applies it to CLI startup and doctor flows, updates worker watch globs to dist/**/*.mjs, and changes the engineUrl default to ws://127.0.0.1:49134.

Changes

Engine Readiness Timeout Configuration & Related Runtime Updates

Layer / File(s) Summary
Timeout constant and help documentation
src/cli.ts
READY_TIMEOUT_MS parses AGENTMEMORY_READY_TIMEOUT_MS (numeric validation, default 60,000ms) and CLI help documents the env var.
Apply timeout in startup and doctor flows
src/cli.ts
Startup readiness waiting and the doctor runStart diagnostic use READY_TIMEOUT_MS instead of a hard-coded timeout; spinner/error messages show the configured duration.
Worker watch globs target built artifacts
iii-config.docker.yaml, iii-config.yaml
Change watcher globs from src/**/*.ts to dist/**/*.mjs and add comments explaining watchers should target shipped dist/ artifacts.
engineUrl IPv4 default fallback
src/config.ts
Default engineUrl fallback changed from ws://localhost:49134 to ws://127.0.0.1:49134 with a comment about IPv4 vs IPv6 localhost behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through env and CLI,
Counted milliseconds, one-two-three,
Sixty seconds now can bend,
Watchers watch the built dist end,
IPv4 guides my tiny spree.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately and specifically describes the main changes: fixes for Windows startup issues including configurable timeout, watch path correction, and IPv4 engine WebSocket configuration.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/cli.ts (1)

85-89: 💤 Low value

Consider aligning with the established timeout parsing pattern.

The timeout parsing uses Number.parseInt(raw, 10), which works correctly. However, context snippet 1 from src/mcp/rest-proxy.ts shows the established pattern in this codebase:

const n = Number(raw);
return Number.isFinite(n) && n > 0 ? Math.floor(n) : DEFAULT_VALUE;

For consistency with the existing probeTimeoutMs() pattern, consider:

♻️ Align with established pattern
 const READY_TIMEOUT_MS = (() => {
   const raw = process.env["AGENTMEMORY_READY_TIMEOUT_MS"];
   if (!raw) return 60000;
-  const parsed = Number.parseInt(raw, 10);
-  return Number.isFinite(parsed) && parsed > 0 ? parsed : 60000;
+  const n = Number(raw);
+  return Number.isFinite(n) && n > 0 ? Math.floor(n) : 60000;
 })();

As per coding guidelines, context snippet 1 demonstrates the repo's established pattern for env-provided timeout parsing/validation (Number(raw), finite, >0, otherwise default), which matches the numeric validation + fallback behavior but uses Number(raw) and Math.floor(n) for consistency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cli.ts` around lines 85 - 89, The READY_TIMEOUT_MS initializer should
follow the repo's established timeout parsing pattern: read
process.env["AGENTMEMORY_READY_TIMEOUT_MS"] into raw, convert with Number(raw)
(not Number.parseInt), validate with Number.isFinite(n) && n > 0, and return
Math.floor(n) when valid or the default 60000 otherwise; update the
READY_TIMEOUT_MS logic accordingly (referencing READY_TIMEOUT_MS and
AGENTMEMORY_READY_TIMEOUT_MS) to match probeTimeoutMs()-style parsing and
fallback behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/cli.ts`:
- Around line 85-89: The READY_TIMEOUT_MS initializer should follow the repo's
established timeout parsing pattern: read
process.env["AGENTMEMORY_READY_TIMEOUT_MS"] into raw, convert with Number(raw)
(not Number.parseInt), validate with Number.isFinite(n) && n > 0, and return
Math.floor(n) when valid or the default 60000 otherwise; update the
READY_TIMEOUT_MS logic accordingly (referencing READY_TIMEOUT_MS and
AGENTMEMORY_READY_TIMEOUT_MS) to match probeTimeoutMs()-style parsing and
fallback behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 873299c8-512f-4b91-bf69-1acb1c127195

📥 Commits

Reviewing files that changed from the base of the PR and between 3551241 and 13967c5.

📒 Files selected for processing (1)
  • src/cli.ts

… child boots

Two latent bugs that combine to make `agentmemory` unusable on Windows
(and silently fragile on Linux):

1. iii-config*.yaml's `iii-exec` worker watches `src/**/*.ts`. The npm
   package only ships `dist/`, so that glob resolves to nothing for end
   users. On some platforms an empty watch glob causes iii-exec to
   abort startup before it ever spawns `node dist/index.mjs`, which is
   the process that actually registers the REST HTTP triggers. Switch
   to `dist/**/*.mjs` so the watcher sees the shipped artifact.

2. `loadConfig()` defaults `engineUrl` to `ws://localhost:49134`. On
   Windows, `localhost` prefers IPv6 (`::1`), but iii-engine binds
   IPv4 only — so the node child enters an ECONNREFUSED reconnect
   loop and never finishes registering its HTTP triggers, even when
   iii-exec did spawn it. Default to `ws://127.0.0.1:49134` to force
   IPv4. Existing `III_ENGINE_URL` overrides are unchanged.

Verified locally on Windows 11 / Node 22: with both fixes plus the
timeout bump from the previous commit, `node dist/cli.mjs` brings the
server to `{"status":"healthy"}` in ~1.3s. Before this commit the
engine never registered any routes — every REST/MCP call hit a 404
backend even though `:3111` was bound.
@lloyd1515 lloyd1515 changed the title fix(cli): make iii-engine ready timeout configurable, bump default to 60s fix(windows): make agentmemory actually start on Windows (watch path + IPv4 engine WS + configurable timeout) May 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant