fix(windows): make agentmemory actually start on Windows (watch path + IPv4 engine WS + configurable timeout)#634
Conversation
… 60s The wrapper's hardcoded 15-second readiness window is too tight on Windows. iii spawns the `iii-exec` worker (`node dist/index.mjs`), which has to cold-start Node, import native deps, reconnect to the engine over WebSocket, and register HTTP triggers before the REST surface answers. On Windows that handshake regularly takes longer than 15s, so `agentmemory` exits with "REST API never responded" and orphans iii — leaving the user with a port-bound engine that has no agentmemory routes registered. This change: - Introduces AGENTMEMORY_READY_TIMEOUT_MS (default 60000ms). - Replaces both `waitForEngine(15000)` call sites with the constant. - Templates the timeout into the "did not become ready within Ns" error so the message matches whatever the user configured. - Documents the new env var in --help. No behavior change for users whose engine starts in <15s; Windows installs now succeed out of the box. Users on slow disks / CI can raise the timeout further; users wanting fast-fail smoke tests can lower it.
|
@lloyd1515 is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthroughAdds a configurable engine readiness timeout via AGENTMEMORY_READY_TIMEOUT_MS (default 60s), applies it to CLI startup and doctor flows, updates worker watch globs to ChangesEngine Readiness Timeout Configuration & Related Runtime Updates
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/cli.ts (1)
85-89: 💤 Low valueConsider aligning with the established timeout parsing pattern.
The timeout parsing uses
Number.parseInt(raw, 10), which works correctly. However, context snippet 1 fromsrc/mcp/rest-proxy.tsshows the established pattern in this codebase:const n = Number(raw); return Number.isFinite(n) && n > 0 ? Math.floor(n) : DEFAULT_VALUE;For consistency with the existing
probeTimeoutMs()pattern, consider:♻️ Align with established pattern
const READY_TIMEOUT_MS = (() => { const raw = process.env["AGENTMEMORY_READY_TIMEOUT_MS"]; if (!raw) return 60000; - const parsed = Number.parseInt(raw, 10); - return Number.isFinite(parsed) && parsed > 0 ? parsed : 60000; + const n = Number(raw); + return Number.isFinite(n) && n > 0 ? Math.floor(n) : 60000; })();As per coding guidelines, context snippet 1 demonstrates the repo's established pattern for env-provided timeout parsing/validation (Number(raw), finite, >0, otherwise default), which matches the numeric validation + fallback behavior but uses
Number(raw)andMath.floor(n)for consistency.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cli.ts` around lines 85 - 89, The READY_TIMEOUT_MS initializer should follow the repo's established timeout parsing pattern: read process.env["AGENTMEMORY_READY_TIMEOUT_MS"] into raw, convert with Number(raw) (not Number.parseInt), validate with Number.isFinite(n) && n > 0, and return Math.floor(n) when valid or the default 60000 otherwise; update the READY_TIMEOUT_MS logic accordingly (referencing READY_TIMEOUT_MS and AGENTMEMORY_READY_TIMEOUT_MS) to match probeTimeoutMs()-style parsing and fallback behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/cli.ts`:
- Around line 85-89: The READY_TIMEOUT_MS initializer should follow the repo's
established timeout parsing pattern: read
process.env["AGENTMEMORY_READY_TIMEOUT_MS"] into raw, convert with Number(raw)
(not Number.parseInt), validate with Number.isFinite(n) && n > 0, and return
Math.floor(n) when valid or the default 60000 otherwise; update the
READY_TIMEOUT_MS logic accordingly (referencing READY_TIMEOUT_MS and
AGENTMEMORY_READY_TIMEOUT_MS) to match probeTimeoutMs()-style parsing and
fallback behavior.
… child boots
Two latent bugs that combine to make `agentmemory` unusable on Windows
(and silently fragile on Linux):
1. iii-config*.yaml's `iii-exec` worker watches `src/**/*.ts`. The npm
package only ships `dist/`, so that glob resolves to nothing for end
users. On some platforms an empty watch glob causes iii-exec to
abort startup before it ever spawns `node dist/index.mjs`, which is
the process that actually registers the REST HTTP triggers. Switch
to `dist/**/*.mjs` so the watcher sees the shipped artifact.
2. `loadConfig()` defaults `engineUrl` to `ws://localhost:49134`. On
Windows, `localhost` prefers IPv6 (`::1`), but iii-engine binds
IPv4 only — so the node child enters an ECONNREFUSED reconnect
loop and never finishes registering its HTTP triggers, even when
iii-exec did spawn it. Default to `ws://127.0.0.1:49134` to force
IPv4. Existing `III_ENGINE_URL` overrides are unchanged.
Verified locally on Windows 11 / Node 22: with both fixes plus the
timeout bump from the previous commit, `node dist/cli.mjs` brings the
server to `{"status":"healthy"}` in ~1.3s. Before this commit the
engine never registered any routes — every REST/MCP call hit a 404
backend even though `:3111` was bound.
Problem
On Windows,
agentmemory(v0.9.21) cannot complete startup. The wrapper prints "The engine process started but the REST API never responded" and exits, leaving an orphanediii.exelistening on:3111with zero agentmemory routes registered. Every subsequent REST/MCP call gets a 404 from the dead backend.Root causes (two latent bugs, one papered-over symptom)
1.
iii-execwatches a path that doesn't exist in shipped packagesiii-config.yamlandiii-config.docker.yamlconfigure theiii-execworker to watchsrc/**/*.ts. The published npm package ships onlydist/— that glob resolves to zero files for every end user. On some platforms (Windows reproducibly), an empty watch glob causesiii-execto abort startup before it ever spawnsnode dist/index.mjs, which is the process that actually registers the REST HTTP triggers.Fix: watch
dist/**/*.mjsso the watcher targets the shipped artifact.2. Engine WS default resolves to IPv6 on Windows, but iii binds IPv4 only
src/config.ts:140defaultsengineUrltows://localhost:49134. On Windows,localhostprefers::1(IPv6), but iii-engine binds IPv4 only — so the node child enters anECONNREFUSEDreconnect loop indefinitely and never finishes registering its HTTP triggers, even wheniii-execdid manage to spawn it.Fix: default to
ws://127.0.0.1:49134.III_ENGINE_URLoverrides are unchanged.3. (Defensive) Hardcoded 15 s readiness timeout
waitForEngine(15000)atsrc/cli.ts:1040and:1387is too tight: even after bugs 1 & 2 are fixed, cold-start of the Node child + native deps + WebSocket handshake + trigger registration can push past 15 s on slow disks / CI / first-run installs. IntroduceAGENTMEMORY_READY_TIMEOUT_MS(default 60 000) and templatize the "did not become ready within Ns" message.Verification
Tested on Windows 11 / Node 22.12 / iii v0.11.2:
node dist/cli.mjs→ "REST API never responded" after 15 s, every time. Confirmed via direct probe (curl http://127.0.0.1:3111/agentmemory/health→ 404).node dist/cli.mjs→@agentmemory/mcp) connects and tool calls succeed end-to-end against Codex CLI / Gemini CLI / Antigravity CLI / Claude Code.Other checks:
npm run build✓npx tsc --noEmit✓ (no errors)npm test→ 1081 / 1096 passing. The 15 pre-existing failures are Windows-specific path-comparison issues inobsidian-export.test.tsetc.; none touch any file in this PR.node dist/cli.mjs --helpshows the new env var underEnvironment:.Files changed
src/cli.ts— addAGENTMEMORY_READY_TIMEOUT_MSconstant, replace bothwaitForEngine(15000)call sites, templatize the timeout message, document the env var in--help.src/config.ts— defaultengineUrltows://127.0.0.1:49134instead ofws://localhost:....iii-config.yaml,iii-config.docker.yaml— watchdist/**/*.mjsinstead ofsrc/**/*.ts.Why the (2) and (3) splits are worth keeping separate
(2) and (3) compound: even on a machine where IPv6 happens to fall through to IPv4 quickly, the hardcoded 15 s ceiling sometimes wasn't enough. Fixing only the timeout would have masked the real bugs (and Docker mode would still have been broken because
AGENTMEMORY_USE_DOCKER=1doesn't help wheniii-execitself can't find files to watch). Fixing only the watch path would have left Windows users in the reconnect loop. All three are needed for a clean Windows install.Credit
Root cause of (1) and (2) identified in collaboration with another agent run by the same user, who patched these the same way in their local
node_modulesto get a working server before this PR was opened.