Skip to content

fix(cli): honor --force on unresponsive engine; detect listening PIDs on Windows#624

Open
yanniedog wants to merge 1 commit into
rohitg00:mainfrom
yanniedog:fix/stop-force-windows-port-detection
Open

fix(cli): honor --force on unresponsive engine; detect listening PIDs on Windows#624
yanniedog wants to merge 1 commit into
rohitg00:mainfrom
yanniedog:fix/stop-force-windows-port-detection

Conversation

@yanniedog
Copy link
Copy Markdown

@yanniedog yanniedog commented May 24, 2026

Summary

Two related bugs make agentmemory stop --force useless on Windows when the iii-engine listening socket is wedged after sleep/wake. Together they leave the user with no CLI path to recover: the engine listens but does not respond, --force is silently dropped, and the only Windows code path for finding port owners returns an empty array.

This PR makes --force actually force, and adds first-class Windows support for finding the listening PID via netstat -ano.

Bugs

  1. runStop checks --force only after deciding the engine is reachable. When the engine listens but is unresponsive, runStop takes the early-exit branch and prints Preserving ~/.agentmemory/iii.pid. Investigate before manual cleanup, ignoring --force entirely. The exact flag that should rescue this scenario does nothing.
  2. findEnginePidsByPort short-circuits with return [] on Windows, so --force has no PIDs to signal even if it were honored. The only candidate left is the pidfile, which is typically stale.

The combined effect on Windows: a hung engine cannot be stopped from the CLI at all. The user must run netstat -ano | findstr :3111 then taskkill /F /PID <pid> by hand.

Repro (Windows 11, Node 22, agentmemory 0.9.21)

PS> agentmemory          # start engine
# ...let laptop sleep for a few minutes, then resume...
PS> agentmemory status
x  Not running - no response at http://127.0.0.1:3111
PS> agentmemory stop --force
!  Engine not responding on :3111, but 1 process(es) still hold the port or pidfile: 39672
o  Preserving ~/.agentmemory/iii.pid. Investigate before manual cleanup
# exit 1 - --force did nothing
PS> netstat -ano | findstr :3111
  TCP    0.0.0.0:3111   0.0.0.0:0   LISTENING   39672
PS> taskkill /F /PID 39672   # the only working escape hatch

After this PR:

PS> agentmemory stop --force
!  --force: signaling 1 unresponsive process(es) on :3111: 39672
o  Stopped pid 39672
o  Stopped. Memories persisted to disk; restart anytime with: npx @agentmemory/agentmemory

Changes

  • runStop: when !running and survivors exist, honor --force by SIGTERM/SIGKILL-ing every PID via the existing signalAndWait helper, then clear pidfile + engine state so the next start is not refused by the stale-state guard. The non---force message now points at the --force flag in addition to manual triage commands.
  • findEnginePidsByPort: on Windows, parse netstat -ano -p TCP for PIDs whose local address ends in :PORT and whose foreign address is the unspecified peer (0.0.0.0:0, [::]:0, *:*). The state column is ignored because it is locale-translated (de-DE renders LISTENING as ABHOEREN, fr-FR as ECOUTER, etc) - keying off the peer address is locale-stable.
  • parseNetstatListeningPids: extracted as a pure exported function so the parser is unit-testable without spawning netstat.
  • Help text for stop --force now mentions netstat and the unresponsive-engine case.

Test plan

New file test/cli-stop-port-detection.test.ts (Vitest) covers:

  • IPv4 (0.0.0.0:3111) + IPv6 ([::]:3111) listeners on the requested port
  • Filtering ESTABLISHED / CLOSE_WAIT / TIME_WAIT rows (the browser-tab ghost socket case - a stale viewer SSE connection in CLOSE_WAIT must not be reported as a listening owner, otherwise --force would kill an unrelated browser process)
  • selfPid exclusion (the CLI must not signal its own parent, exit 137)
  • Port-prefix collisions (:3111 vs :31111 vs :13111)
  • Dedup across IPv4/IPv6 dual-stack listeners
  • Empty / garbage / UDP-only input
  • Locale-translated state words (ABHOEREN on de-DE)

Local run:

$ npx vitest run test/cli-stop-port-detection.test.ts
 Test Files  1 passed (1)
      Tests  7 passed (7)

Full suite: my branch reduces overall failures from 24 to 17 on a Windows 11 host (the remaining 17 are pre-existing path-related failures unaffected by this change; same set fails on main). Build passes (npm run build clean).

Non-goals (deliberate)

  • No change to the macOS/Linux lsof path - it already works.
  • No change to the Docker code path - the Docker guard is unchanged.
  • No new dependencies. netstat.exe ships with every Windows install.

Notes

I hit this while writing a Windows supervisor for agentmemory (yanniedog/agentmemory-keeper). The supervisor works around the bug by killing port holders directly via PowerShell, but the right fix is to make agentmemory stop --force do what its name promises. Happy to scope down further if you want this split into two PRs (one per bug).

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved detection of engine processes running on Windows
    • Enhanced agentmemory stop --force to handle unresponsive engines more effectively
    • Added detailed warnings with specific process IDs when engine fails to respond gracefully
  • Documentation

    • Updated help text for --force option to clarify behavior
  • Tests

    • Added comprehensive test suite for process detection functionality

Review Change Stack

… on Windows

Two related bugs surface when an iii-engine listening socket is wedged
after sleep/wake on Windows:

1. `agentmemory stop --force` checks --force only after deciding the
   engine is reachable. When the engine is unresponsive but its socket
   still listens, runStop took the early-exit path and printed
   "Preserving ~/.agentmemory/iii.pid. Investigate before manual
   cleanup", ignoring --force entirely. The user is told the exact
   flag for this situation does nothing.

2. findEnginePidsByPort short-circuits with `return []` on Windows, so
   --force has no PIDs to signal even if it were honored. The only
   survivor list comes from the pidfile, which itself is often stale.

The combination meant a Windows user with a hung engine could not stop
it from the CLI at all - manual netstat + taskkill was the only way.

Changes:

- runStop: when !running and survivors exist, honor --force by
  SIGTERM/SIGKILL-ing every PID and clearing pidfile + engine state so
  the next start is not refused by a stale-state guard. Without --force
  the message is updated to point at the flag rather than only at
  manual triage.

- findEnginePidsByPort: on Windows, parse `netstat -ano -p TCP` for
  PIDs whose local address ends in :PORT and whose foreign address is
  the unspecified peer (LISTENING rows). The state column is ignored
  because it is locale-translated ("ABHOEREN" on de-DE, etc).

- parseNetstatListeningPids: extracted as a pure exported helper so the
  parser can be unit-tested without spawning netstat.

- Help text for `stop --force` now mentions netstat and the
  unresponsive-engine case.

Test plan: test/cli-stop-port-detection.test.ts covers IPv4 + IPv6
listeners, CLOSE_WAIT / ESTABLISHED filtering (the browser ghost-tab
case), selfPid exclusion, port-prefix collisions, dedup, blank input,
UDP input, and locale-translated state words.

Signed-off-by: Jonathan Koka <yanniedog@users.noreply.github.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 24, 2026

@yanniedog is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e9ee45f7-566c-4264-8dbb-0af973c82163

📥 Commits

Reviewing files that changed from the base of the PR and between 3551241 and 53ef1b7.

📒 Files selected for processing (2)
  • src/cli.ts
  • test/cli-stop-port-detection.test.ts

📝 Walkthrough

Walkthrough

This PR improves the agentmemory stop command for Windows and unresponsive engines. It exports a pure netstat parser for TCP listening PID detection, uses it to replace Windows's previous stub behavior, and reworks the non-responsive stop path to either warn (default) or force-kill (with --force) lingering engine processes.

Changes

Windows PID Detection and Forced Stop

Layer / File(s) Summary
Windows netstat listening PID parser
src/cli.ts
Introduces parseNetstatListeningPids() to parse netstat -ano output and extract PIDs for TCP listening sockets on a given port, tolerating localized state text and excluding a caller PID. Integrates it into findEnginePidsByPort() to enable Windows engine discovery instead of returning empty.
Forced stop behavior and help text
src/cli.ts
Updates agentmemory stop --force help text and reworks the non-responsive stop path: without --force, preserves the pidfile and warns with surviving PID details; with --force, sends SIGTERM/SIGKILL to all surviving PIDs, clears state, and reports whether any survived.
Parser unit test suite
test/cli-stop-port-detection.test.ts
Comprehensive Vitest coverage for parseNetstatListeningPids() across IPv4/IPv6, non-listening state filtering, selfPid exclusion, exact port matching, PID deduplication, empty/garbage input, and locale-translated listening states via peer heuristics.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A parser born on Windows' shore,
To find the PIDs standing tall,
With --force, no more stale lore—
Just SIGTERM, SIGKILL, and that's all!
The engine stops when gently called,
Or forcefully, when we demand it fall. 🛑

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately captures the two main changes: fixing --force behavior on unresponsive engines and adding Windows PID detection via netstat parsing.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant