Skip to content

Replace pm2 with homegrown process manager daemon#2712

Open
fredrikekelund wants to merge 22 commits intotrunkfrom
stu-1349-replace-pm2-with-homegrown
Open

Replace pm2 with homegrown process manager daemon#2712
fredrikekelund wants to merge 22 commits intotrunkfrom
stu-1349-replace-pm2-with-homegrown

Conversation

@fredrikekelund
Copy link
Contributor

@fredrikekelund fredrikekelund commented Mar 5, 2026

Related issues

How AI was used in this PR

Codex wrote most of the code in this PR, starting from a detailed list of requirements I shared (based on our experience with pm2). I iterated on this over multiple days, both with Codex and by hand. I've touched all major files in the PR. process-manager-ipc.ts is a standout example – I made many manual edits there. I've reviewed all the code. The code I've spent the least time editing or reviewing is the tests.

Proposed Changes

Tip

I'm happy to pair for a review of this PR. I know it's a big one. Also, please note that this PR builds upon the changes in #2690. That PR should be reviewed first.

See STU-1349 for the "why" of removing pm2. This PR implements the following:

  • A process-manager-daemon.ts that manages child processes. This involves forking them, tracking them in-memory, forwarding child IPC messages as typed daemon events, and writing stdout/stderr to PM2-compatible log files under ~/.studio/pm2/logs. This file is the PM2 replacement.
  • daemon-client.ts replaces pm2-manager.ts. It talks to the daemon over a control socket for request/response commands and an events socket for receiving messages from child processes and process events (online/exit, etc.).

The architecture largely follows PM2's example, and process-manager-daemon.ts implements the same functionality. A few notes to get reviewers started:

  • Starting/stopping processes are relatively easy to understand from the code.
  • IPC is a little trickier, but the gist is that daemon-client exports a sendMessageToProcess function that uses the process manager "control socket" to send a message with a processId target. When process-manager-daemon.ts receives the message, it uses the standard Node.js IPC mechanism to forward it to the child process. Child processes cannot respond directly to these messages, but they can emit events through Node.js standard IPC, which process-manager-daemon.ts picks up and broadcasts on the "events socket". daemonBus listens on the events socket, and the subscribeSiteEvents function picks up the process-message and process-event events, which are ultimately picked up by _events.ts.

Testing Instructions

  1. Run npm start
  2. Ensure that starting and stopping sites works as expected
  3. Try creating a new site, too
  4. While Studio is still running, run node apps/cli/dist/cli/main.js site start --skip-browser --path PATH_TO_SITE in your terminal
  5. Ensure that the site is immediately marked as started in Studio

Pre-merge Checklist

  • Have you checked for TypeScript, React or other console errors?

@fredrikekelund fredrikekelund requested review from a team and bcotrim March 5, 2026 15:32
@fredrikekelund fredrikekelund self-assigned this Mar 5, 2026
Base automatically changed from stu-1301-replace-pm2-axon-with-homegrown to trunk March 9, 2026 07:27
@wpmobilebot
Copy link
Collaborator

wpmobilebot commented Mar 9, 2026

📊 Performance Test Results

Comparing b9212c6 vs trunk

site-editor

Metric trunk b9212c6 Diff Change
load 1455.00 ms 1481.00 ms +26.00 ms ⚪ 0.0%

site-startup

Metric trunk b9212c6 Diff Change
siteCreation 7076.00 ms 6071.00 ms -1005.00 ms 🟢 -14.2%
siteStartup 3934.00 ms 3936.00 ms +2.00 ms ⚪ 0.0%

Results are median values from multiple test runs.

Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff)

Copy link
Contributor

@bcotrim bcotrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While testing this I found some weird behaviors:

  1. Killing a site process (kill pid) didn't reflect on Studio. Site was still running in the UI and stop operation didn't work (CLI worked fine). I could only replicate this once
  2. Doing Stop all from the Studio UI button, seems to break the UI. After doing stop all starting any other sites would not be reflected in the UI. I was able to replicate this consistently.

Added some minor comments to the code that aren't blockers, but we should look into these 2 issues.

} from 'cli/lib/types/wordpress-server-ipc';

// Zod schema for process descriptions
export const processDescriptionSchema = z.object( {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be more intentional about the status + pid options?
I believe an online process will always have a pid, and the optional scenario is only for stopped processes. We could use an union to be more intentional about this type, what do you think?

const socket = await this.connect();
return this.sendAndReadFromSocket( socket, payload );
} );
this.queue = responsePromise.then(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is odd, I wonder if there's a better approach to it.

import path from 'path';

export const PROCESS_MANAGER_HOME =
process.env.STUDIO_PROCESS_MANAGER_HOME ?? path.join( os.homedir(), '.studio', 'pm2' );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming you decided keeping the pm2 name for backwards compatibility reasons, but I wonder if we should just rename it and handle existing folder content in a migration script or similar?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants