Skip to content

Conversation

@ericallam
Copy link
Member

@ericallam ericallam commented Nov 21, 2025

This PR fixes #2639, by no longer eagerly deleting "old" worker builds. This means that worker builds will persist to the filesystem now until the dev CLI is exited but this is only way to make sure that runs don't get stuck (the CLI fundamentally can't know when it's safe to delete worker builds).

This PR also fixes another issue that was occurring because we always immediately deleted the very first worker build when the first change was made. This would result in runs executing on the first version the dev CLI built failing, most of them with an ENOENT error because they were trying to execute or import a file that no longer existed.

@changeset-bot
Copy link

changeset-bot bot commented Nov 21, 2025

🦋 Changeset detected

Latest commit: bf8e277

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 24 packages
Name Type
trigger.dev Patch
d3-chat Patch
references-d3-openai-agents Patch
references-nextjs-realtime Patch
references-realtime-streams Patch
@trigger.dev/build Patch
@trigger.dev/core Patch
@trigger.dev/python Patch
@trigger.dev/react-hooks Patch
@trigger.dev/redis-worker Patch
@trigger.dev/rsc Patch
@trigger.dev/schema-to-json Patch
@trigger.dev/sdk Patch
@trigger.dev/database Patch
@trigger.dev/otlp-importer Patch
@internal/cache Patch
@internal/clickhouse Patch
@internal/redis Patch
@internal/replication Patch
@internal/run-engine Patch
@internal/schedule-engine Patch
@internal/testcontainers Patch
@internal/tracing Patch
@internal/zod-worker Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 21, 2025

Walkthrough

This pull request addresses two dev run issues. A changelog entry documents fixes for: (1) dev runs getting stuck in DEQUEUED status, and (2) preventing ENOENT "System failure" errors in dev runs after the first change. The fixes involve removing esbuild output directory cleanup from the dev session rebuild flow, and refactoring dev supervisor worker management by eliminating client-side active-old-worker filtering logic, increasing worker deletion delay from 1 to 5 seconds, and removing worker deletion checks that previously prevented deletion when in-progress runs existed.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

  • Worker lifecycle logic changes: The removal of getActiveOldWorkers and workerHasInProgressRuns methods and the shift away from client-side filtering requires understanding the rationale for these changes and their server-side implications.
  • Deletion behavior modification: The change in deleteWorker to only deprecate workers without terminating them represents a semantic shift that affects worker lifecycle management and needs careful validation.
  • Cross-file coordination: Changes span both devSession.ts and devSupervisor.ts, requiring understanding how these components interact and why the directory cleanup removal correlates with the other fixes.
  • Outdir cleanup removal: Understanding why removing the esbuild output directory cleanup resolves the ENOENT error requires context about the original cleanup logic's side effects.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The pull request has no description provided by the author, missing all required template sections including Testing, Changelog, and the checklist. Add a complete pull request description following the template, including Testing steps, Changelog summary, and the contributor checklist.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title clearly and concisely summarizes the main fix: preventing dev runs from getting stuck in dequeued status, with explicit reference to the issue number.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/various-dev-cli-fixes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/cli-v3/src/dev/devSupervisor.ts (1)

600-613: Worker cleanup is incomplete – potential resource leak in long-running dev sessions.

The method now only deprecates the version in the process pool but never removes workers from this.workers map or calls worker.stop(). Over an extended dev session with many code changes, the map and associated resources could accumulate indefinitely.

Consider adding cleanup logic to remove workers from the map after deprecation and a grace period. For example:

  #deleteWorker(friendlyId: string) {
    logger.debug("[DevSupervisor] Delete worker (if relevant)", {
      workerId: friendlyId,
    });

    const worker = this.workers.get(friendlyId);
    if (!worker) {
      return;
    }

    if (worker.serverWorker?.version) {
      this.taskRunProcessPool?.deprecateVersion(worker.serverWorker?.version);
    }
+
+   // Remove from map to prevent accumulation
+   this.workers.delete(friendlyId);
+
+   // Optionally stop the worker's background process
+   worker.stop();
  }

Alternatively, document why workers are intentionally left in the map (e.g., if they're needed for debugging or if cleanup happens elsewhere).

🧹 Nitpick comments (1)
packages/cli-v3/src/dev/devSupervisor.ts (1)

595-598: The 5-second timeout is arbitrary and may not prevent all race conditions.

While increasing from 1s to 5s gives more breathing room, it's still a fixed delay that doesn't account for actual run state. Long-running tasks could still be active when deprecation occurs.

Consider checking if the worker has active runs before deprecating, or using a signal-based approach rather than a fixed timeout. For example:

  async #tryDeleteWorker(friendlyId: string) {
    await awaitTimeout(5_000);
+   // Only deprecate if no active runs remain
+   const worker = this.workers.get(friendlyId);
+   if (worker && this.#workerHasActiveRuns(worker)) {
+     return;
+   }
    this.#deleteWorker(friendlyId);
  }

Note: This would require re-introducing logic to check for active runs, which was removed in this PR.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abd99aa and bf8e277.

📒 Files selected for processing (3)
  • .changeset/young-ligers-suffer.md (1 hunks)
  • packages/cli-v3/src/dev/devSession.ts (0 hunks)
  • packages/cli-v3/src/dev/devSupervisor.ts (2 hunks)
💤 Files with no reviewable changes (1)
  • packages/cli-v3/src/dev/devSession.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-08T11:48:12.327Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 2593
File: packages/core/src/v3/workers/warmStartClient.ts:168-170
Timestamp: 2025-10-08T11:48:12.327Z
Learning: The trigger.dev runners execute only in Node 21 and 22 environments, so modern Node.js APIs like AbortSignal.any (introduced in v20.3.0) are supported.

Applied to files:

  • .changeset/young-ligers-suffer.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
.changeset/young-ligers-suffer.md (1)

1-6: Changelog entry is clear and accurately documents the fixes.

The changeset follows the correct format with appropriate frontmatter, and both fix descriptions are user-facing, clear, and align with the related code changes (esbuild cleanup removal and dev supervisor worker management refactoring).

packages/cli-v3/src/dev/devSupervisor.ts (1)

269-272: Server-side dequeue handler confirmed to ignore oldWorkers—change verified as correct.

The server-side route handler at apps/webapp/app/routes/engine.v1.dev.dequeue.ts explicitly states: "Even though we don't use it, we need to keep it for backwards compatibility." The request body schema is retained only for backward compatibility, but the parameters are never actually consumed by the handler. Passing an empty array for oldWorkers is safe and introduces no risk of regressions.

@ericallam ericallam merged commit 5e5c97e into main Nov 21, 2025
31 checks passed
@ericallam ericallam deleted the fix/various-dev-cli-fixes branch November 21, 2025 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: 90% of dev tasks go straight to dequeued state and never execute

3 participants