Remove worker events (created, evicted, destroyed) and calculate the worker_pool_stats from the final list of WorkerProcessMetrics. This will log all WorkerPoolStats for all worker pools (even though nothing may have happened) and let us infer what happened from the counts. Also add unknown_destroyed_count and alive_count to the WorkerPoolStats proto (the latter describes the current status of the pool).

Some additional points:
1. Worker events are only used to generate the `worker_pool_stats` in the BEP.
2. Since we already keep the metrics of killed workers (`WorkerProcessMetrics`) around during the build, we don't need the mechanism in (1) to tell us what happened to each worker; we can infer it directly from the worker metrics.
3. This keeps the counts in `worker_pool_stats` consistent with those in `worker_metrics` in the BEP. See (4) and (5).
4. Counting worker events can be inaccurate: a worker process can be forcefully killed (`WorkerLifecycleManager#killLargeWorkers`) in one build but only counted in the next build, when the spawn runner realizes that the worker is invalid because its process has already terminated and only then posts a `WorkerDestroyedEvent`.
5. If no workers are killed or created during a build, `worker_pool_stats` is empty, which is likely to be confusing. It is clearer to log all worker pools (even when nothing happened) and let the counts in the proto state explicitly what happened.
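A minimal Java sketch of the idea in points (2) and (5): instead of counting events as they happen, walk the final list of per-worker metrics once and derive every counter from it, seeding each known pool first so that idle pools still appear with zero counts. `Status`, `WorkerMetric`, `PoolStats` and `computePoolStats` are hypothetical stand-ins, not Bazel's actual classes. The sketch also assumes that pools are keyed by mnemonic alone, that each metric carries a "created this build" flag, and that the list contains only workers that were alive or destroyed during the current build. The field name `userExecExceptionDestroyedCount` is assumed, because field 6 sits in the collapsed part of the diff.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: every type below is a hypothetical stand-in, not Bazel's real code. */
public class WorkerPoolStatsSketch {

  /** Hypothetical final state of one worker process at the end of the build. */
  enum Status {
    ALIVE,
    EVICTED,
    KILLED_DUE_TO_USER_EXEC_EXCEPTION,
    KILLED_DUE_TO_IO_EXCEPTION,
    KILLED_DUE_TO_INTERRUPTED_EXCEPTION,
    KILLED_UNKNOWN
  }

  /** Hypothetical stand-in for one WorkerProcessMetrics entry. */
  record WorkerMetric(String mnemonic, boolean createdThisBuild, Status status) {}

  /** Counters mirroring WorkerPoolStats (userExecExceptionDestroyedCount is an assumed name). */
  static final class PoolStats {
    final String mnemonic;
    long createdCount;
    long destroyedCount;
    long evictedCount;
    long userExecExceptionDestroyedCount;
    long ioExceptionDestroyedCount;
    long interruptedExceptionDestroyedCount;
    long unknownDestroyedCount;
    long aliveCount;

    PoolStats(String mnemonic) {
      this.mnemonic = mnemonic;
    }
  }

  /** Derives one PoolStats per pool; known pools are seeded so idle ones report zeros. */
  static Map<String, PoolStats> computePoolStats(
      List<String> poolMnemonics, List<WorkerMetric> metrics) {
    Map<String, PoolStats> stats = new LinkedHashMap<>();
    for (String mnemonic : poolMnemonics) {
      stats.put(mnemonic, new PoolStats(mnemonic));
    }
    for (WorkerMetric w : metrics) {
      PoolStats s = stats.computeIfAbsent(w.mnemonic(), PoolStats::new);
      if (w.createdThisBuild()) {
        s.createdCount++;
      }
      // Exactly one per-status counter is bumped for each worker.
      switch (w.status()) {
        case ALIVE -> s.aliveCount++;
        case EVICTED -> s.evictedCount++;
        case KILLED_DUE_TO_USER_EXEC_EXCEPTION -> s.userExecExceptionDestroyedCount++;
        case KILLED_DUE_TO_IO_EXCEPTION -> s.ioExceptionDestroyedCount++;
        case KILLED_DUE_TO_INTERRUPTED_EXCEPTION -> s.interruptedExceptionDestroyedCount++;
        case KILLED_UNKNOWN -> s.unknownDestroyedCount++;
      }
      // Every worker that is no longer alive counts toward destroyed_count.
      if (w.status() != Status.ALIVE) {
        s.destroyedCount++;
      }
    }
    return stats;
  }

  public static void main(String[] args) {
    List<WorkerMetric> metrics = List.of(
        new WorkerMetric("Javac", true, Status.ALIVE),
        new WorkerMetric("Javac", true, Status.EVICTED),
        new WorkerMetric("Javac", false, Status.KILLED_UNKNOWN));
    for (PoolStats s : computePoolStats(List.of("Javac", "Scalac"), metrics).values()) {
      System.out.printf("%s created=%d destroyed=%d alive=%d%n",
          s.mnemonic, s.createdCount, s.destroyedCount, s.aliveCount);
    }
  }
}
```

Running `main` prints one line per pool, including `Scalac` with all-zero counts even though no worker in that pool did anything, which is the behavior point (5) asks for.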

RELNOTES: Log all WorkerPoolStats for all worker pools (even when no workers are created or destroyed). Also add unknown_destroyed_count and alive_count to the WorkerPoolStats proto.
PiperOrigin-RevId: 588787331
Googler authored and copybara-github committed Dec 7, 2023
1 parent 3e35774 commit e399f82
Showing 1 changed file with 7 additions and 1 deletion.
@@ -1220,7 +1220,9 @@ message BuildMetrics {
 string mnemonic = 2;
 // Number of workers created during a build.
 int64 created_count = 3;
-// Number of workers destroyed during a build.
+// Number of workers destroyed during a build (sum of all workers
+// destroyed by eviction, UserExecException, IoException,
+// InterruptedException and unknown reasons below).
 int64 destroyed_count = 4;
 // Number of workers evicted during a build.
 int64 evicted_count = 5;
@@ -1230,6 +1232,10 @@ message BuildMetrics {
 int64 io_exception_destroyed_count = 7;
 // Number of workers destroyed due to InterruptedExceptions.
 int64 interrupted_exception_destroyed_count = 8;
+// Number of workers destroyed due to an unknown reason.
+int64 unknown_destroyed_count = 9;
+// Number of workers alive at the end of the build.
+int64 alive_count = 10;
 }
 }
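The updated `destroyed_count` comment states an invariant that a consumer of the BEP could check: `destroyed_count` should equal the sum of the per-reason counts. A minimal Java sketch, assuming a hypothetical `PoolCounts` record that mirrors these counters rather than the generated proto class. The name `userExecExceptionDestroyedCount` is an assumption, because field 6 falls in the collapsed part of the diff.

```java
/** Sketch only: PoolCounts is a hypothetical mirror of the WorkerPoolStats counters. */
public class DestroyedCountCheck {

  record PoolCounts(
      String mnemonic,
      long destroyedCount,
      long evictedCount,
      long userExecExceptionDestroyedCount, // assumed name for field 6 (not shown in the diff)
      long ioExceptionDestroyedCount,
      long interruptedExceptionDestroyedCount,
      long unknownDestroyedCount) {

    /** Sum of the per-reason counts named in the destroyed_count comment. */
    long sumOfReasons() {
      return evictedCount
          + userExecExceptionDestroyedCount
          + ioExceptionDestroyedCount
          + interruptedExceptionDestroyedCount
          + unknownDestroyedCount;
    }

    /** True if destroyed_count matches the documented sum. */
    boolean isConsistent() {
      return destroyedCount == sumOfReasons();
    }
  }

  public static void main(String[] args) {
    PoolCounts counts = new PoolCounts("Javac", 3, 1, 0, 1, 0, 1);
    System.out.println(counts.mnemonic() + " consistent: " + counts.isConsistent());
  }
}
```

Before `unknown_destroyed_count` existed, a worker destroyed for an unlisted reason would have made this check fail; the new field is what lets the sum close.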
