-
Notifications
You must be signed in to change notification settings - Fork 680
feat(control-plane): [issue 4746] pool sufficiency metrics #4747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(control-plane): [issue 4746] pool sufficiency metrics #4747
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds optional pool sufficiency metrics to track whether runner pools are adequately sized to handle incoming jobs. The feature allows users to monitor if their pool configurations are right-sized by recording metrics when jobs arrive.
- Adds a new
enable_pool_sufficiency
configuration option across all relevant modules - Implements metric collection logic in the scale-up function to track pool adequacy
- Includes comprehensive tests to verify the metric behavior under different scenarios
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
variables.tf | Adds enable_pool_sufficiency option to root module metrics configuration |
modules/runners/variables.tf | Adds enable_pool_sufficiency option to runners module metrics configuration |
modules/runners/scale-up.tf | Passes pool sufficiency metric environment variable to scale-up Lambda |
modules/runners/job-retry/variables.tf | Adds enable_pool_sufficiency option to job-retry module configuration |
modules/runners/job-retry/main.tf | Passes pool sufficiency metric environment variable to job-retry Lambda |
modules/multi-runner/variables.tf | Adds enable_pool_sufficiency option to multi-runner module metrics configuration |
lambdas/functions/control-plane/src/scale-runners/scale-up.ts | Implements pool sufficiency metric collection logic |
lambdas/functions/control-plane/src/scale-runners/scale-up.test.ts | Adds comprehensive tests for pool sufficiency metric functionality |
examples/multi-runner/main.tf | Documents the new enable_pool_sufficiency option in example |
examples/default/main.tf | Documents the new enable_pool_sufficiency option in example |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
function createPoolSufficiencyMetric(environment: string, payload: ActionRequestMessage, wasSufficient: boolean) { | ||
if (yn(process.env.ENABLE_METRIC_POOL_SUFFICIENCY, { default: false })) { | ||
const metric = createSingleMetric('SufficientPoolHosts', MetricUnit.Count, wasSufficient ? 1.0 : 0.0, { | ||
Environment: environment, | ||
}); | ||
metric.addMetadata('Environment', environment); | ||
metric.addMetadata('RepositoryName', payload.repositoryName); | ||
metric.addMetadata('RepositoryOwner', payload.repositoryOwner); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Environment dimension is added twice - once in the metric creation and again as metadata. The dimension in the metric creation (line 487) should be sufficient for grouping metrics, making the duplicate metadata on line 489 redundant.
Copilot uses AI. Check for mistakes.
return { | ||
...actual, | ||
// eslint-disable-next-line @typescript-eslint/no-unused-vars | ||
createSingleMetric: vi.fn((name: string, unit: string, value: number, dimensions?: Record<string, string>) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mock function parameter types don't match the actual createSingleMetric
function signature. The second parameter should be MetricUnit
enum type, not string
. This could lead to test inconsistencies.
Copilot uses AI. Check for mistakes.
@mabbott-aurorasolar thx for our PR. I have the next day not much time to dig in. Are you considering the PR ready? If not you can mark it as draft. I see no documentation, at least this part of the docs needs to be adjusted: https://github-aws-runners.github.io/terraform-aws-github-runner/configuration/#multiple-runner-module-in-your-aws-account |
@npalm great callout on the docs. I'll update those as well, and follow-up on copilot's suggestions. I honestly threw this together in an hour this morning and wasn't sure what the usual process on this repo is, but yeah I do think this is mostly feature complete. |
Sorry for keep you wainting. Will do my best to put the PR on the list for the next week. |
See #4746 for full context.
In short, this PR will add metrics (optionally) to record whether or not a given runner pool was sufficient to handle a job when it arrives. This will allow users (specifically, my company) to better understand whether we're right-sizing our pools or not.