You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When GitHub selects runners, it needs all labels provided by the job to match the labels on the runner.
When this construct provisions runners, it needs all labels on the provider to match the labels on the job.
This 'inversion' can result in issues when making changes to the labels.
Example
A provider is configured with the following labels: ['codebuild']
A repository exists with a job that requests the following labels: ['self-hosted', 'codebuild']
So far, so good
Now, we want to start introducing ARM64 runners, so we create a new provider, and a new label-dimension.
We now have two providers, configured with ['codebuild', 'x86-64'] and ['codebuild', 'arm64']
The repository still exists requesting the following labels: ['self-hosted', 'codebuild']
The orchestrator no longer knows how to route requests for those labels, so it falls into 'Unknown label'.
No runners are provisioned, jobs start queuing up
The repository attempts to resolve this by adding the 'x86-64' label to their job.
PRs that add the label start correctly provisioning runners, but they get picked up by the previously queued jobs, and the PR jobs start queuing.
This feels related to #335, in that resolving that issue would cause different issues for the scenario above. In theory, after making the change to add the new label-dimension, the orchestrator would receive requests for ['self-hosted', 'codebuild'] and would provision runners at random between x86-64 and arm64 runners. This is not likely to be desired behaviour, but is arguably more correct from the perspective of the job saying "I don't care about architecture" than refusing to provision a runner at all.
Another potential resolution could be to document a procedure on how to safely add new labels to an existing suite of providers. Perhaps the Composite Provider mentioned in #335 would work here too
The text was updated successfully, but these errors were encountered:
The generic infrastructure solution should work here too. Don't immediately delete the old provider. Leave 3 providers running until all jobs have been migrated to the two new ones. Instead of replacing ['codebuild'] with ['codebuild', 'x86-64'] and ['codebuild', 'arm64'], keep all three. Once all jobs have moved on to either of the new providers, then you can delete the old provider.
This feels related to #335, in that resolving that issue would cause different issues for the scenario above. In theory, after making the change to add the new label-dimension, the orchestrator would receive requests for ['self-hosted', 'codebuild'] and would provision runners at random between x86-64 and arm64 runners. This is not likely to be desired behaviour, but is arguably more correct from the perspective of the job saying "I don't care about architecture" than refusing to provision a runner at all.
Refusing to provision a runner at all is the desired behavior. We have users with multiple runner setups in multiple accounts pointing to the same GitHub organization. See #181 (comment), #133, and #72 (comment). And as you said, random behavior is not very desired (for example #181).
Documenting how to change a label, and probably some other common processes, is a good idea. Not sure where I would put it yet.
When GitHub selects runners, it needs all labels provided by the job to match the labels on the runner.
When this construct provisions runners, it needs all labels on the provider to match the labels on the job.
This 'inversion' can result in issues when making changes to the labels.
Example
['codebuild']
['self-hosted', 'codebuild']
Now, we want to start introducing ARM64 runners, so we create a new provider, and a new label-dimension.
['codebuild', 'x86-64']
and['codebuild', 'arm64']
['self-hosted', 'codebuild']
This feels related to #335, in that resolving that issue would cause different issues for the scenario above. In theory, after making the change to add the new label-dimension, the orchestrator would receive requests for
['self-hosted', 'codebuild']
and would provision runners at random between x86-64 and arm64 runners. This is not likely to be desired behaviour, but is arguably more correct from the perspective of the job saying "I don't care about architecture" than refusing to provision a runner at all.Another potential resolution could be to document a procedure on how to safely add new labels to an existing suite of providers. Perhaps the Composite Provider mentioned in #335 would work here too
The text was updated successfully, but these errors were encountered: