Option to prevent startup if tag fetching fails #2886

Open
stevenmatthewt opened this issue Jul 12, 2024 · 2 comments

@stevenmatthewt

Is your feature request related to a problem? Please describe.
We use --tags-from-ec2-tags to set several tags automatically based on the EC2 instance the agent is running on. One of the tags we set this way is the queue tag. Recently, one of our agents failed to fetch those EC2 tags on startup (unfortunately, I don't have logs showing the underlying error), but the agent started anyway. As a result, it came up in the default queue, which was incorrect and caused us a bit of a headache.

Describe the solution you'd like
Currently, the agent just logs an error when fetching tags fails. Since that functionality is critical to us, I'd love to have an option to block startup by erroring/panicking instead.

Blocking startup would be a good safe default, but it's also a breaking change, so just having another config option to enable the "strict" behavior here would work well for us.

Describe alternatives you've considered
Alternatively, if there were a way to configure buildkite-agent to require a queue to be configured, rather than automatically falling back to the default queue, that would work. We tried setting tags="queue=not-functional-queue" in the config file, hoping it would be overridden by the EC2 tag, but it seems that just causes the agent to listen on both queues.

Or, if there were a way to inspect the config of the Buildkite Agent after it starts, we could use that to verify that the queue was set properly.

@patrobinson
Contributor

Hi @stevenmatthewt, thanks for the details. It looks like we use the API to fetch the tags and retry 5 times, so this should only fail if Amazon's API was having a bad time.

It looks like we could instead use the metadata endpoint to retrieve tags, which should be a lot more reliable and wouldn't be a breaking change.

@stevenmatthewt
Author

Yeah, I think using the metadata API would be a pretty reasonable change to make internally. I'd honestly still love a way to make failures block startup, though, as I think the existing behavior is a little backwards and confusing.
