Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(slos): BoolGauge promql expr should not be multiplied by 100 #1255

Merged
merged 2 commits into from
Oct 13, 2024

Conversation

alexberry
Copy link
Contributor

@alexberry alexberry commented Aug 28, 2024

While building out an uptime SLO using a BoolGauge we performed some simple testing by forcing a failure and observed the following issues:

  • The promql expression linked from the errors graph starts with 100 *, which returns a true percentage
  • The errors graph expects a percentage represented as a decimal fraction (i.e. a float between 0 & 1), and are shared amongst all slos
  • The burn rates also depend upon this interpretation

This resulted in the error graph making sense for our Ratio SLOs, however for a BoolGauge SLOs would return 10000% failure in the error graph (as opposed to the expected 100%), and non-sensical burn rates.

This fixes #1257

@alexberry alexberry changed the title fix(slos): booleanGauge promql expr should not be multiplied by 100 fix(slos): BoolGauge promql expr should not be multiplied by 100 Aug 28, 2024
@deanb-everc
Copy link

would also appreciate for this PR to be merged, seems like we are facing a similar issue

@deanb-everc
Copy link

any news here? we want to start using pyrra mainly with bool gauges, and currently it's unusable due to the error described here

@alexberry
Copy link
Contributor Author

alexberry commented Sep 2, 2024

any news here? we want to start using pyrra mainly with bool gauges, and currently it's unusable due to the error described here

Hey @deanb-everc, I can give you some guidance to build your own image if you'd like, this is how we're working around the issue in our production envs. There is likely a much better way to build an image locally, but this is what worked for me.

Clone my fork & checkout the branch with the fix

 git clone https://github.com/alexberry/pyrra.git
 git switch fix-boolean-gauge-metric

Set up Build Environment

Setting up your build env is documented in CONTRIBUTING.md, so please refer to that document, however the highlights are as follows:

Required build tools

Install these in whichever way your local OS supports it, I am on MacOS so I would use brew install make npm go and then would install docker desktop:

  • make
  • npm
  • go
  • docker

Required build dependencies

npm install react-scripts
make install

Build

Building the binary

export GOOS=linux GOARCH=amd64
make all

After performing this step, you should have a new file in the root of the repo named pyrra - this is the pyrra executable

Building the container image

mkdir -p dist/pyrra_linux_amd64_v1
cp pyrra dist/pyrra_linux_amd64_v1/.
docker build . -t pyrra:dev --build-arg TARGETOS=linux --build-arg TARGETARCH=amd64

This will have now built a local image pyrra:dev. You will now need to push this to a self-hosted registry that your kubernetes cluster has access to pull from. Let's say your registry's address is my-lovely-registry.somecompany.sometld, you can push this custom image up to your registry with:

docker tag pyrra:dev my-lovely-registry.somecompany.sometld/pyrra:dev
docker push my-lovely-registry.somecompany.sometld/pyrra:dev

Configuring Helm to use your custom image

Finally, if you are using the helm chart by rlex to deploy pyrra to your cluster, you can add the following to your values.yaml to switch to your custom image:

image:
  repository: my-lovely-registry.somecompany.sometld/pyrra
  tag: dev

Enjoy :)

@deanb-everc
Copy link

got it @alexberry , thanks!

@alexberry
Copy link
Contributor Author

alexberry commented Sep 3, 2024

@deanb-everc I wanted a better build mechanism than locally-built images, so I created my own (throwaway) PR to generate the container image ghcr.io/alexberry/pyrra:pr-3. You can configure helm to use it with:

image:
  repository: ghcr.io/alexberry/pyrra
  tag: pr-3

@alexberry
Copy link
Contributor Author

hello @metalmatze, is there any chance this can be reviewed? We've been hoping to revert to the master branch of this repo for over a month now, but are still having to run a custom-built image.

@fbegyn
Copy link

fbegyn commented Oct 7, 2024

Looked this over and have confirmed the behavior on some of our instances. LGTM and hoping this can get merged in.

@metalmatze
Copy link
Member

Hello everyone!
Thank you for the great discussion and first of all for sending the PR to begin with.
I was super busy organising PromCon the last 6-8 weeks.

This seems logical to me.
We should just make sure that there's nothing else depending on the 100 *, other than that this looks good to me.

@metalmatze metalmatze merged commit ee4821d into pyrra-dev:main Oct 13, 2024
9 of 10 checks passed
@metalmatze
Copy link
Member

Again, thank you so much!
Given this is merged into main. Do we want to cherry pick it into release-0.7 so we can get you a patch release?

@alexberry
Copy link
Contributor Author

Again, thank you so much! Given this is merged into main. Do we want to cherry pick it into release-0.7 so we can get you a patch release?

Yes please, that would be very helpful :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

boolean_gauge metric type has broken error graph
4 participants