Default value too high #3791

Closed

wants to merge 2 commits

Conversation

xogoodnow
Contributor

Hi,

The default values for components are set too high.
The default suggestions could be confusing for users and also prevent the chart from being deployed with its default values.

Cheers

@xogoodnow xogoodnow requested a review from a team as a code owner December 18, 2024 09:31
@kolesnikovae
Collaborator

kolesnikovae commented Dec 31, 2024

Hi @xogoodnow,

We cannot deliberately change the default values, as this would constitute a breaking change and would affect many users. For example, imagine an ingester pod that previously utilized 4GB of memory (which is totally expected) being scheduled on a node with only 2GB available. The same applies to the OOM killer: even if the node had enough resources, the pod would be killed eventually.

I find most of the currently set parameters to be safe, well-considered, and appropriate for general use. I would decrease requests.memory for the StatefulSets (set to 8Gi by default), but I don't believe we can implement the proposed change at this point.

Instead, we could add a note to our documentation with an override example. Alternatively, or in addition, we could provide a separate values file specifically tailored for small deployments. What do you think?
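
Something along these lines, for example. This is only a sketch: the key layout is assumed to mirror values-micro-services.yaml (pyroscope.components.<component>.resources), so the exact paths should be double-checked there, and the numbers are placeholders rather than a recommendation:

```yaml
# values-small-override.yaml -- illustrative override, not shipped with the chart.
# Requests drive scheduling (the node must have this much free capacity);
# the memory limit is where the OOM killer steps in.
pyroscope:
  components:
    ingester:
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
        limits:
          memory: 2Gi   # still capped so a runaway ingester cannot take down the node
# Applied on top of the chart defaults, e.g.:
#   helm upgrade --install pyroscope grafana/pyroscope -f values-small-override.yaml
```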

@xogoodnow
Contributor Author

Hi @kolesnikovae,

  1. I do not think this would constitute a breaking change, since folks don't just apply the chart with its defaults (maybe in labs, but definitely not in prod or even staging environments).
  2. IMHO, the default values should be mere suggestions and, if anything, the minimum resources required for the service to work properly. The current values are restrictive and inconsistent: we have 1 CPU core and 8Gi for one service and 0.1 core and 256Mi for another (for example, the ingester is way too juiced up while the query-frontend simply is not).

This is just my suggestion (as I have not seen any other chart set up like this).

Cheers

@kolesnikovae
Collaborator

kolesnikovae commented Jan 2, 2025

Hi @xogoodnow! Thank you for the input!

I do not think this would constitute a breaking change, since folks don't just apply the chart with its defaults (maybe in labs, but definitely not in prod or even staging environments).

Oh, I wouldn't bet on it :) There's no way to estimate the blast radius, and we can't even make educated guesses.

IMHO, the default values should be mere suggestions and, if anything, the minimum resources required for the service to work properly. The current values are restrictive and inconsistent: we have 1 CPU core and 8Gi for one service and 0.1 core and 256Mi for another (for example, the ingester is way too juiced up while the query-frontend simply is not).

I find most of the currently set parameters appropriate for general use. Each service has its own role and the resource consumption differs drastically depending on the role. For example:

  • ingesters handle the write path and produce data blocks, which requires a substantial amount of memory and CPU.
  • query-frontend acts as a bridge between the outer world and the storage, and it typically consumes very few resources.

Speaking of similar charts: in Mimir, for example, we have presets for small and large deployments: https://github.com/grafana/mimir/tree/main/operations/helm/charts/mimir-distributed. This is what I'm suggesting here.
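
To make the contrast between the roles concrete, here is a rough sketch using the ballpark figures above (key paths again assumed to follow values-micro-services.yaml; the values are illustrative):

```yaml
# Illustrative only: different roles warrant very different resource envelopes.
pyroscope:
  components:
    ingester:            # write path: buffers profiles and builds blocks in memory
      resources:
        requests:
          cpu: "1"
          memory: 8Gi
    query-frontend:      # thin routing layer in front of the queriers
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
```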

@xogoodnow
Contributor Author

Hi @kolesnikovae,

I still think my points were valid, but I respect your perspective on this.
Just to make it clear: should we break up values.yaml and add two files, for small and large deployments, instead?

Cheers

@kolesnikovae
Collaborator

kolesnikovae commented Jan 2, 2025

Hi @xogoodnow,

Thank you. I'd add values-micro-services-small.yaml with minimal requests (like 100m and 256Mi) and reasonable memory limits (e.g., 4Gi), but no CPU limits.

In the quick start guide, we can replace values-micro-services.yaml with values-micro-services-small.yaml and clarify that it is for demonstration purposes only. We should also note that the values should be adjusted for a production run, and that the values-micro-services.yaml and values-micro-services-hpa.yaml files may serve as a reference.

If you're going to make this change, please also update helm/check in the Makefile (similarly to values-micro-services.yaml).

Please note that values.yaml is for all-in-one deployments and has no resources defined.
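
As a sketch of what I have in mind for a single component (the full component list and exact key paths should be copied from values-micro-services.yaml):

```yaml
# values-micro-services-small.yaml (sketch) -- repeat per component and adjust as needed.
pyroscope:
  components:
    ingester:
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
        limits:
          memory: 4Gi   # memory limit only; deliberately no CPU limit
```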

@xogoodnow
Contributor Author

@kolesnikovae,
Thank you.
If you are ok with it, I would be glad to.
Cheers

@xogoodnow
Contributor Author

Hi @kolesnikovae,

Since the point of this PR is no longer valid, I will close it and discuss the aforementioned changes in another PR.

Cheers

@xogoodnow xogoodnow closed this Jan 3, 2025