Introduce a mechanism to guard against unbounded cardinality #970
-
A direct use case for this kind of functionality would be this Kubernetes PR: kubernetes/kubernetes#104484. That PR adds a metric about probe duration, which is very valuable for detecting probe timeouts. However, identifying a probe requires the pod name and its namespace, so for this metric to be useful it needs at least these two dimensions, both of which are unbounded and user-controlled. From a security point of view that is a big concern, but from an observability one we really want this information, which we could then use to create alerts. So far, Kubernetes has taken the stance that we shouldn't introduce any metrics with unbounded cardinality, and that prevents a lot of observability improvements. That is even stated in our developer guidelines: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/instrumentation.md#dimensionality--cardinality. With protection against cardinality explosions on the client side, we would be able to think less about security issues and focus more on making actual observability improvements.
-
@dgrisonnet Thanks for bringing this up and for the proposal. I like the idea.
As you already brought up, this is rather a big con. To prevent any major behaviour change, I would be happy to proceed if we enable this through an opt-in mechanism. In the documentation we can explain all the necessary trade-offs, and users could choose whether or not they want to use it.
-
I like the idea! Having more ways of protecting against overloading a Prometheus deployment makes sense to me.
What is still unclear to me is how an exporter like this can expose these limitations to Prometheus and its users. If an exporter imposes limitations like this, some operators (e.g. avg()) no longer give accurate results.
-
Summary
Cardinality explosion is a recurring security concern when working with Prometheus metrics, and although over time a lot of functionality has been added to the Prometheus server to prevent explosions from causing major incidents, no improvements have been made on the client side to control them. As such, when writing applications that intentionally expose metrics with unbounded cardinality, it is impossible to provide any guarantees to the users and consumers of said metrics.
Motivations
In a large open-source project such as Kubernetes, which has a big community and where we are continuously trying to improve monitoring, contributors often want to introduce metrics that make sense from a monitoring perspective but have unbounded cardinality. Even though these metrics would be really useful, this kind of change is very hard for maintainers to approve because of the security concerns that cardinality explosions involve. If the Prometheus client had a functionality for bounding the dimensions of these metrics, maintainers could merge these improvements with the guarantee that, even if users' Prometheus backends aren't set up to guard against cardinality explosions, the metrics exposed by Kubernetes would not explode anyway.
Goals
Non-goals
Proposal
Today, the only way to bound the cardinality of a metric is to go through each of its labels and identify the unbounded dimensions. Once that's done, the developer needs to allow-list the values that each label may take, in order to fix the cardinality of the metric and prevent it from exploding (see the sketch below). While this is simple for labels such as the HTTP status code, it is impossible when the label is, for example, a Kubernetes resource such as a pod or a namespace, since those are user-defined. At the same time, for a lot of Kubernetes-related metrics, it is hard to get rid of these labels since they identify the application that the metric refers to.
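For illustration, here is a minimal sketch of that allow-listing approach with client_golang. The `http_request_duration_seconds` metric, the `allowedCodes` set, and the `observe` helper are hypothetical names used only for this example:

```go
package example

import (
	"strconv"

	"github.com/prometheus/client_golang/prometheus"
)

// requestDuration has a single "code" label whose values we control.
var requestDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name: "http_request_duration_seconds",
		Help: "Duration of HTTP requests.",
	},
	[]string{"code"},
)

// allowedCodes is the fixed set of label values; anything else is
// collapsed into "other" so the cardinality stays bounded.
var allowedCodes = map[string]bool{"200": true, "404": true, "500": true}

func observe(statusCode int, seconds float64) {
	code := strconv.Itoa(statusCode)
	if !allowedCodes[code] {
		code = "other"
	}
	requestDuration.WithLabelValues(code).Observe(seconds)
}
```

This works because the set of plausible status codes is small and known in advance, which is exactly what is missing for user-defined values such as pod or namespace names.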
A hotbed of such unbounded-cardinality metrics is kube-state-metrics, and although there are mechanisms there to disable metrics, I don't think it is reasonable to ask users to give up all the information they have on their pods just because malicious users could exploit it to cause cardinality explosions.
My proposal for improving this from the client side is to focus on the very source of cardinality explosions: the unbounded dimensions. By introducing a hard limit on the number of values that a single label can take, we could bound dimensions that were previously unbounded. The idea is that if a label takes more values than the defined hard limit, the client blanks the value to avoid exposing exploding metrics, and then informs users/cluster administrators that a cardinality explosion was prevented at the cost of some granularity on the metric. Informing users that action needs to be taken can easily be done via a combination of metrics and alerts, and it is then up to them to either increase the cardinality limit or find the cause of the explosion and get rid of it.
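To make the idea more concrete, here is a rough sketch of what such a hard limit could look like if it were enforced at instrumentation time. The `boundedLabel` helper and the `probe_duration_seconds` metric are assumptions made for this example, not an existing client_golang API:

```go
package example

import (
	"sync"

	"github.com/prometheus/client_golang/prometheus"
)

// boundedLabel tracks the distinct values seen for one label and
// enforces a hard limit on how many are allowed.
type boundedLabel struct {
	mu    sync.Mutex
	seen  map[string]struct{}
	limit int
}

// value returns v unchanged while the limit is not exceeded, and an
// empty (blanked) string once more than `limit` distinct values were seen.
func (b *boundedLabel) value(v string) string {
	b.mu.Lock()
	defer b.mu.Unlock()
	if _, ok := b.seen[v]; ok {
		return v
	}
	if len(b.seen) >= b.limit {
		return "" // blanked: cardinality explosion prevented
	}
	b.seen[v] = struct{}{}
	return v
}

var probeDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name: "probe_duration_seconds",
		Help: "Duration of probes.",
	},
	[]string{"pod", "namespace"},
)

var podLabel = &boundedLabel{seen: map[string]struct{}{}, limit: 1000}

func observeProbe(pod, namespace string, seconds float64) {
	probeDuration.WithLabelValues(podLabel.value(pod), namespace).Observe(seconds)
}
```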
Pros:
Cons:
Design Details
My first idea about a potential design was focused on the UX that a maintainer would have when enforcing these hard limits. The easiest way would be to have them at the Registry level. In Kubernetes we have thousands of metrics but only a few Registries, so it would be much easier to enforce these kinds of restrictions in the few Registries that are responsible for gathering metrics. As such, I was thinking about either adding a new limit field to the `Registry` type of client_golang or creating a separate Registry that would add the cardinality-protection functionality.

As for enforcing the limit on the number of values that a dimension can take, this can be done at the end of the `Gather` call, since that's when all the series are transformed and when we can really know the cardinality of the metrics that will be served. The issue is that this change seems very expensive in terms of performance, since we have to go through all the timeseries a second time, so it might not be the best option.

The approach at the Registry level seemed very convenient for my use case, but it would mean that all the metrics in a Registry share the same limit. That is reasonable to me, but some use cases might want to define limits per metric.
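As a very rough sketch of the `Gather`-level idea, a wrapper around `prometheus.Gatherer` could count the distinct values per label name and blank the overflowing ones. The `limitedGatherer` type and its semantics below are purely hypothetical, not an existing client_golang API:

```go
package example

import (
	"github.com/prometheus/client_golang/prometheus"
	dto "github.com/prometheus/client_model/go"
)

// limitedGatherer wraps any prometheus.Gatherer and, during Gather,
// blanks label values once a label exceeds the configured hard limit.
type limitedGatherer struct {
	prometheus.Gatherer
	limit int // max distinct values allowed per label name, per metric family
}

func (g limitedGatherer) Gather() ([]*dto.MetricFamily, error) {
	mfs, err := g.Gatherer.Gather()
	if err != nil {
		return mfs, err
	}
	for _, mf := range mfs {
		seen := map[string]map[string]struct{}{} // label name -> distinct values
		for _, m := range mf.GetMetric() {
			for _, lp := range m.GetLabel() {
				vals, ok := seen[lp.GetName()]
				if !ok {
					vals = map[string]struct{}{}
					seen[lp.GetName()] = vals
				}
				if _, ok := vals[lp.GetValue()]; !ok && len(vals) >= g.limit {
					blank := ""
					lp.Value = &blank // blank the overflowing value
					continue
				}
				vals[lp.GetValue()] = struct{}{}
			}
		}
	}
	return mfs, nil
}

// Usage: wrap the gatherer that backs the /metrics handler.
var gatherer prometheus.Gatherer = limitedGatherer{Gatherer: prometheus.DefaultGatherer, limit: 1000}
```

Note that this sketch still walks every time series a second time, which is exactly the performance concern mentioned above, and naively blanking values could also produce duplicate series that would have to be merged or dropped before exposition.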
I started working on a POC but haven't finished it yet, and after discussing with @bwplotka we figured it might be better to first start a discussion here and gather some feedback.