[Clustering] Filter the label "instance" from the hash computation #6792
Conversation
Converting to draft for now because we think it might be a risky change and we don't want it in 1.0.
I'm not convinced this is the right solution; it feels too arbitrary. One of clustering's main assumptions is that all nodes run with the same configuration files and that the upstream APIs (e.g. service discovery) work in the same way, so that the fully-local consistent hashing works properly. The prometheus.exporter.* components may set a custom `instance` label. IMHO this could be part of the documentation for running clustering together with exporter components. Nice find nonetheless!
In this example the agents have the same config files; they just run on different hosts. Do you think we should constrain the agents to run on the same host if they use clustering together with a Prometheus exporter, or should we fix the exporters so that they don't set the instance label to the host where the agent is running?
Is this correct? According to this doc page and this blog post, the …
In the agent we set it by default to the agent's host, if I understand it correctly, here: https://github.com/grafana/agent/blob/main/internal/component/prometheus/exporter/exporter.go#L179
I think the bug in the clustering can be fixed if the SNMP exporter's …
That would be a start, but if we want the same behavior as this PR, we need to make sure that all exporters (except self, process, unix, and maybe others) set the instance label to the target's host and not to the agent's.
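The fix discussed above could look roughly like this: default the `instance` label to the scrape target's own address rather than the agent's hostname, so every agent in the cluster attaches the same value. This is a hypothetical sketch, not the agent's actual implementation; the function name `defaultInstance` and the label names used are assumptions for illustration.

```go
package main

import "fmt"

// defaultInstance picks a value for the "instance" label.
// Hypothetical sketch: prefer the target's own address over the agent's
// hostname, so that all cluster nodes attach an identical value and the
// consistent hashing agrees across nodes.
func defaultInstance(labels map[string]string, agentHost string) string {
	if inst, ok := labels["instance"]; ok && inst != "" {
		return inst // the user explicitly set it; keep it
	}
	if addr, ok := labels["__address__"]; ok && addr != "" {
		return addr // target's address: identical on every agent
	}
	// Fallback for exporters that genuinely describe the local host
	// (e.g. self, process, unix).
	return agentHost
}

func main() {
	target := map[string]string{"__address__": "switch-1:161"}
	fmt.Println(defaultInstance(target, "agent-a:12345")) // switch-1:161
}
```

Exporters such as self, process, and unix would still fall through to the agent host, which matches the exception list mentioned above.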
This PR has not had any activity in the past 30 days, so the …
After taking a closer look, I'm also not in favour of this solution, but I 100% agree we should do something to address this problem. Thinking about this, it seems to me that … In case we don't want the … To make this less error-prone, we could also check whether we are running in a cluster in exporter components and do something to warn the user?
@thampiotr If a … I think we need to go through all …
Thanks @ptodev for the discussion (offline). I'm going to summarise it here:
With this, I will close this PR for now and add a note to the fine-grained scheduling issue.
PR Description
In clustering mode, the targets exposed by exporters are distributed amongst the agents.
For an agent to decide whether it should scrape a given target, it computes a hash of the target's labels.
We noticed that the agent adds an "instance" label to these targets, which by default corresponds to the host where the agent runs. This is a problem: if the agents are not running on the same host, they will have different values for the "instance" label, so they will compute different hashes and the targets won't be distributed correctly.
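To make the mechanism concrete, here is a minimal sketch of hashing a target's labels while filtering out "instance", as this PR proposes. This is an illustration only (using FNV-1a over sorted label pairs), not the agent's actual hashing code; the function and target names are assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// hashTarget hashes a target's labels, skipping any label in the ignore
// set. Label names are sorted first so the result does not depend on map
// iteration order. Hypothetical sketch, not the agent's implementation.
func hashTarget(labels map[string]string, ignore map[string]bool) uint64 {
	names := make([]string, 0, len(labels))
	for name := range labels {
		if ignore[name] {
			continue
		}
		names = append(names, name)
	}
	sort.Strings(names)

	h := fnv.New64a()
	for _, name := range names {
		h.Write([]byte(name))
		h.Write([]byte{0xff}) // separator to avoid ambiguous concatenation
		h.Write([]byte(labels[name]))
		h.Write([]byte{0xff})
	}
	return h.Sum64()
}

func main() {
	ignore := map[string]bool{"instance": true}

	// The same logical target as seen by two agents on different hosts:
	// only the auto-attached "instance" label differs.
	onAgentA := map[string]string{"job": "snmp", "__address__": "switch-1:161", "instance": "agent-a:12345"}
	onAgentB := map[string]string{"job": "snmp", "__address__": "switch-1:161", "instance": "agent-b:12345"}

	// With "instance" filtered out, both agents agree on the hash and
	// therefore on which node owns the target.
	fmt.Println(hashTarget(onAgentA, ignore) == hashTarget(onAgentB, ignore)) // true
}
```

Without the filter, the two agents would compute different hashes for the same target, which is exactly the mis-distribution described above.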
Notes to the Reviewer
Is there a scenario where a target can only be uniquely identified by its instance label?
If so, the agent would compute the same hash for different targets, which would result in an unbalanced cluster.
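The risk raised here can be shown with a small sketch: two targets that differ only in their "instance" label become indistinguishable once the label is dropped from the hash, so one node ends up owning both. Hypothetical illustration code, not the agent's implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// hashWithout hashes a target's labels, dropping the named label.
// Hypothetical helper used only to illustrate the collision risk.
func hashWithout(labels map[string]string, drop string) uint64 {
	names := make([]string, 0, len(labels))
	for n := range labels {
		if n != drop {
			names = append(names, n)
		}
	}
	sort.Strings(names)
	h := fnv.New64a()
	for _, n := range names {
		fmt.Fprintf(h, "%s\xff%s\xff", n, labels[n])
	}
	return h.Sum64()
}

func main() {
	// Two distinct targets whose labels differ only by "instance".
	t1 := map[string]string{"job": "node", "instance": "host-1:9100"}
	t2 := map[string]string{"job": "node", "instance": "host-2:9100"}

	// Filtering "instance" makes them hash identically, so the cluster
	// would assign both targets to the same node instead of spreading them.
	fmt.Println(hashWithout(t1, "instance") == hashWithout(t2, "instance")) // true
}
```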
PR Checklist