-
Notifications
You must be signed in to change notification settings - Fork 805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the situation where metric collection stops when NPE occours #710
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Yaolong Liu <[email protected]>
Thanks a lot for the PR. My understanding is that the root cause for the Exception is a bug in how The question is: What is the best way to deal with bugs like these? The current implementation is making the scrape fail. Your PR would change this to silently ignoring the invalid metric and making the scrape successful for other metrics. Both approaches have their drawbacks: Dealing with partial metrics is difficult, especially when it's silently missing, and having no metrics at all because of a bug in What do you think about implementing the public List<MetricFamilySamples> collect(Predicate<String> sampleNameFilter); Then users could explicitly skip metrics by configuring that in the exporter ( |
Signed-off-by: Yaolong Liu <[email protected]>
Thanks Reply, My understanding is that, we can implement the In addition, I think the null pointer exception is a hidden danger of many systems, and it is not limited to Looking forward for your response, thx! |
The default implementation first collects all metrics, and then applies the filter. So if the This can be achieved with an explicit implementation of DropwizardExports.collect(Predicate<String> sampleNameFilter); The goal is to prevent filtered metrics from being collected in the first place. This will require a bit of refactoring, because you need to pass the filter down to where the sample name is created and apply the filter. It's more work than just catching and ignoring the |
I think I probably understood your idea, but there may be a difference. We can't rewrite |
I was thinking you add a method like this to @Override
public List<MetricFamilySamples> collect(Predicate<String> nameFilter) {
// TODO: Collect only metrics where nameFilter.test(name) is true
} With such a method, |
Sorry for the late reply,
The user can call this method like this
In this way, the corresponding metrics can be ignored. |
I don't think it's a good idea to have an "implicitly ignore all erroneous metrics" option. It would be better to ignore them explicitly. If the |
So, do you think we can throw exception information directly in the |
@codings-dan I feel a combination of @fstab 's proposals would be ideal.
Note: Using a Possible/example code (Not tested)
|
@dhoard Thanks for reviewing the code and making suggestions, I'll try to fix this again based on your suggestion |
We use
Prometheus
to monitorAlluxio
system metrics. When some metrics are empty for various reasons, the system no longer monitors the metrics. We can catch this exception, so as not to affect the collection and monitoring of other metrics.