Hey.
It would be nice if the client offered some (semi-automatic) handling of disappeared labelsets.
What do I mean by that?
Well I guess there are (at least) two classes of labelssets for a given metric:
- If one has a metric
number_of_HTTP_responses with a labelname like status and values for that like 500, 200, 404, then I'd say that these labels never really disappear. At most, they just no longer count up.
- But if one has a metric like
physical_drive_medium_errors with a labelname like drive_name and values for that like foo, bar, baz (something which uniquely identifies the drive in the system), then any value may disappear e.g. when the drive breaks and is replaced.
So in this case, the labelset for the drive that is gone should no longer be exported (with a value that would never change again), but be removed.
Now there are further (at least) two ways of instrumentation:
a. One sets/increases the metrics right at the place where the objects are actually worked with - i.e. scattered throughout the code
b. One has a more or less central place in the code, where the values are gathered and set.
I'd say, often in practise it's actually (b). At least every time, where one cannot really integrate instrumentation natively but merely parses some data and transforms that into metrics.
Even node_exporter would also be (b), I'd say.
How to get rid of disappeared labelsets?
Well there are of course the clear() and remove() methods.
The problem with the former is that it removes all. In (b) one might be tempted to say that this is not a problem, as one could simply do something like (pseudo-code):
while true:
clear_all_metrics()
set_all_metrics()
sleep_some_time()
clear_all_metrics() would simply remove everything, set_all_metrics() would re-set all that are still there.
btw: I guess that approach is also problematic when not just printing the metrics output once to stdout or using write_to_textfile() - because when e.g. some webserver runs that continuously exports the metrics, that might be queried just after clear_all_metrics() but before set_all_metrics() (or before that has finished).
In (a), this wouldn't really work anyway, but even in (b) it's problematic as e.g. the _created timestamp metrics for Counters would also get cleared&re-set every time.
Also, Counter.inc() wouldn't work anymore - at least not directly, because one would somehow need to keep track of the previous value.
With clear() therefore not really useful in practise (IMO), remove() remains:
The problem with remove() is IMO that one typically has no record of what to remove.
Like when the drive from above goes away, an exporter does not get some active indication that's gone - it’s just no longer there in the parsed output.
So effectively, this forces one to keep track of all labelsets per metric that one had in the previous call to set_all_metrics(), compare that which was still left in the current call, and then remove() the ones gone one by one (thereby also keeping things like _created for Counters (that haven’t disappeared).
Is there any recommendation on how to handle that scenario?
Or any means to handle this more out-of-the-box?
If not, what about a framework like this:
- per metric and per labelset,
prometheus_client tracks an additional value (maybe a bool or an int - let's use an int here, but haven't really thought it through whether that brings any benefits).
- when a labelset is created for a metrics with no value yet, the
int is set to 0
- every time a value is set (by
set(), inc(), etc.) for a given labelset+metric, that int is increased by 1
- There's a special
clear_non_updated(), which does what clear() does, but only for those labelsets+metrics, where the int is still 0, i.e. those which haven't been updated.
After it has cleared all these, it re-sets the int to 0 for all remaining ones. Maybe that could be placed in a separate function (not really sure whether there'd be any use case for that).
With that it should be possible to rewrite the main loop from above to something like:
while true:
set_all_metrics()
clear_all_non_updated_metrics() (which calls clear_non_updated() for all metrics)
sleep_some_time()
It might make sense to do the actual implementation a bit different from the above.
Perhaps one can maintain the int that counts whether a labelset+metrics was updated at the registry, and also place the clear_non_updated() at registry level.
The idea is that one could then have different registries, such for labelset+metrics that should live on, even if not updated, and such for labelset+metrics which should then disappear.
Problem with that seems however to be that start_http_server() and similar accept only one registry. So that probably doesn't work.
Another way could be a parameter when the metric object is created, which tells whether cleanup via clear_non_updated() should be performed or not.
Perhaps cleanup_group=somestring, and when clear_non_updated() is called with no such cleanup_group, it would simply clean up all (unless they'd been updated of course), but if one was given, it would clean up only those where the name matches (again, unless they'd been updated of course).
Any ideas on that?
Cheers,
Chris.
Hey.
It would be nice if the client offered some (semi-automatic) handling of disappeared labelsets.
What do I mean by that?
Well I guess there are (at least) two classes of labelssets for a given metric:
number_of_HTTP_responseswith a labelname likestatusand values for that like500,200,404, then I'd say that these labels never really disappear. At most, they just no longer count up.physical_drive_medium_errorswith a labelname likedrive_nameand values for that likefoo,bar,baz(something which uniquely identifies the drive in the system), then any value may disappear e.g. when the drive breaks and is replaced.So in this case, the labelset for the drive that is gone should no longer be exported (with a value that would never change again), but be removed.
Now there are further (at least) two ways of instrumentation:
a. One sets/increases the metrics right at the place where the objects are actually worked with - i.e. scattered throughout the code
b. One has a more or less central place in the code, where the values are gathered and set.
I'd say, often in practise it's actually (b). At least every time, where one cannot really integrate instrumentation natively but merely parses some data and transforms that into metrics.
Even
node_exporterwould also be (b), I'd say.How to get rid of disappeared labelsets?
Well there are of course the
clear()andremove()methods.The problem with the former is that it removes all. In (b) one might be tempted to say that this is not a problem, as one could simply do something like (pseudo-code):
clear_all_metrics()would simply remove everything,set_all_metrics()would re-set all that are still there.btw: I guess that approach is also problematic when not just printing the metrics output once to stdout or using
write_to_textfile()- because when e.g. some webserver runs that continuously exports the metrics, that might be queried just afterclear_all_metrics()but beforeset_all_metrics()(or before that has finished).In (a), this wouldn't really work anyway, but even in (b) it's problematic as e.g. the
_createdtimestamp metrics forCounterswould also get cleared&re-set every time.Also,
Counter.inc()wouldn't work anymore - at least not directly, because one would somehow need to keep track of the previous value.With
clear()therefore not really useful in practise (IMO),remove()remains:The problem with
remove()is IMO that one typically has no record of what to remove.Like when the drive from above goes away, an exporter does not get some active indication that's gone - it’s just no longer there in the parsed output.
So effectively, this forces one to keep track of all labelsets per metric that one had in the previous call to
set_all_metrics(), compare that which was still left in the current call, and thenremove()the ones gone one by one (thereby also keeping things like_createdforCounters(that haven’t disappeared).Is there any recommendation on how to handle that scenario?
Or any means to handle this more out-of-the-box?
If not, what about a framework like this:
prometheus_clienttracks an additional value (maybe aboolor anint- let's use aninthere, but haven't really thought it through whether that brings any benefits).intis set to0set(),inc(), etc.) for a given labelset+metric, thatintis increased by1clear_non_updated(), which does whatclear()does, but only for those labelsets+metrics, where theintis still0, i.e. those which haven't been updated.After it has cleared all these, it re-sets the
intto0for all remaining ones. Maybe that could be placed in a separate function (not really sure whether there'd be any use case for that).With that it should be possible to rewrite the main loop from above to something like:
It might make sense to do the actual implementation a bit different from the above.
Perhaps one can maintain the
intthat counts whether a labelset+metrics was updated at the registry, and also place theclear_non_updated()at registry level.The idea is that one could then have different registries, such for labelset+metrics that should live on, even if not updated, and such for labelset+metrics which should then disappear.
Problem with that seems however to be that
start_http_server()and similar accept only oneregistry. So that probably doesn't work.Another way could be a parameter when the metric object is created, which tells whether cleanup via
clear_non_updated()should be performed or not.Perhaps
cleanup_group=somestring, and whenclear_non_updated()is called with no suchcleanup_group, it would simply clean up all (unless they'd been updated of course), but if one was given, it would clean up only those where the name matches (again, unless they'd been updated of course).Any ideas on that?
Cheers,
Chris.