-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updating "Best practices: Naming lables" #2469 #2549
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Gopi-eng2202 <[email protected]>
Signed-off-by: Gopi-eng2202 <[email protected]>
content/docs/practices/naming.md
Outdated
* If the metric name has a unit, then the unit must be the last word before the accumulating count. Examples, | ||
* <code>process\_cpu\_<b>seconds</b>\_<b>total</b></code> | ||
* <code>request\_size\_<b>bytes</b>\_<b>total</b></code> | ||
* If the metric name does'nt have a real unit, then the naming can follow any sorting order in a way that it fits your use case, such as `grouping similar metric names`. Examples, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I agree with this section. The example connections
is a real countable unit.
However, we have agreed in the past that unit modifiers are prefered to be before, but are still allowed to be after. For example, both process_thing_failed_bytes_total
and process_thing_bytes_failed_total
are both valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, i can try to rephrase that.
But with regards to your overall comment, I understood that only standalone units which are measurable, like "seconds", "bytes" are considered to be a real unit. In a different context "connections" could be a unit, but it does'nt represent a unit of measurement though.
Correct me if i am wrong, ill read the discussion for this issue again to see if i can rephrase these lines.
Thanks for the feedback, will get back with this issue soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understood the previous discussions such that we had reached the consensus that we won't consider things like connections
a "real" unit for the sake of metric naming. (I'm not sure, though, if we should use the wording "real unit" in the doc here.) If my understanding is flawed, maybe @juliusv and @SuperQ could go back to the drawing board and find out what the consensus is.
Very generally, this document is just a recommendation anyway, so we aren't defining a spec here and maybe don't have to be super precise.
On the other hand, OpenMetrics in its current state specifies about the UNIT metadata field:
Unit specifies MetricFamily units. If non-empty, it MUST be a suffix of the MetricFamily name separated by an underscore. Be aware that further generation rules might make it an infix in the text format.
I'm not sure if we are ever going to enforce this in Prometheus (and the strict requirement might even be lifted in OM v2), but since the OM standard was evolved from existing best practices, it gives us some hints. I would believe the following to be true:
- A "real" unit is one that would show up in the UNIT metadata field. (E.g. we would probably not have a line like
# UNIT go_sql_stats_connections_idle connections
in the OM exposition.) - Such a "real" unit really should be the last thing before any other suffix like
_total
(in OM, it's a MUST even).
So maybe that's what should go in here: Somehow explain what "real" units are (using a better wording than just saying "real unit"), and then have a very strong recommendation to put them last.
Co-authored-by: Ben Kochie <[email protected]> Signed-off-by: gopi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To take a step back:
The changes as proposed right now are in part trying to recommend even more strictly that the unit has to go last. But that's already in the guidelines, even without a change. Remember that the whole discussion started because people felt the rule should be expressed in a less strict way, because they saw "units" like "connections" not being last in widely used metric names. The discussion then turned more the other way by concluding that things like "connections" are not "real" units.
To summarize:
I think we need far fewer changes here. What I want to see is a good rule-of-thumb what a unit is in the narrow sense (maybe referring to the "base units" section further down, which essentially lists many relevant examples, notably excluding things like "connections") and then we don't really have to change much else. Maybe we could add a new paragraph about ordering the metric name components in a way that the lexicographic sorting makes most sense, and that then you should still put the "real" units last, but say that this doesn't apply to things like "connections".
content/docs/practices/naming.md
Outdated
@@ -39,6 +39,26 @@ A metric name... | |||
(for a pseudo-metric that provides [metadata](https://www.robustperception.io/exposing-the-software-version-to-prometheus) about the running binary) | |||
* <code>data\_pipeline\_last\_record\_processed\_<b>timestamp_seconds</b></code> | |||
(for a timestamp that tracks the time of the latest record processed in a data processing pipeline) | |||
* ...that has a `unit` in its name, should have that `unit` as the last word. Note that an accumulating count such as `total` will be the last word, in addition to the unit if applicable. Examples, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This partially repeats the paragraph above. Maybe we can just work the new information into the previous paragraph? (Note that the previous paragraph gets the indentation wrong. Maybe something we can fix now. I.e. all the example metric name should be indented to be in a sub-list.)
content/docs/practices/naming.md
Outdated
* If the metric name has a unit, then the unit must be the last word before the accumulating count. Examples, | ||
* <code>process\_cpu\_<b>seconds</b>\_<b>total</b></code> | ||
* <code>request\_size\_<b>bytes</b>\_<b>total</b></code> | ||
* If the metric name does'nt have a real unit, then the naming can follow any sorting order in a way that it fits your use case, such as `grouping similar metric names`. Examples, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: does'nt → doesn't
content/docs/practices/naming.md
Outdated
* <code>net\_conntrack\_dialer\_conn\_total</code> | ||
* <code>net\_conntrack\_dialer\_conn\_total\_failed</code> | ||
* <code>net\_conntrack\_dialer\_conn\_total\_established</code> | ||
* <code>net\_conntrack\_dialer\_conn\_total\_closed</code> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure that we always want the _total
last in a counter. So these examples should not be here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I'll work on this
content/docs/practices/naming.md
Outdated
or | ||
* <code>net\_conntrack\_dialer\_conn\_total</code> | ||
* <code>net\_conntrack\_dialer\_conn\_total\_failed</code> | ||
* <code>net\_conntrack\_dialer\_conn\_total\_established</code> | ||
* <code>net\_conntrack\_dialer\_conn\_total\_closed</code> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this example is invalid.
or | |
* <code>net\_conntrack\_dialer\_conn\_total</code> | |
* <code>net\_conntrack\_dialer\_conn\_total\_failed</code> | |
* <code>net\_conntrack\_dialer\_conn\_total\_established</code> | |
* <code>net\_conntrack\_dialer\_conn\_total\_closed</code> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it
Signed-off-by: Gopi-eng2202 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -27,7 +27,7 @@ A metric name... | |||
* <code><b>http</b>\_request\_duration\_seconds</code> | |||
(for all HTTP requests) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 23 to 28 need one more space indentation to render as intended. (This problem existed before, but now we are adding more examples and should fix it finally.)
@@ -39,6 +39,16 @@ A metric name... | |||
(for a pseudo-metric that provides [metadata](https://www.robustperception.io/exposing-the-software-version-to-prometheus) about the running binary) | |||
* <code>data\_pipeline\_last\_record\_processed\_<b>timestamp_seconds</b></code> | |||
(for a timestamp that tracks the time of the latest record processed in a data processing pipeline) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 32 to 41 need one more space indentation to render as intended. (This problem existed before, but now we are adding more examples and should fix it finally.)
@@ -27,7 +27,7 @@ A metric name... | |||
* <code><b>http</b>\_request\_duration\_seconds</code> | |||
(for all HTTP requests) | |||
* ...must have a single unit (i.e. do not mix seconds with milliseconds, or seconds with bytes). | |||
* ...should use base units (e.g. seconds, bytes, meters - not milliseconds, megabytes, kilometers). See below for a list of base units. | |||
* ...should use base units (e.g. seconds, bytes, meters - not milliseconds, megabytes, kilometers). Note that things like connections, notifications are not considered as base units.See below for a list of base units. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding this note here is confusing, because this sentence is about using seconds
and not milliseconds
rather than defining what a unit is. Furthermore, even if a metric used milliseconds, the unit would still go last. I recommend to leave this line unchanged.
* ...that has a accumulating count such as `total`, but doesn't have a base unit in the name like, "<code>prometheus\_tsdb\_head\_truncations</code>", then it can have the following naming scenarios. Note that the former examples would be suitable if you want to follow lexographic sorting order for the names. | ||
* <code>prometheus\_tsdb\_head\_truncations\_total</code> | ||
* <code>prometheus\_tsdb\_head\_truncations\_failed\_total</code> | ||
* <code>prometheus\_tsdb\_head\_truncations\_established\_total</code> | ||
* <code>prometheus\_tsdb\_head\_truncations\_closed\_total</code> | ||
or | ||
* <code>prometheus\_tsdb\_head\_truncations\_total</code> | ||
* <code>prometheus\_tsdb\_head\_failed\_truncations\_total</code> | ||
* <code>prometheus\_tsdb\_head\_established\_truncations\_total</code> | ||
* <code>prometheus\_tsdb\_head\_closed\_truncations\_total</code> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would word this a bit differently. Also, the indentation of the lists has to be different to work properly. Also, the example metrics should actually be sorted lexicographically to prove the point. See suggestion below.
* ...that has a accumulating count such as `total`, but doesn't have a base unit in the name like, "<code>prometheus\_tsdb\_head\_truncations</code>", then it can have the following naming scenarios. Note that the former examples would be suitable if you want to follow lexographic sorting order for the names. | |
* <code>prometheus\_tsdb\_head\_truncations\_total</code> | |
* <code>prometheus\_tsdb\_head\_truncations\_failed\_total</code> | |
* <code>prometheus\_tsdb\_head\_truncations\_established\_total</code> | |
* <code>prometheus\_tsdb\_head\_truncations\_closed\_total</code> | |
or | |
* <code>prometheus\_tsdb\_head\_truncations\_total</code> | |
* <code>prometheus\_tsdb\_head\_failed\_truncations\_total</code> | |
* <code>prometheus\_tsdb\_head\_established\_truncations\_total</code> | |
* <code>prometheus\_tsdb\_head\_closed\_truncations\_total</code> | |
* ...may order its name components in a way that leads to convenient grouping when a list of metric names is sorted lexicographically, as long as all the other rules are followed. The following examples have their the common name components first so that all the related metrics are sorted together: | |
* <code>prometheus\_tsdb\_head\_truncations\_closed\_total</code> | |
* <code>prometheus\_tsdb\_head\_truncations\_established\_total</code> | |
* <code>prometheus\_tsdb\_head\_truncations\_failed\_total</code> | |
* <code>prometheus\_tsdb\_head\_truncations\_total</code> | |
The following examples are also valid, but are following a different trade-off. They are easier to read individually, but unrelated metrics like <code>prometheus\_tsdb\_head\_series</code> might get sorted in between. | |
* <code>prometheus\_tsdb\_head\_closed\_truncations\_total</code> | |
* <code>prometheus\_tsdb\_head\_established\_truncations\_total</code> | |
* <code>prometheus\_tsdb\_head\_failed\_truncations\_total</code> | |
* <code>prometheus\_tsdb\_head\_truncations\_total</code> |
@@ -27,7 +27,7 @@ A metric name... | |||
* <code><b>http</b>\_request\_duration\_seconds</code> | |||
(for all HTTP requests) | |||
* ...must have a single unit (i.e. do not mix seconds with milliseconds, or seconds with bytes). | |||
* ...should use base units (e.g. seconds, bytes, meters - not milliseconds, megabytes, kilometers). See below for a list of base units. | |||
* ...should use base units (e.g. seconds, bytes, meters - not milliseconds, megabytes, kilometers). Note that things like connections, notifications are not considered as base units.See below for a list of base units. | |||
* ...should have a suffix describing the unit, in plural form. Note that an accumulating count has `total` as a suffix, in addition to the unit if applicable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where the note about "real" units can happen. You could append something like:
Also note that this applies to units in the narrow sense (like the units in the table below), but not to countable things in general. For example, connections
or notifications
are not considered units for this rule and do not have to be at the end of the metric name. (See also examples in the next paragraph.)
* <code>prometheus\_tsdb\_head\_truncations\_total</code> | ||
* <code>prometheus\_tsdb\_head\_failed\_truncations\_total</code> | ||
* <code>prometheus\_tsdb\_head\_established\_truncations\_total</code> | ||
* <code>prometheus\_tsdb\_head\_closed\_truncations\_total</code> | ||
* ...should represent the same logical thing-being-measured across all label | ||
dimensions. | ||
* request duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 54 to 56 need to be indented by one more blank.
Hi @SuperQ and @juliusv,
cc @beorn7 ,
This pr is about the issue #8718.
I have updated the documentation on Best practices: Metric naming, based on the discussion from the mentioned issue#. I request you to have a look at it and let me know what you think.
Signed-off-by: Gopi-eng2202 [email protected]