Move suffix handling from creation time to scrape time #1942

@zeitlinger

Part of #1912. Prerequisite for OM2 suffix-related PRs.

TL;DR

Today, metric names are modified at both creation time
(reserved suffix rejection) and scrape time (suffix
appending/stripping). This design moves all suffix handling
to scrape time. Going forward, the SDK does not modify
metric names — only the OM1 legacy format writer appends
suffixes.

Reference: Metric Names: A Guide for Client Library
Maintainers

Dev summit consensus (2024-09-13):

Suffixes like _total or _info are only ever enforced
in OM v1 and some existing SDK implementations. They MUST
NOT be enforced in any other situation.

We intend to get rid of automatically enforced suffixes
with OM 2.0.

Principles

  1. Don't break existing apps. Existing names exposed
    via OM1 (pull) and OTel (push) must not change without
    explicit opt-in.
  2. The SDK does not modify metric names. The OM2 spec's
    SHOULDs about unit suffixes and _total are guidance
    for users naming their metrics, not for the SDK to
    enforce. OM2 and OTel (with otel_preserve_names) output
    the name exactly as provided.
  3. OM1 is legacy. It continues to append _total, unit
    suffixes, etc. at scrape time. No changes to OM1 output.
  4. OTel legacy behavior behind a flag. The existing OTel
    name stripping (_total, unit suffix) is kept as the
    default for backward compatibility. The default flips in
    v2 (see #1943, OTel exporter: preserve_names=true as
    default).
  5. Fail early on collisions. Registry detects name
    collisions across all formats at registration time.

Key table

| User provides | OM1 | OM2 | OTel | OTel + otel_preserve_names |
|---|---|---|---|---|
| Counter("events") | events_total | events | events | events |
| Counter("events_total") | events_total | events_total | events | events_total |
| Counter("req_bytes").unit(BYTES) | req_bytes_total | req_bytes | name req, unit By | name req_bytes, unit By |
| Counter("req").unit(BYTES) | req_bytes_total | req | name req, unit By | name req, unit By |
| Gauge("connections") | connections | connections | connections | connections |
| Gauge("events_total") | events_total | events_total | events_total | events_total |
| Histogram("dur_seconds").unit(SECONDS) | dur_seconds_* | dur_seconds | name dur, unit s | name dur_seconds, unit s |
| Histogram("dur").unit(SECONDS) | dur_seconds_* | dur | name dur, unit s | name dur, unit s |

  • Rows 1+2 cannot coexist (OM1 collision detected at
    registration time)
  • OM1 (legacy): appends _total, unit suffix — no
    changes
  • OM2: name as user wrote it. No flag needed — the SDK
    does not modify names
  • OTel: legacy path strips _total + unit suffix;
    otel_preserve_names stops stripping
  • OTel preserve = OM2 = name as provided
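
The OM1 and OM2 columns of the key table can be reproduced with a small sketch. Everything here is hypothetical illustration (the class ScrapeTimeNames, its methods, and the lower-case unit strings are assumptions, not client_java API). It assumes "smart-append" means: add the unit suffix and, for counters, _total only when they are not already present. Combinations the table does not list are handled by that same rule as a guess.

```java
/** Hypothetical sketch of scrape-time naming; not part of client_java. */
final class ScrapeTimeNames {
    enum Type { COUNTER, GAUGE, HISTOGRAM }

    /** OM2 path: the name is written exactly as the user provided it. */
    static String om2Name(String name) {
        return name;
    }

    /**
     * OM1 path: smart-append the unit suffix and, for counters, _total.
     * unit is an assumed lower-case unit name such as "bytes", or null.
     */
    static String om1Name(Type type, String name, String unit) {
        boolean counter = type == Type.COUNTER;
        String base = name;
        // Temporarily drop an existing _total so the unit lands before it.
        if (counter && base.endsWith("_total")) {
            base = base.substring(0, base.length() - "_total".length());
        }
        // Append the unit suffix only if the name does not already end with it.
        if (unit != null && !unit.isEmpty() && !base.endsWith("_" + unit)) {
            base = base + "_" + unit;
        }
        // Counters always end in _total in OM1; other types keep the base name.
        return counter ? base + "_total" : base;
    }
}
```

For histograms the result is the family name; the dur_seconds_* entries in the table stand for the sample suffixes the writer adds on top of it.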

Collision detection

The registry computes every name a metric would produce
across all formats and rejects the registration if any of
them collides with a name produced by an existing metric.

Strictly better than today's blanket rejection — only
genuine collisions are caught.
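
A minimal sketch of the idea, assuming a registry that tracks exposed names per format. The class CollisionCheckingRegistry and its methods are hypothetical, and the name derivation is deliberately simplified to unit-less counters with the legacy OTel path. The actual implementation may differ.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of registration-time collision detection. */
final class CollisionCheckingRegistry {
    // Key: "<format>:<exposed name>", value: the user-provided name that claimed it.
    private final Map<String, String> claimed = new HashMap<>();

    /** Simplified per-format names for a counter without a unit. */
    static Map<String, String> counterNames(String name) {
        String stripped = name.endsWith("_total")
                ? name.substring(0, name.length() - "_total".length())
                : name;
        Map<String, String> names = new LinkedHashMap<>();
        names.put("om1", stripped + "_total"); // smart-append
        names.put("om2", name);                // as provided
        names.put("otel", stripped);           // legacy stripping (assumed default)
        return names;
    }

    /** Rejects the counter if any of its exposed names is already claimed. */
    void registerCounter(String name) {
        Map<String, String> names = counterNames(name);
        // Check everything first so a rejected metric leaves no partial state.
        for (Map.Entry<String, String> e : names.entrySet()) {
            String owner = claimed.get(e.getKey() + ":" + e.getValue());
            if (owner != null) {
                throw new IllegalArgumentException(
                        "'" + name + "' collides with '" + owner + "' on "
                                + e.getKey() + " name '" + e.getValue() + "'");
            }
        }
        for (Map.Entry<String, String> e : names.entrySet()) {
            claimed.put(e.getKey() + ":" + e.getValue(), name);
        }
    }
}
```

With this sketch, registering "events" and then "events_total" fails on the shared OM1 name events_total, while "events" and "requests" coexist.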

Responsibility matrix

| Component | Today | After |
|---|---|---|
| Counter.Builder | strips _total | no-op |
| Info.Builder | strips _info | no-op |
| PrometheusNaming | rejects/strips reserved suffixes | no reserved suffixes |
| PrometheusRegistry | rejects duplicate names | + cross-format collisions |
| OM1 writer | always appends _total | smart-appends (skip if present) |
| OM2 writer | always appends _total | no-op (name as provided) |
| OTel exporter | strips unit only | legacy: + _total strip; preserve: no-op |
| Protobuf writer | always appends _total | smart-appends (skip if present) |

otel_preserve_names flag

Controls OTel export behavior:

  • false (default today): legacy behavior — strip
    _total from Counters, strip unit suffix from name, set
    OTel unit metadata.
  • true: name exactly as the user wrote it. Unit
    metadata set from .unit() if provided.

Default becomes true in next major release (#1943).
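
The two flag values can be sketched as follows. The class OtelNameSketch, the OtelMetric holder, and the lower-case unit strings are hypothetical. Only the By and s unit strings come from the key table, and passing other units through unchanged is an assumption.

```java
/** Hypothetical sketch of OTel name export under otel_preserve_names. */
final class OtelNameSketch {
    enum Type { COUNTER, GAUGE, HISTOGRAM }

    /** Exported name plus OTel unit metadata (null when no unit was set). */
    static final class OtelMetric {
        final String name;
        final String unit;

        OtelMetric(String name, String unit) {
            this.name = name;
            this.unit = unit;
        }
    }

    /** Maps an assumed lower-case unit name to its OTel unit string. */
    static String otelUnit(String unit) {
        if (unit == null) {
            return null;
        }
        switch (unit) {
            case "bytes": return "By";
            case "seconds": return "s";
            default: return unit; // assumption: unknown units pass through
        }
    }

    static OtelMetric export(Type type, String name, String unit, boolean preserveNames) {
        String exported = name;
        if (!preserveNames) {
            // Legacy: strip _total from counters only.
            if (type == Type.COUNTER && exported.endsWith("_total")) {
                exported = exported.substring(0, exported.length() - "_total".length());
            }
            // Legacy: strip the unit suffix from the name.
            if (unit != null && exported.endsWith("_" + unit)) {
                exported = exported.substring(0, exported.length() - unit.length() - 1);
            }
        }
        // Unit metadata is set from .unit() in both modes.
        return new OtelMetric(exported, otelUnit(unit));
    }
}
```

Under this sketch, Counter("req_bytes").unit(BYTES) exports as name req, unit By with the flag off and as name req_bytes, unit By with the flag on, matching the key table.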

No equivalent flag for OM2 — the OM2 writer always outputs
the name as provided. The OM2 spec's SHOULDs about unit
suffixes and _total are naming guidance for users, not
behavior the SDK enforces.

disableSuffixAppending not needed

Suffix behavior is per-format, not a standalone flag. OM2
config (#1939) should focus on contentNegotiation,
compositeValues, exemplarCompliance,
nativeHistograms. Suffix handling follows the format
automatically.

OM2 spec references

From the OM2 spec:

  • Unit: "If non-empty, it SHOULD be a suffix of the
    MetricFamily name separated by an underscore."
  • Counter: "The MetricFamily name for Counters SHOULD end
    in _total."
  • Info: "The MetricFamily name for Info metrics MUST end
    in _info."

These SHOULDs are guidance for users naming their metrics.
The SDK's job is to pass names through, not to enforce
naming conventions. The one MUST (the _info suffix for Info
metrics) is enforced by the OM2 writer.

v2 defaults

In the next major release:

  • otel_preserve_names: default true. OTel users get
    names exactly as they wrote them. Users who depend on
    _total/unit stripping set false explicitly.

v2 key table

| User provides | OM1 | OM2 | OTel (v2) |
|---|---|---|---|
| Counter("events") | events_total | events | events |
| Counter("events_total") | events_total | events_total | events_total |
| Counter("req").unit(BYTES) | req_bytes_total | req | name req, unit By |
| Histogram("dur").unit(SECONDS) | dur_seconds_* | dur | name dur, unit s |

OM2 and OTel v2 both output the name as provided. The only
difference is that OTel also sets unit metadata from .unit().

Child issues

Implementation order

  1. Collision detection in PrometheusRegistry
  2. Smart-append in OM1/protobuf writers
  3. Remove reserved suffixes + store original name
  4. OTel: _total stripping (legacy) + preserve flag
  5. OM2 writer: no-op (name as provided)
  6. Move unit suffix appending to scrape time (OM1 only)
