Implement UTF8 Support #916

Open
ywwg opened this issue Jan 22, 2024 · 6 comments

@ywwg
Member

ywwg commented Jan 22, 2024

As part of prometheus/prometheus#13095, all client libraries will need to support the new scraping, query, and content negotiation formats.

@ywwg
Member Author

ywwg commented Jan 22, 2024

@fedetorres93

@fedetorres93

I'll start by implementing UTF-8 support in the Java client library.

@fstab
Member

fstab commented Jan 23, 2024

@fedetorres93 thanks for volunteering, I really appreciate that!

Is there any general guidance yet on how to implement it, for example how to convert UTF-8 names to Prometheus names for older Prometheus servers, and how to deal with potential name collisions when registering metrics?

It would be good to define the behavior first before implementing it. Ideally the behavior would be consistent across client libraries in all programming languages.

@fedetorres93

@fstab You can find the proposals @ywwg worked on here and here.

I'm working on adding UTF-8 metric and label name validation, plus support for parsing and formatting the UTF-8 text format. There is still some discussion going on about how content negotiation will be implemented on writes and how reads will be handled.
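
As a rough illustration of what such name validation could look like (a sketch only, not the code from the actual change): the class and method names below are hypothetical, and the UTF-8 rule shown, "any non-empty string that encodes to valid UTF-8", is an assumption about the proposal rather than a confirmed detail. The two regular expressions are the classic Prometheus metric and label name patterns.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the client_java API.
final class NameValidationSketch {
    // Classic metric names: letters, digits, '_' and ':', not starting with a digit.
    private static final Pattern LEGACY_METRIC_NAME =
            Pattern.compile("[a-zA-Z_:][a-zA-Z0-9_:]*");
    // Classic label names: same as above, but ':' is not allowed.
    private static final Pattern LEGACY_LABEL_NAME =
            Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    private NameValidationSketch() {}

    static boolean isLegacyMetricName(String name) {
        return LEGACY_METRIC_NAME.matcher(name).matches();
    }

    static boolean isLegacyLabelName(String name) {
        return LEGACY_LABEL_NAME.matcher(name).matches();
    }

    // Assumption: under UTF-8 validation, a name is valid if it is non-empty
    // and can be encoded as UTF-8 (i.e. it contains no unpaired surrogates).
    static boolean isValidUtf8Name(String name) {
        return !name.isEmpty()
                && StandardCharsets.UTF_8.newEncoder().canEncode(name);
    }
}
```

With this sketch, a name such as requests.total is rejected by the classic check but accepted by the UTF-8 check.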

@fstab
Member

fstab commented Jan 30, 2024

Thanks @fedetorres93!

There is already support for dots in metric and label names in client_java. It will be easy to extend this to other characters. The motivation for allowing dots was to support metric/label names defined in the OpenTelemetry semantic conventions.

Currently dots are only exposed in OpenTelemetry format. In Prometheus text format, OpenMetrics text format, and OpenMetrics protobuf format dots are replaced with underscores.
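
The replacement described above can be sketched roughly as follows. This is a simplified illustration, not the actual client_java code: the class and method names are hypothetical, and the sketch generalizes from dots to any character outside the classic Prometheus metric name alphabet.

```java
// Hypothetical helper for illustration; not part of the client_java API.
final class LegacyNameSketch {
    private LegacyNameSketch() {}

    // Converts an arbitrary metric name to a classic Prometheus metric name by
    // replacing every code point outside [a-zA-Z0-9_:] with an underscore.
    static String toLegacyMetricName(String name) {
        if (name.isEmpty()) {
            throw new IllegalArgumentException("metric name must not be empty");
        }
        StringBuilder sb = new StringBuilder(name.length() + 1);
        name.codePoints().forEach(cp -> {
            boolean allowed = (cp >= 'a' && cp <= 'z')
                    || (cp >= 'A' && cp <= 'Z')
                    || (cp >= '0' && cp <= '9')
                    || cp == '_'
                    || cp == ':';
            sb.append(allowed ? (char) cp : '_');
        });
        // Classic names must not start with a digit.
        char first = sb.charAt(0);
        if (first >= '0' && first <= '9') {
            sb.insert(0, '_');
        }
        return sb.toString();
    }
}
```

For example, toLegacyMetricName("http.server.duration") yields http_server_duration, while a name that is already a classic Prometheus name is returned unchanged.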

I assume for UTF-8 characters in Prometheus format we will define a new OpenMetrics version, right?

I think the following two considerations make sense:

  • When converting OpenTelemetry names to Prometheus names, follow the rules defined here: https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/. These are the rules that are also implemented in the OpenTelemetry collector. For a user it should not matter whether they scrape Prometheus format, or whether they push OpenTelemetry format and have a collector convert to Prometheus remote write. The resulting metric and attribute names should be the same; therefore, Prometheus client libraries should implement the OpenTelemetry standard for converting arbitrary names to Prometheus names.
  • Prometheus client libraries have a "fail fast" approach: When you register metrics with conflicting names, registration fails. We don't defer these errors to scrape time. I think we should look at the classic Prometheus names when checking for conflicts, i.e. we should fail if a user registers a metric named requests.total and then registers a metric named requests_total. While this might theoretically work when exposing new names only, it will fail at scrape time for older Prometheus servers. We should consider this bad practice and prevent this in our client libraries.
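
A minimal sketch of the fail-fast conflict check from the second point, assuming a simple character-for-underscore replacement as the conversion to classic names (the full OpenTelemetry rules linked above cover more than this). The registry class and its methods are hypothetical and only illustrate the idea; they are not the client_java registry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry for illustration; not the client_java registry.
final class RegistrySketch {
    // Maps each classic (legacy) name to the original name that claimed it.
    private final Map<String, String> byLegacyName = new HashMap<>();

    // Registers a metric name and fails immediately if another registered
    // name maps to the same classic Prometheus name.
    void register(String name) {
        String legacy = toLegacyName(name);
        String existing = byLegacyName.putIfAbsent(legacy, name);
        if (existing != null) {
            throw new IllegalArgumentException(
                    "'" + name + "' conflicts with '" + existing
                            + "': both map to '" + legacy + "'");
        }
    }

    // Assumption: simplified conversion that replaces every character
    // outside [a-zA-Z0-9_:] with an underscore.
    static String toLegacyName(String name) {
        return name.replaceAll("[^a-zA-Z0-9_:]", "_");
    }
}
```

With this sketch, registering requests.total and then requests_total throws at registration time instead of surfacing as an error when an older Prometheus server scrapes the endpoint.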

What do you think? If you feel we should have a small "client library support for UTF-8" proposal with the points above I'm happy to write one.

@fedetorres93

Thanks for the info @fstab!

I don't think another proposal is necessary, but I appreciate the points you mentioned and will take them into account for the implementation.
