Add remote write protocol 2.0 support #9072

Open
Tracked by #7817
dimitarvdimitrov opened this issue Aug 22, 2024 · 3 comments

Comments

@dimitarvdimitrov
Contributor

dimitarvdimitrov commented Aug 22, 2024

What should we do?

Prometheus added remote write protocol 2.0 experimental support in v2.54.0 (released on 2024-08-09). We should add the support in Mimir too.

How will we do it (roughly)?

  • Add remote write 2.0 support in Mimir distributors
  • Backport all (applicable) optimizations we did to remote write 1.0 (un)marshalling
  • Compare performance between remote write 1.0 and 2.0
  • Allow enabling 2.0 support on a per-tenant basis

Private design doc: https://docs.google.com/document/d/1JSwhdWRODOeGlNIRpYvnEHK6aH7d42n4ZJ_rfFvt-Lo/edit?tab=t.0#heading=h.5sybau7waq2q

Out of the scope of this work:

  • Changing the data format between distributors and ingesters / Kafka (we keep using the protocol 1.0 format). This should be a follow-up deliverable.
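
Keeping the 1.0 format downstream suggests that distributors would need to expand the 2.0 symbols table back into plain label strings. The sketch below illustrates that step, assuming the layout described in the public remote write 2.0 specification (a per-request `symbols` list, and per-series label references stored as alternating name/value indexes). `Label` and `resolveLabels` are hypothetical names for illustration, not Mimir's actual types or functions.

```go
package main

import "fmt"

// Label is a hypothetical name/value pair standing in for a 1.0-style label.
type Label struct {
	Name, Value string
}

// resolveLabels expands RW 2.0 label references into name/value pairs.
// refs is expected to hold alternating name and value indexes into symbols.
func resolveLabels(symbols []string, refs []uint32) ([]Label, error) {
	if len(refs)%2 != 0 {
		return nil, fmt.Errorf("odd number of label refs: %d", len(refs))
	}
	labels := make([]Label, 0, len(refs)/2)
	for i := 0; i < len(refs); i += 2 {
		nameRef, valueRef := refs[i], refs[i+1]
		// Reject references that point outside the symbols table.
		if int(nameRef) >= len(symbols) || int(valueRef) >= len(symbols) {
			return nil, fmt.Errorf("label ref out of range at pair %d", i/2)
		}
		labels = append(labels, Label{Name: symbols[nameRef], Value: symbols[valueRef]})
	}
	return labels, nil
}
```

For example, with symbols `["", "__name__", "up", "job", "mimir"]` and refs `[1, 2, 3, 4]`, the function returns the labels `__name__="up"` and `job="mimir"`. Metadata fields such as help and unit are resolved through the same table, so an off-by-one or stale reference there would surface as mismatched help text.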

Size?

Between Medium (= ~1 month) and Large (= ~3 months).

What will we deliver?

  • Add remote write 2.0 experimental support in Mimir, fully merged in Mimir but disabled by default
  • Test remote write 2.0 in dev

What are the documentation dependencies?

  • No documentation changes until the feature is enabled by default

Urgency?

Not urgent yet, but we can't lag too far behind Prometheus.

@jmichalek132
Contributor

FYI, OpenTelemetry Collector contrib has an LFX mentorship project starting in September to add remote write 2.0 support to the remote write exporter. Tracking issue: open-telemetry/opentelemetry-collector-contrib#33661.

@krajorama
Contributor

This is becoming more urgent because two projects now depend on it: native histograms with custom buckets (NHCB) and proper handling of the OTLP created timestamp (we don't want to upstream the current solution because it's a bit of a workaround).

krajorama added a commit to prometheus/prometheus that referenced this issue Jan 14, 2025
While testing POC for grafana/mimir#9072
I saw no unit or help metadata. Our test env:
https://github.com/grafana/mimir/tree/main/development/mimir-monolithic-mode
doesn't have units, so that was empty and cleared the help due to this bug.

Signed-off-by: György Krajcsovits <[email protected]>
krajorama added a commit to prometheus/prometheus that referenced this issue Jan 15, 2025
…tadata

Found during testing for
grafana/mimir#9072

Debug printout showed:
KRAJO: seriesName=cortex_request_duration_seconds_bucket,
metricFamily=cortex_request_duration_seconds_bucket,
type=GAUGE,
help=cortex_bucket_index_load_duration_seconds_sum,
unit=

which is nonsense.

I can imagine more cases where this is the case and makes actual sense.
Some targets might miss metadata and if there's a pipeline that loses it.

Signed-off-by: György Krajcsovits <[email protected]>
krajorama added a commit that referenced this issue Jan 16, 2025
As far as I can tell we don't cast Prometheus Remote Write 1.0
histogram into mimirpb.Histogram anymore. On the flip-side
this test fails in #10432 because we're going to store RW 2.0
extra field in mimirpb.Histogram.

Related to #9072

Signed-off-by: György Krajcsovits <[email protected]>
@krajorama
Contributor

Based on #10432

Task list:

  • Remove/refactor debug statements in PreallocWriteRequest.unmarshalRW2
  • Add unit test for PreallocWriteRequest.unmarshalRW2
  • Add benchmark test for PreallocWriteRequest.unmarshalRW2
  • Add a configuration option to enable/disable RW 2.0 support. If disabled, the endpoint should return 415, as in "fix(rw2.0): reject remote write 2.0 based on content type" #10423
  • Add metric to measure which protocol is in use
  • Add more testcases to TestDistributorRemoteWrite2 integration test
  • Patch mimir.pb.go to ignore RW2-only fields when unmarshalling RW1
  • Add Changelog entry
  • Run the Prometheus tool that checks compatibility against our endpoint, if possible
  • Performance test with avalanche
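
The enable/disable option, the 415 response, and the protocol-usage metric from the list above could fit together roughly as follows. This is a minimal sketch, not the code from #10423 or #10432: `classifyRemoteWrite` and `protocolCounts` are hypothetical names, the plain map stands in for a real counter metric, and treating a missing `Content-Type` as a 1.0 request is an assumption of the sketch. The `proto` parameter values are the ones given in the Prometheus remote write specifications.

```go
package main

import (
	"mime"
	"net/http"
)

// Values of the Content-Type "proto" parameter for each protocol version.
const (
	protoV1 = "prometheus.WriteRequest"
	protoV2 = "io.prometheus.write.v2.Request"
)

// protocolCounts is a stand-in for a counter metric with a "version" label.
var protocolCounts = map[string]int{}

// classifyRemoteWrite returns the protocol version implied by the Content-Type
// header and the HTTP status to answer with before decoding the body.
func classifyRemoteWrite(contentType string, rw2Enabled bool) (string, int) {
	if contentType == "" {
		// Assumption: a missing header is treated as a 1.0 request.
		contentType = "application/x-protobuf"
	}
	mediaType, params, err := mime.ParseMediaType(contentType)
	if err != nil || mediaType != "application/x-protobuf" {
		return "", http.StatusUnsupportedMediaType
	}
	version := ""
	switch params["proto"] {
	case "", protoV1:
		version = "1.0"
	case protoV2:
		if !rw2Enabled {
			// RW 2.0 is turned off: reject instead of misreading the payload as 1.0.
			return "", http.StatusUnsupportedMediaType
		}
		version = "2.0"
	default:
		return "", http.StatusUnsupportedMediaType
	}
	protocolCounts[version]++
	return version, http.StatusOK
}
```

A per-tenant switch would amount to looking up `rw2Enabled` from the tenant's limits before calling the function, rather than reading a single global flag.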
