
[prometheus] "Error on ingesting out-of-order exemplars" message in logs #1795

Open
puckpuck opened this issue Nov 26, 2024 · 13 comments
Labels
bug (Something isn't working)

Comments

@puckpuck
Contributor

puckpuck commented Nov 26, 2024

Bug Report

The following error continues to show up in the Prometheus logs:

prometheus  | ts=2024-11-26T04:44:06.414Z caller=write_handler.go:279 level=warn component=web msg="Error on ingesting out-of-order exemplars" num_dropped=44

This error started happening after upgrading flagd to version 0.11.4 in this PR.

puckpuck added the bug label Nov 26, 2024
@puckpuck
Contributor Author

puckpuck commented Nov 26, 2024

@beeme1mr This issue in particular did not exist with prior versions of flagd.

I made an incremental change to 0.11.3 and saw that this error stopped happening, so I suspect this is something very specific to the 0.11.3 -> 0.11.4 upgrade. The release notes for flagd are not super obvious about why this error would occur, outside of a Prometheus client upgrade.

@beeme1mr
Contributor

beeme1mr commented Nov 26, 2024

We didn't make any telemetry-related changes in the last release, but I'll look into it.

Are you sure this issue is caused by flagd? It's not clear from the attached logs.

Sorry, reread your comment.

@beeme1mr
Contributor

I can't get the demo app running locally due to an unrelated error.

 ⠋ Container otel-col          Creating                                                                                                                                                                      0.1s
Error response from daemon: invalid mount config: must use either propagation mode "rslave" or "rshared" when mount source is within the daemon root, daemon root: "/var/lib/docker", bind mount source: "/", propagation: "rprivate"
make: *** [Makefile:138: start] Error 1

Another user reported this already, but the fix has been reverted.

@beeme1mr
Contributor

Possibly related.

prometheus/prometheus#13933

@julianocosta89
Member

> I can't get the demo app running locally due to an unrelated error.
>
>  ⠋ Container otel-col          Creating                                                                                                                                                                      0.1s
> Error response from daemon: invalid mount config: must use either propagation mode "rslave" or "rshared" when mount source is within the daemon root, daemon root: "/var/lib/docker", bind mount source: "/", propagation: "rprivate"
> make: *** [Makefile:138: start] Error 1
>
> Another user reported this already, but the fix has been reverted.

@beeme1mr the latest release doesn't have that anymore.
Have you pulled the latest?

@beeme1mr
Contributor

Yeah, I'm trying to run the latest version of the demo in Ubuntu on WSL.

@julianocosta89
Member

Ah, I see. Re-reading your message, it actually makes sense.
We removed the rslave param because it is no longer required in the latest Docker version.
It seemed to be an issue that affected a single Docker version.

Multiple users reported issues running with the rslave param.

Could you check whether updating Docker solves it for you?
If not, maybe you could edit the docker-compose.yaml file locally, along the lines of the sketch below.

But ideally the demo would run in all setups.
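
For reference, such a local edit could look roughly like the following sketch, re-adding the bind propagation mode the daemon asks for; the service name and mount paths here are illustrative, not copied from the demo's actual configuration:

    # Hypothetical local docker-compose.yaml edit; the service name and mount
    # paths are illustrative, not copied from the demo's actual configuration.
    services:
      otel-col:
        volumes:
          - type: bind
            source: /
            target: /hostfs
            read_only: true
            bind:
              propagation: rslave   # re-adds the propagation mode the daemon asks for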

@beeme1mr
Contributor

I'm running the latest version available through apt-get. However, it's not the latest version according to the Docker release notes. I'll check again tomorrow.

@dyladan
Member

dyladan commented Dec 18, 2024

Looks like flagd 0.11.4 contained an update of flagd/core to 0.10.3, which itself bumped the opentelemetry-go monorepo from 1.28.0 to 1.30.0. This definitely seems like a possible cause of the issue. I'm not sure what all is in that change, but it doesn't look like it is flagd's fault, since they're just using basic APIs and not doing anything overly fancy. Indeed, they're not doing anything specific to exemplars at all.

@open-telemetry/go-maintainers is there any chance there is a known issue which may have caused this?
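
To make "basic APIs" concrete, the instrumentation in question is roughly of the following shape; the instrument and span names are illustrative, not flagd's actual source, and the point is only that any exemplars are attached by the Go SDK (when exemplar collection is enabled and the context carries a sampled span), never by the instrumentation code itself:

    // Rough shape of "basic API" instrumentation; names are illustrative,
    // not flagd's actual source.
    package main

    import (
        "context"

        "go.opentelemetry.io/otel"
    )

    var (
        meter  = otel.Meter("flagd-sketch")
        tracer = otel.Tracer("flagd-sketch")

        // Plain counter creation; nothing exemplar-specific here.
        evalCounter, _ = meter.Int64Counter("feature_flag.evaluations")
    )

    func evaluateFlag(ctx context.Context, key string) {
        ctx, span := tracer.Start(ctx, "evaluate "+key)
        defer span.End()

        // The SDK decides whether to record an exemplar for this measurement.
        evalCounter.Add(ctx, 1)
    }

    func main() {
        // With the default global (no-op) providers this runs but exports
        // nothing; flagd wires real SDK providers instead.
        evaluateFlag(context.Background(), "example-flag")
    }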

@dashpole

Exemplars were enabled by default in 1.31.0, so that wouldn't have changed between 1.28.0 and 1.30.0. But if it was 1.31, that would possibly explain it.

From https://github.com/prometheus/prometheus/blob/4a6f8704efcabfe9ee0f74eab58d4c11579547be/tsdb/exemplar.go#L257:

> Since during the scrape the exemplars are sorted first by timestamp, then value, then labels,
> if any of these conditions are true, we know that the exemplar is either a duplicate
> of a previous one (but not the most recent one as that is checked above) or out of order.

So it sounds like this could be an out-of-order or a duplicate issue.
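
Paraphrased as code, the rule described in that comment is roughly the following; this is a simplified sketch of the ordering check, not the actual Prometheus tsdb/exemplar.go implementation:

    // Simplified sketch of the ordering rule quoted above; not the actual
    // tsdb/exemplar.go code. An incoming exemplar that does not sort strictly
    // after the newest stored one by (timestamp, value, labels) is treated as
    // out of order or a duplicate and dropped (counted in num_dropped).
    package main

    import "fmt"

    // exemplar is a minimal stand-in for a stored exemplar: a timestamp (ms),
    // a value, and the hash of its label set.
    type exemplar struct {
        ts        int64
        value     float64
        labelHash uint64
    }

    func isOutOfOrderOrDuplicate(newest, incoming exemplar) bool {
        switch {
        case incoming.ts < newest.ts:
            return true
        case incoming.ts == newest.ts && incoming.value < newest.value:
            return true
        case incoming.ts == newest.ts && incoming.value == newest.value &&
            incoming.labelHash < newest.labelHash:
            return true
        default:
            return false
        }
    }

    func main() {
        newest := exemplar{ts: 1000, value: 2, labelHash: 7}
        older := exemplar{ts: 900, value: 5, labelHash: 9}
        fmt.Println(isOutOfOrderOrDuplicate(newest, older)) // true: earlier timestamp
    }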

Are we sending OTLP to Prometheus? Or are we exporting prometheus or PRW from the collector?

@puckpuck
Contributor Author

> Are we sending OTLP to Prometheus? Or are we exporting prometheus or PRW from the collector?

We are sending OTLP to Prometheus.
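
For context, sending OTLP directly to Prometheus in this kind of setup is conventionally wired roughly as sketched below; this assumes the usual otlphttp exporter and endpoint rather than quoting the demo's actual files, and Prometheus additionally has to have its OTLP receiver feature enabled (e.g. the otlp-write-receiver feature flag, depending on version):

    # Sketch of a collector pipeline exporting metrics as OTLP straight to
    # Prometheus' native OTLP endpoint; names and endpoint follow the usual
    # conventions and are not copied from the demo's actual configuration.
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    exporters:
      otlphttp/prometheus:
        endpoint: http://prometheus:9090/api/v1/otlp
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          exporters: [otlphttp/prometheus]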

@dashpole

Got it. So it is probably an issue with the implementation of exemplar translation in the OTLP receiver of the Prometheus server. The exemplar validation code probably assumes things about exemplars that aren't correct for OTel exemplars.

@dashpole

@puckpuck if you can add details of our setup to prometheus/prometheus#13933, that would be helpful. Some hypotheses to check:

  • Is the issue triggered by counters with multiple exemplars (possibly with out-of-order timestamps?)
  • Is the issue triggered by histograms with exemplars that don't align with histogram bucket boundaries?
  • Is the issue triggered by exponential histograms with exemplars that aren't in timestamp order?
