### A note for the community

- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
### Problem

#### Context

The Prometheus exporter sink has a TOCTOU (time-of-check to time-of-use) race condition when processing incremental metrics, which can cause data corruption under concurrent load.
#### Problem

When processing an incremental metric, the exporter:

1. Acquires a read lock to look up the existing metric value and compute the aggregation
2. Releases the read lock
3. Acquires a write lock to store the result

Between steps 2 and 3, other operations can interleave: concurrent metric updates, metric expiration via the flush timer, or HTTP scrapes.
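The racy pattern can be sketched as follows. This is an illustrative model, not Vector's actual code; the names `MetricStore` and `add_incremental` are hypothetical:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Hypothetical store modeling the racy read-then-write pattern.
struct MetricStore {
    counters: RwLock<HashMap<String, u64>>,
}

impl MetricStore {
    fn add_incremental(&self, name: &str, delta: u64) {
        // Step 1: read lock to look up the base value and aggregate.
        let base = *self.counters.read().unwrap().get(name).unwrap_or(&0);
        let new_value = base + delta;
        // Step 2: the read guard is a temporary, so the lock is already
        // released here -- another thread can run in this window.
        // Step 3: write lock to store the (possibly stale) result.
        self.counters.write().unwrap().insert(name.to_string(), new_value);
    }
}

fn main() {
    let store = MetricStore { counters: RwLock::new(HashMap::new()) };
    store.add_incremental("requests", 50);
    store.add_incremental("requests", 30);
    // Sequential calls are correct; the bug only surfaces under concurrency.
    println!("{}", store.counters.read().unwrap()["requests"]); // 80
}
```

Each lock acquisition is individually safe, which is why the bug is easy to miss: the invariant is only violated across the gap between the two critical sections.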
#### Impact

**Lost increments** - Concurrent updates both read the same base value, and the second write overwrites the first:

    Thread A: reads counter=1000, computes 1000+50=1050
    Thread B: reads counter=1000, computes 1000+30=1030
    Thread A: writes 1050
    Thread B: writes 1030  ← Thread A's increment is lost
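This interleaving can be reproduced deterministically by performing both reads before either write, a simplified single-threaded model of the schedule above (not Vector's code):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

fn main() {
    let counters: RwLock<HashMap<&str, u64>> = RwLock::new(HashMap::new());
    counters.write().unwrap().insert("c", 1000);

    // Both "threads" read the same base value under a read lock...
    let base_a = *counters.read().unwrap().get("c").unwrap();
    let base_b = *counters.read().unwrap().get("c").unwrap();

    // ...then each writes its own result. B's write clobbers A's.
    counters.write().unwrap().insert("c", base_a + 50); // A writes 1050
    counters.write().unwrap().insert("c", base_b + 30); // B writes 1030

    let final_value = *counters.read().unwrap().get("c").unwrap();
    assert_eq!(final_value, 1030); // A's +50 is lost; correct total is 1080
    println!("final = {final_value}");
}
```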
**Counter value decreases** - As shown above, the counter drops from 1050 to 1030. This violates Prometheus's monotonic counter semantics and breaks `rate()` and `increase()` queries in PromQL, which interpret any decrease as a counter reset.
**Stale aggregations** - If a metric is expired by the flush timer between the read and the write, the new value incorrectly includes data that should have been discarded.
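One way to close the race (a sketch under assumed types, not necessarily how Vector will resolve it) is to hold a single write lock across the entire read-modify-write, e.g. via the map's `entry` API:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Illustrative fix: lookup, aggregation, and store all happen under one
// write lock, so no other thread can interleave between them.
fn add_incremental(counters: &RwLock<HashMap<String, u64>>, name: &str, delta: u64) {
    let mut map = counters.write().unwrap();
    *map.entry(name.to_string()).or_insert(0) += delta;
}

fn main() {
    let counters = Arc::new(RwLock::new(HashMap::new()));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&counters);
            thread::spawn(move || {
                for _ in 0..1000 {
                    add_incremental(&c, "requests", 1);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // All 4000 increments survive; none are lost.
    assert_eq!(counters.read().unwrap()["requests"], 4000);
}
```

Serializing the whole read-modify-write also covers the expiration race, since the flush timer's removal and the update can no longer interleave; per-metric atomics would fix lost increments but not stale aggregations.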
#### Affected Code

`src/sinks/prometheus/exporter.rs` - `normalize()` function and `run()` method
### Configuration

### Version

v0.53.0
### Debug Output

### Example Data

No response

### Additional Context

No response

### References

No response