Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add very small delay between observations in TestHistogramAtomicObserve #1691

Merged
merged 1 commit into from
Nov 20, 2024

Conversation

imorph
Copy link
Contributor

@imorph imorph commented Nov 20, 2024

Problem

Test TestHistogramAtomicObserve taking too long to complete after introducing this change:

=== RUN   TestHistogramAtomicObserve
--- PASS: TestHistogramAtomicObserve (487.82s)

Analysis

@ywwg pointed out that increase for this test runtime were introduced after adding exponential back-off to histograms.
If we will take a look at this test closely we will see that observations here

for {
	select {
	case <-quit:
		return
	default:
		his.Observe(1)
	}
}

introduced in a very high pace inside tight loop without any delay or limit.

  • With multiple goroutines failing CAS operations and backing off, they sleep for significant amounts of time.
  • This leads to longer wait times for the waitForCooldown function in the Write method, as observations are delayed.
  • Previous implementation used a CAS loop without sleeping even though CAS failures occurred, goroutines would retry immediately, eventually succeeding.
  • This kept the observations progressing quickly, and the test ran faster. The addition of sleep delays introduces significant latency during high contention.

Solution

Under typical workloads, contention is usually not as extreme as in this test. Observations has at least some delays between them. So proposal is to alter test by adding nanosecond sleep to it, which significantly reduce runtime and makes it a little bit closer to what happens typically in production.

=== RUN   TestHistogramAtomicObserve
--- PASS: TestHistogramAtomicObserve (0.76s)

Copy link
Member

@kakkoyun kakkoyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Thanks for swift reaction. And thanks @ywwg for discovering and reporting it.

@kakkoyun kakkoyun merged commit 76b74e2 into prometheus:main Nov 20, 2024
10 checks passed
@ywwg
Copy link
Member

ywwg commented Nov 20, 2024

thank you for the fast fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants