
MQE: Fix panic in loading too many samples #10261

Merged: jhesketh merged 8 commits into grafana:main from jhesketh/mqe-pooling on Dec 18, 2024

Conversation

@jhesketh (Contributor) commented Dec 17, 2024

What this PR does

This PR solves two bugs:

Bug 1:

BucketedPool would create pools in powers of two up to a set maxSize. That is, if maxSize isn't a power of two itself, the largest bucket would be smaller than maxSize.

However, when requesting a slice from the pool we were only checking against maxSize, not whether a bucket existed for that size. Now, instead, we calculate the bucketIndex and check that the bucket exists before using it.

This would cause panics in range vectors of more than 65,536 samples.

This problem was introduced in #10199
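
To make the failure concrete, here is a small sketch of the arithmetic (assuming buckets start at 2^0 and indices are derived with bits.Len, as the diff fragments quoted later in this thread suggest; the rest is illustrative):

package main

import (
    "fmt"
    "math/bits"
)

func main() {
    // With maxSize = 100_000, buckets exist for 2^0 .. 2^16 (2^17 = 131_072
    // would exceed maxSize), i.e. 17 buckets at indices 0..16.
    maxSize := 100_000
    numBuckets := bits.Len(uint(maxSize)) // 17

    // A request for 70_000 samples passes the old "size <= maxSize" check...
    size := 70_000
    bucketIndex := bits.Len(uint(size - 1)) // 17: the bucket for 2^17

    // ...but indexes one past the last bucket, which is what panicked.
    fmt.Println(bucketIndex >= numBuckets) // true, so the fix bails out here
}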

Bug 2:

If a requested slice size exceeded the size of the pools, a slice of exactly the requested size was returned. This broke the ring buffers, which require slice capacities to be powers of two.

This PR ensures that the returned slice capacity is always a power of two. This means we may return a slice larger than MaxExpectedPointsPerSeries, but it won't come from a pool (and similarly won't be returned to a pool). The MemoryConsumptionTracker protects us against any unreasonable requests.

This was introduced in #9664

Originally this would cause panics in subqueries selecting more than 65,536 samples, but since #10199 it returns errors instead.
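
The rounding itself is simple; a hypothetical helper illustrating the guarantee (the fix's actual code lives inline in BucketedPool.Get, quoted further down):

package main

import (
    "fmt"
    "math/bits"
)

// nextPowerOfTwo rounds size up to the nearest power of two. This is a
// hypothetical stand-in for what the fix does inline: even requests larger
// than maxSize get a power-of-two capacity, just allocated outside the pool.
func nextPowerOfTwo(size int) int {
    if size <= 1 {
        return 1
    }
    return 1 << bits.Len(uint(size-1))
}

func main() {
    fmt.Println(nextPowerOfTwo(100_001)) // 131072: beyond MaxExpectedPointsPerSeries, still a power of two
    fmt.Println(nextPowerOfTwo(65_536))  // 65536: exact powers of two pass through unchanged
}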

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@jhesketh requested a review from a team as a code owner December 17, 2024 09:28
@jhesketh mentioned this pull request Dec 17, 2024
@jhesketh (Author) commented: (this will need backporting to r321)

(2 resolved review threads on pkg/util/pool/bucketed_pool.go)
@jhesketh merged commit 34a24b1 into grafana:main on Dec 18, 2024
29 checks passed
@jhesketh deleted the jhesketh/mqe-pooling branch December 18, 2024 09:52
jhesketh added a commit that referenced this pull request Dec 18, 2024
* MQE: Fix panic in loading too many samples

BucketedPool would create pools in powers of two up to a set
`maxSize`. That is, if `maxSize` isn't a power of two itself,
then the largest bucket would be smaller than `maxSize`.

However, when requesting a slice from the pool we were only checking
against `maxSize`, and not whether a bucket existed for that size.
Instead, calculate the bucketIndex and check that the bucket exists
before using it.

* Use a power-of-two size for the max bucket

* MQE: Ensure BucketedPool always returns power-of-two slices

This is to guarantee they work with the ring buffers, which expect
slice capacities to always be a power of two.

The limiting pool still protects us from requesting an unreasonable
number of points with the MemoryConsumptionTracker.

* Fix tests

* Correctly return slices to their respective buckets or discard them

* Extra tests

* Address review feedback

* Remove unnecessary slice length check

(cherry picked from commit 34a24b1)

Co-authored-by: Joshua Hesketh <[email protected]>
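
For context on why the ring buffers insist on power-of-two capacities: with a power-of-two capacity, wrap-around indexing can use a cheap bitmask instead of a modulo. A minimal sketch (not the actual FPointRingBuffer code, whose internals aren't shown in this thread):

package main

import "fmt"

// ringBuffer is a stripped-down illustration. cap(points) must be a power
// of two for the mask in at() to be equivalent to a modulo.
type ringBuffer struct {
    points []float64
    first  int // index of the oldest point
}

func (b *ringBuffer) at(i int) float64 {
    // (first + i) & (cap - 1) == (first + i) % cap only when cap is a power
    // of two; a slice with, say, capacity 15 would silently wrap incorrectly.
    return b.points[(b.first+i)&(cap(b.points)-1)]
}

func main() {
    b := &ringBuffer{points: make([]float64, 8), first: 6}
    b.points[6], b.points[7], b.points[0] = 1, 2, 3
    fmt.Println(b.at(0), b.at(1), b.at(2)) // 1 2 3: the third read wraps from 7 to 0
}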
@charleskorn (Contributor) left a comment:
Thanks for finding and fixing these issues!

@@ -13,7 +13,7 @@ import (
 )

 const (
-    MaxExpectedPointsPerSeries = 100_000 // There's not too much science behind this number: 100000 points allows for a point per minute for just under 70 days.
+    MaxExpectedPointsPerSeries = 131_072 // There's not too much science behind this number: 100,000 points allows for a point per minute for just under 70 days. Then we use the next power of two.
[nit] Might be good to clarify why we want a power of two here, for example:

Suggested change:
-    MaxExpectedPointsPerSeries = 131_072 // There's not too much science behind this number: 100,000 points allows for a point per minute for just under 70 days. Then we use the next power of two.
+    MaxExpectedPointsPerSeries = 131_072 // There's not too much science behind this number: 100,000 points allows for a point per minute for just under 70 days. Then we use the next power of two, given the pools always return slices with capacity equal to a power of two.

Comment on lines +231 to +259
func TestLimitingBucketedPool_PowerOfTwoCapacities(t *testing.T) {
    memoryConsumptionTracker := limiting.NewMemoryConsumptionTracker(0, nil)

    pool := NewLimitingBucketedPool(
        pool.NewBucketedPool(100_000, func(size int) []int { return make([]int, 0, size) }),
        1,
        false,
        nil,
    )

    cases := []struct {
        requestedSize int
        expectedCap   int
    }{
        {3, 4},
        {5, 8},
        {10, 16},
        {65_000, 65_536},
        {100_001, 131_072}, // Exceeds max, expect next power of two
    }

    for _, c := range cases {
        slice, err := pool.Get(c.requestedSize, memoryConsumptionTracker)
        require.NoError(t, err, "Unexpected error when requesting size %d", c.requestedSize)
        require.Equal(t, c.expectedCap, cap(slice),
            "LimitingBucketedPool.Get() returned slice with capacity %d; expected %d", cap(slice), c.expectedCap)
        pool.Put(slice, memoryConsumptionTracker)
    }
}
This is testing functionality entirely in BucketedPool - should this test move to util/pool/bucketed_pool_test.go?

Comment on lines +261 to +290
func TestLimitingBucketedPool_UnreasonableSizeRequest(t *testing.T) {
    const maxMemoryLimit = 1_000_000 * FPointSize

    reg, metric := createRejectedMetric()
    memoryConsumptionTracker := limiting.NewMemoryConsumptionTracker(uint64(maxMemoryLimit), metric)

    pool := NewLimitingBucketedPool(
        pool.NewBucketedPool(100_000, func(size int) []int { return make([]int, 0, size) }),
        1,
        false,
        nil,
    )

    // Request a reasonable size
    slice, err := pool.Get(500_000, memoryConsumptionTracker)
    require.NoError(t, err, "Expected to succeed for reasonable size request")
    require.Equal(t, 524_288, cap(slice), "Capacity should be next power of two")
    assertRejectedQueryCount(t, reg, 0)

    pool.Put(slice, memoryConsumptionTracker)

    // Request an unreasonable size
    _, err = pool.Get(10_000_000, memoryConsumptionTracker)
    require.Error(t, err, "Expected an error for unreasonably large size request")
    require.Contains(t, err.Error(), "exceeded", "Error message should indicate memory consumption limit exceeded")
    assertRejectedQueryCount(t, reg, 1)

    require.Equal(t, uint64(0), memoryConsumptionTracker.CurrentEstimatedMemoryConsumptionBytes,
        "Current memory consumption should remain at 0 after rejected request")
}
Is this test testing something beyond what TestLimitingPool_Limited already covers? If so, a comment explaining the difference would be helpful.

Comment on lines +506 to +528
func TestFPointRingBuffer_UseReturnsErrorOnNonPowerOfTwoSlice(t *testing.T) {
    memoryConsumptionTracker := limiting.NewMemoryConsumptionTracker(0, nil)
    buf := NewFPointRingBuffer(memoryConsumptionTracker)

    nonPowerOfTwoSlice := make([]promql.FPoint, 0, 15)

    err := buf.Use(nonPowerOfTwoSlice)
    require.Error(t, err, "Use() should return an error for a non-power-of-two slice")
    require.EqualError(t, err, "slice capacity must be a power of two, but is 15",
        "Error message should indicate the invalid capacity")
}

func TestHPointRingBuffer_UseReturnsErrorOnNonPowerOfTwoSlice(t *testing.T) {
    memoryConsumptionTracker := limiting.NewMemoryConsumptionTracker(0, nil)
    buf := NewHPointRingBuffer(memoryConsumptionTracker)

    nonPowerOfTwoSlice := make([]promql.HPoint, 0, 15)

    err := buf.Use(nonPowerOfTwoSlice)
    require.Error(t, err, "Use() should return an error for a non-power-of-two slice")
    require.EqualError(t, err, "slice capacity must be a power of two, but is 15",
        "Error message should indicate the invalid capacity")
}
Couldn't this be incorporated into testRingBuffer?

Comment on lines +85 to +87
if bucketIndex >= len(p.buckets) {
    return // Ignore slices larger than the largest bucket
}
Instead of this, should we require that maxSize is a power of two in NewBucketedPool? I think that would be easier to reason about and less confusing for someone using this type.

For example, imagine someone created a pool with maxSize set to 1000. However, only slices with capacity up to 512 will be pooled, which is likely not what they intended.
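
If that route were taken, the constructor check is a one-liner; a hedged sketch (not the actual Mimir code):

// isPowerOfTwo reports whether n is a power of two: a power of two has a
// single set bit, so n&(n-1) clears it to zero.
func isPowerOfTwo(n uint) bool {
    return n != 0 && n&(n-1) == 0
}

// NewBucketedPool could then reject a maxSize like 1000 up front, e.g.:
//
//    if !isPowerOfTwo(maxSize) {
//        panic("BucketedPool maxSize must be a power of two")
//    }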

Comment on lines +45 to +47
// If no bucket large enough exists, a slice larger than the requested size
// of the next power of two is returned.
// Get guarantees the resulting slice always has a capacity in power of twos.
I think this could be a little clearer:

Suggested change:
-// If no bucket large enough exists, a slice larger than the requested size
-// of the next power of two is returned.
-// Get guarantees the resulting slice always has a capacity in power of twos.
+// The resulting slice always has a capacity that is a power of two.
+// If size is greater than maxSize, then a slice is still returned, however it may not be drawn from a pool.

maxSize := 100000
pool := NewBucketedPool(uint(maxSize), makeFunc)

// Request a size that triggers the last bucket boundary.
[nit]

Suggested change:
-// Request a size that triggers the last bucket boundary.
+// Request a slice with size that will be drawn from the last bucket in the pool.

    require.Len(t, s, 0)
}

func TestBucketedPool_AlwaysReturnsPowerOfTwoCapacities(t *testing.T) {
Isn't this covered by TestBucketedPool_HappyPath?

Comment on lines +167 to +168
// Create a slice with capacity that triggers the upper edge case
s := make([]int, 0, 65_000) // 86401 is close to maxSize but not aligned to power of 2
These two comments aren't clear to me: it's not clear why 86401 is mentioned in the second comment, and "upper edge case" isn't clear either. If this is intended to test the case where we're putting a slice into the last bucket in the pool and maxSize is not a power of two, then perhaps something along those lines would be clearer?

return p.make(size)
bucketIndex := bits.Len(uint(size - 1))

// If bucketIndex exceeds the number of available buckets, return a slice of the next power of two.
[nit]

Suggested change:
-// If bucketIndex exceeds the number of available buckets, return a slice of the next power of two.
+// If the requested size is larger than the size of the largest bucket, return a slice of the next power of two greater than or equal to size.
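
Putting the quoted fragments together, here is a simplified, non-generic sketch of the fixed Get path (the names follow the diff fragments above; the struct layout and everything else is an assumption, not the exact Mimir code):

package main

import (
    "fmt"
    "math/bits"
    "sync"
)

type BucketedPool struct {
    buckets []sync.Pool
    make    func(size int) []int
}

func (p *BucketedPool) Get(size int) []int {
    if size == 0 {
        return nil
    }
    // Index of the smallest power-of-two bucket with capacity >= size.
    bucketIndex := bits.Len(uint(size - 1))
    // If the request is larger than the largest bucket, allocate a
    // power-of-two slice directly instead of indexing past the bucket
    // list (the old code panicked here).
    if bucketIndex >= len(p.buckets) {
        return p.make(1 << bucketIndex)
    }
    if s, ok := p.buckets[bucketIndex].Get().([]int); ok {
        return s
    }
    return p.make(1 << bucketIndex)
}

func main() {
    p := &BucketedPool{
        buckets: make([]sync.Pool, 17), // 2^0 .. 2^16, as with maxSize = 100_000
        make:    func(size int) []int { return make([]int, 0, size) },
    }
    fmt.Println(cap(p.Get(70_000))) // 131072: beyond the pool, still a power of two
    fmt.Println(cap(p.Get(65_000))) // 65536: falls in the 2^16 bucket
}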
