MQE: Fix panic in loading too many samples #10261
Conversation
`BucketedPool` would create pools in powers of two up to a set `maxSize`. That is, if `maxSize` isn't a power of two itself, then the maximum bucket would be smaller than `maxSize`. However, when requesting a slice from the pool we were only checking against `maxSize`, and not whether a bucket existed for that size. Instead, we now calculate the bucket index and check that it exists before using it.
This guarantees the slices work with the ring buffers, which expect slice capacities to always be powers of two. The limiting pool still protects us from requesting an unreasonable number of points via the MemoryConsumptionTracker.
(this will need backporting to r321)
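The index check described above can be sketched as follows. This is a minimal illustration, not the actual Mimir code: `bucketedPool` and `get` are hypothetical stand-ins, and `bits.Len(uint(size-1))` computes the exponent of the smallest power of two at or above `size`.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bucketedPool is a hypothetical stand-in for the real type: one bucket
// per power-of-two size class, where buckets[i] serves capacity 2^i.
type bucketedPool struct {
	buckets []int // placeholder for the per-size-class pools
	maxSize uint
}

// get returns the bucket index serving the requested size, or -1 if no
// bucket is large enough (the caller then allocates outside the pool).
func (p *bucketedPool) get(size int) int {
	// bits.Len(size-1) is the exponent of the next power of two >= size.
	bucketIndex := bits.Len(uint(size - 1))

	// The buggy code only checked size <= maxSize. With maxSize =
	// 100_000 the largest bucket is 2^16 = 65_536, so sizes in
	// (65_536, 100_000] passed the check but indexed past the slice.
	if bucketIndex >= len(p.buckets) {
		return -1
	}
	return bucketIndex
}

func main() {
	// Buckets for 2^0 .. 2^16 (maxSize 100_000 is not a power of two).
	p := &bucketedPool{buckets: make([]int, 17), maxSize: 100_000}
	fmt.Println(p.get(65_536)) // 16: the largest bucket
	fmt.Println(p.get(70_000)) // -1: within maxSize, but no bucket exists
}
```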
Commits:
* MQE: Fix panic in loading too many samples
* Use a power of two size max bucket
* MQE: Ensure BucketedPool always returns slices of power two
* Fix tests
* Correctly return slices to their respective buckets or discard them
* Extra tests
* Address review feedback
* Remove unnecessary slice length check

(cherry picked from commit 34a24b1)
Co-authored-by: Joshua Hesketh <[email protected]>
Thanks for finding and fixing these issues!
```diff
@@ -13,7 +13,7 @@ import (
 )

 const (
-	MaxExpectedPointsPerSeries = 100_000 // There's not too much science behind this number: 100000 points allows for a point per minute for just under 70 days.
+	MaxExpectedPointsPerSeries = 131_072 // There's not too much science behind this number: 100,000 points allows for a point per minute for just under 70 days. Then we use the next power of two.
```
[nit] Might be good to clarify why we want a power of two here, for example:
```suggestion
MaxExpectedPointsPerSeries = 131_072 // There's not too much science behind this number: 100,000 points allows for a point per minute for just under 70 days. Then we use the next power of two, given the pools always return slices with capacity equal to a power of two.
```
```go
func TestLimitingBucketedPool_PowerOfTwoCapacities(t *testing.T) {
	memoryConsumptionTracker := limiting.NewMemoryConsumptionTracker(0, nil)

	pool := NewLimitingBucketedPool(
		pool.NewBucketedPool(100_000, func(size int) []int { return make([]int, 0, size) }),
		1,
		false,
		nil,
	)

	cases := []struct {
		requestedSize int
		expectedCap   int
	}{
		{3, 4},
		{5, 8},
		{10, 16},
		{65_000, 65_536},
		{100_001, 131_072}, // Exceeds max, expect next power of two
	}

	for _, c := range cases {
		slice, err := pool.Get(c.requestedSize, memoryConsumptionTracker)
		require.NoError(t, err, "Unexpected error when requesting size %d", c.requestedSize)
		require.Equal(t, c.expectedCap, cap(slice),
			"LimitingBucketedPool.Get() returned slice with capacity %d; expected %d", cap(slice), c.expectedCap)
		pool.Put(slice, memoryConsumptionTracker)
	}
}
```
This is testing functionality entirely in `BucketedPool`. Should this test move to `util/pool/bucketed_pool_test.go`?
```go
func TestLimitingBucketedPool_UnreasonableSizeRequest(t *testing.T) {
	const maxMemoryLimit = 1_000_000 * FPointSize

	reg, metric := createRejectedMetric()
	memoryConsumptionTracker := limiting.NewMemoryConsumptionTracker(uint64(maxMemoryLimit), metric)

	pool := NewLimitingBucketedPool(
		pool.NewBucketedPool(100_000, func(size int) []int { return make([]int, 0, size) }),
		1,
		false,
		nil,
	)

	// Request a reasonable size
	slice, err := pool.Get(500_000, memoryConsumptionTracker)
	require.NoError(t, err, "Expected to succeed for reasonable size request")
	require.Equal(t, 524_288, cap(slice), "Capacity should be next power of two")
	assertRejectedQueryCount(t, reg, 0)

	pool.Put(slice, memoryConsumptionTracker)

	// Request an unreasonable size
	_, err = pool.Get(10_000_000, memoryConsumptionTracker)
	require.Error(t, err, "Expected an error for unreasonably large size request")
	require.Contains(t, err.Error(), "exceeded", "Error message should indicate memory consumption limit exceeded")
	assertRejectedQueryCount(t, reg, 1)

	require.Equal(t, uint64(0), memoryConsumptionTracker.CurrentEstimatedMemoryConsumptionBytes,
		"Current memory consumption should remain at 0 after rejected request")
}
```
Is this test testing something beyond what `TestLimitingPool_Limited` already covers? If so, a comment explaining the difference would be helpful.
```go
func TestFPointRingBuffer_UseReturnsErrorOnNonPowerOfTwoSlice(t *testing.T) {
	memoryConsumptionTracker := limiting.NewMemoryConsumptionTracker(0, nil)
	buf := NewFPointRingBuffer(memoryConsumptionTracker)

	nonPowerOfTwoSlice := make([]promql.FPoint, 0, 15)

	err := buf.Use(nonPowerOfTwoSlice)
	require.Error(t, err, "Use() should return an error for a non-power-of-two slice")
	require.EqualError(t, err, "slice capacity must be a power of two, but is 15",
		"Error message should indicate the invalid capacity")
}

func TestHPointRingBuffer_UseReturnsErrorOnNonPowerOfTwoSlice(t *testing.T) {
	memoryConsumptionTracker := limiting.NewMemoryConsumptionTracker(0, nil)
	buf := NewHPointRingBuffer(memoryConsumptionTracker)

	nonPowerOfTwoSlice := make([]promql.HPoint, 0, 15)

	err := buf.Use(nonPowerOfTwoSlice)
	require.Error(t, err, "Use() should return an error for a non-power-of-two slice")
	require.EqualError(t, err, "slice capacity must be a power of two, but is 15",
		"Error message should indicate the invalid capacity")
}
```
Couldn't this be incorporated into `testRingBuffer`?
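For context on why the buffers insist on power-of-two capacities: a common ring-buffer design wraps indices with a bitmask (`index & (cap-1)`), which is only equivalent to `index % cap` when the capacity is a power of two. Whether Mimir's buffers use exactly this trick is an assumption here; the sketch below is illustrative, not the actual `Use` implementation, though the error text mirrors the one asserted in the tests above.

```go
package main

import "fmt"

// validateCapacity is a hypothetical version of the check the ring
// buffers' Use methods perform on an incoming slice's capacity.
// A zero capacity (e.g. a nil slice) is allowed in this sketch.
func validateCapacity(capacity int) error {
	// n is a power of two iff n > 0 and n & (n-1) == 0.
	if capacity != 0 && capacity&(capacity-1) != 0 {
		return fmt.Errorf("slice capacity must be a power of two, but is %d", capacity)
	}
	return nil
}

// With a power-of-two capacity, wrapping an index becomes a cheap mask.
func wrap(index, capacity int) int {
	return index & (capacity - 1) // equivalent to index % capacity here
}

func main() {
	fmt.Println(validateCapacity(15)) // error: 15 is not a power of two
	fmt.Println(validateCapacity(16)) // <nil>
	fmt.Println(wrap(17, 16))         // 1
}
```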
```go
if bucketIndex >= len(p.buckets) {
	return // Ignore slices larger than the largest bucket
}
```
Instead of this, should we require that `maxSize` is a power of two in `NewBucketedPool`? I think that would be easier to reason about and less confusing for someone using this type.

For example, imagine someone created a pool with `maxSize` set to 1000. However, only slices with capacity up to 512 will be pooled, which is likely not what they intended.
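The suggested validation could look something like the sketch below. The constructor name echoes `NewBucketedPool`, but the simplified signature and the choice to return an error (rather than, say, panic) are assumptions for illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// isPowerOfTwo reports whether n is a power of two.
func isPowerOfTwo(n uint) bool {
	return n > 0 && n&(n-1) == 0
}

// newBucketedPool sketches the suggested validation: reject a maxSize
// that is not a power of two, so that every size up to maxSize has a
// backing bucket. It returns the number of buckets that would be built.
func newBucketedPool(maxSize uint) (numBuckets int, err error) {
	if !isPowerOfTwo(maxSize) {
		return 0, fmt.Errorf("maxSize must be a power of two, got %d", maxSize)
	}
	// One bucket per power of two, from 2^0 up to maxSize itself.
	return bits.Len(maxSize), nil
}

func main() {
	// With maxSize = 1000, only capacities up to 512 would be pooled,
	// which is the surprise described above; fail fast instead.
	if _, err := newBucketedPool(1000); err != nil {
		fmt.Println(err)
	}

	n, _ := newBucketedPool(131_072)
	fmt.Println(n) // 18 buckets: 2^0 through 2^17
}
```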
```go
// If no bucket large enough exists, a slice larger than the requested size
// of the next power of two is returned.
// Get guarantees the resulting slice always has a capacity in power of twos.
```
I think this could be a little clearer:
```suggestion
// The resulting slice always has a capacity that is a power of two.
// If size is greater than maxSize, then a slice is still returned, however it may not be drawn from a pool.
```
```go
maxSize := 100000
pool := NewBucketedPool(uint(maxSize), makeFunc)

// Request a size that triggers the last bucket boundary.
```
[nit]
```suggestion
// Request a slice with size that will be drawn from the last bucket in the pool.
```
```go
	require.Len(t, s, 0)
}

func TestBucketedPool_AlwaysReturnsPowerOfTwoCapacities(t *testing.T) {
```
Isn't this covered by `TestBucketedPool_HappyPath`?
```go
// Create a slice with capacity that triggers the upper edge case
s := make([]int, 0, 65_000) // 86401 is close to maxSize but not aligned to power of 2
```
These two comments aren't clear to me: it's not clear why 86401 is mentioned in the second comment, and "upper edge case" isn't clear either. If this is intended to test the case where we're putting a slice into the last bucket in the pool and `maxSize` is not a power of two, then perhaps something along those lines would be clearer?
```go
	return p.make(size)
	bucketIndex := bits.Len(uint(size - 1))

	// If bucketIndex exceeds the number of available buckets, return a slice of the next power of two.
```
[nit]
```suggestion
// If the requested size is larger than the size of the largest bucket, return a slice of the next power of two greater than or equal to size.
```
What this PR does

This PR solves two bugs:

Bug 1:

`BucketedPool` would create pools in powers of two up to a set `maxSize`. That is, if `maxSize` isn't a power of two itself, then the maximum bucket would be smaller than `maxSize`. However, when requesting a slice from the pool we were only checking against `maxSize`, and not whether a bucket existed for that size. Now, instead, we calculate the bucket index and check that it exists before using it.

This would cause panics in range vectors of more than 65,536 samples.

This problem was introduced in #10199.

Bug 2:

If a requested slice size exceeded the size of the pools, a slice of exactly the requested size was returned. This caused a bug in the ring buffers, which require slice capacities to be powers of two.

This PR ensures that the returned slice always has a power-of-two capacity. This means we may return a slice larger than `MaxExpectedPointsPerSeries`, but it won't be drawn from a pool (and similarly won't be returned to a pool). We are protected by the `MemoryConsumptionTracker` for any unreasonable requests.

This was introduced in #9664.

Originally this would cause panics in sub-queries selecting more than 65,536 samples, but since #10199 it just returns errors.
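The power-of-two guarantee from the Bug 2 fix amounts to rounding every request up to the next power of two, whether or not it fits in a pool. A minimal sketch of that rounding (the helper name is illustrative, not the exact Mimir code):

```go
package main

import (
	"fmt"
	"math/bits"
)

// nextPowerOfTwo rounds size up to the nearest power of two. After the
// fix, even a request above the pool's maxSize gets a capacity like
// this; it is simply allocated outside the pool and never returned to it.
func nextPowerOfTwo(size int) int {
	if size <= 1 {
		return 1
	}
	return 1 << bits.Len(uint(size-1))
}

func main() {
	fmt.Println(nextPowerOfTwo(65_000))  // 65536: served by the largest bucket
	fmt.Println(nextPowerOfTwo(100_001)) // 131072: above maxSize, still a power of two
}
```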
Which issue(s) this PR fixes or relates to
Fixes #
Checklist

- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`.
- `about-versioning.md` updated with experimental features.