WIP: Upgrade google.golang.org/grpc to v1.69.0 #10224

Draft · aknuds1 wants to merge 11 commits into main from arve/upgrade-grpc

Conversation

@aknuds1 (Contributor) commented Dec 12, 2024

What this PR does

TODO:
  • Try using memory pooling when unmarshalling storepb.SeriesResponse in the querier, since unmarshalling currently leads to huge memory utilization bursts (a rough pooling sketch follows this list).
  • Fix memory usage in distributor.receiveResponse (QueryStreamResponse unmarshalling; stop holding on to the unmarshalling buffer).
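
For the first TODO item, here is a rough, assumption-laden sketch of what buffer pooling around unmarshalling could look like. It is not necessarily the approach this PR will take; names such as decodeFrom, bufPool, and the unmarshaler interface are illustrative only, and the one assumption about generated code is that gogo/protobuf messages like storepb.SeriesResponse expose Unmarshal([]byte) error.

package pooling

import (
	"io"
	"sync"
)

// unmarshaler matches gogo/protobuf-generated messages, which expose
// Unmarshal(data []byte) error.
type unmarshaler interface {
	Unmarshal(data []byte) error
}

// bufPool holds reusable byte buffers so each decoded message does not
// allocate a fresh one.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 64*1024) // arbitrary initial capacity
		return &b
	},
}

// decodeFrom reads a size-delimited message body into a pooled buffer,
// unmarshals it, and returns the buffer to the pool. This only helps if the
// decoded message does not keep referencing the buffer afterwards, which is
// exactly the "stop holding on to the unmarshalling buffer" caveat in the
// second TODO item.
func decodeFrom(r io.Reader, size int, msg unmarshaler) error {
	bp := bufPool.Get().(*[]byte)
	defer func() {
		*bp = (*bp)[:0]
		bufPool.Put(bp)
	}()

	if cap(*bp) < size {
		*bp = make([]byte, size)
	}
	*bp = (*bp)[:size]
	if _, err := io.ReadFull(r, *bp); err != nil {
		return err
	}
	return msg.Unmarshal(*bp)
}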

Which issue(s) this PR fixes or relates to

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@aknuds1 force-pushed the arve/upgrade-grpc branch 5 times, most recently from 7f962d6 to 5fb5473, on December 13, 2024 at 07:54
@aknuds1 changed the title from "WIP: Upgrade google.golang.org/grpc to v1.68.1" to "WIP: Upgrade google.golang.org/grpc to v1.69.0" on Dec 13, 2024
@aknuds1 force-pushed the arve/upgrade-grpc branch 3 times, most recently from 321bc61 to 6be7edd, on December 17, 2024 at 11:39
@aknuds1 force-pushed the arve/upgrade-grpc branch 14 times, most recently from 6ba3edf to ffd51d0, on January 1, 2025 at 18:45
@aknuds1 force-pushed the arve/upgrade-grpc branch 7 times, most recently from 2c7a283 to bfb9512, on January 5, 2025 at 17:24
@aknuds1 force-pushed the arve/upgrade-grpc branch from bfb9512 to b582568 on January 6, 2025 at 07:53
@aknuds1 force-pushed the arve/upgrade-grpc branch 4 times, most recently from f9b89fc to a79ef47, on January 6, 2025 at 17:09
@@ -365,8 +329,8 @@ func (d *Distributor) queryIngesterStream(ctx context.Context, replicationSets [
 	queryMetrics.IngesterChunksTotal.Add(float64(totalChunks))
 }()

-	hashToChunkseries := map[string]ingester_client.TimeSeriesChunk{}
-	hashToTimeSeries := map[string]mimirpb.TimeSeries{}
+	hashToChunkseries := map[string]ingester_client.CustomTimeSeriesChunk{}
@francoposa (Member) commented Jan 6, 2025

Looked at the test TestDistributor_QueryStream_ShouldReturnErrorIfMaxSeriesPerQueryLimitIsReached since it had a nil pointer dereference; a lot of the other failures are just a bad equality assertion on one of the new struct types.

The new hashToChunkseries map gets read from before it's written to, without a nil guard, so there's a panic at line 341:

key := mimirpb.FromLabelAdaptersToKeyString(series.Labels)
existing := hashToChunkseries[key]
existing.Labels = series.Labels // panic: the missing key yields a zero-value CustomTimeSeriesChunk whose wrapped TimeSeriesChunk is nil
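
For illustration, a minimal self-contained sketch (not the real Mimir types; it assumes, as the follow-up below explains, that CustomTimeSeriesChunk wraps a pointer which the zero value leaves nil) that reproduces the panic pattern:

package main

// inner stands in for client.TimeSeriesChunk; wrapper stands in for
// client.CustomTimeSeriesChunk.
type inner struct {
	Labels []string
}

type wrapper struct {
	*inner
}

func main() {
	hashToChunkseries := map[string]wrapper{}

	// Reading a missing key yields the zero-value wrapper, whose embedded
	// *inner pointer is nil.
	existing := hashToChunkseries["some-series"]

	// Assigning through the promoted field dereferences that nil pointer
	// and panics.
	existing.Labels = []string{"a"}
}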

@aknuds1 (Contributor, Author) commented

I just fixed failures in pkg/distributor/query_test.go. I hadn't noticed that segfault though. I'll try to fix it tomorrow.

@francoposa (Member) commented

This patch fixes that test, but it looks like the same bug exists below it for the hashToTimeSeries map and is not getting caught by a test.

diff --git a/pkg/distributor/query.go b/pkg/distributor/query.go
index 2b766f30f4..41d83b124f 100644
--- a/pkg/distributor/query.go
+++ b/pkg/distributor/query.go
@@ -337,15 +337,20 @@ func (d *Distributor) queryIngesterStream(ctx context.Context, replicationSets [
 		for _, batch := range res.chunkseriesBatches {
 			for _, series := range batch {
 				key := mimirpb.FromLabelAdaptersToKeyString(series.Labels)
-				existing := hashToChunkseries[key]
-				existing.Labels = series.Labels
+				if existing, ok := hashToChunkseries[key]; ok {
+					existing.Labels = series.Labels
 
-				numPotentialChunks := len(existing.Chunks) + len(series.Chunks)
-				existing.Chunks = ingester_client.AccumulateChunks(existing.Chunks, series.Chunks)
+					numPotentialChunks := len(existing.Chunks) + len(series.Chunks)
+					existing.Chunks = ingester_client.AccumulateChunks(existing.Chunks, series.Chunks)
 
-				deduplicatedChunks += numPotentialChunks - len(existing.Chunks)
-				totalChunks += len(series.Chunks)
-				hashToChunkseries[key] = existing
+					deduplicatedChunks += numPotentialChunks - len(existing.Chunks)
+					totalChunks += len(series.Chunks)
+					hashToChunkseries[key] = existing
+				} else {
+					// no existing series to dedupe
+					totalChunks += len(series.Chunks)
+					hashToChunkseries[key] = series
+				}
 			}
 		}

@aknuds1 (Contributor, Author) commented Jan 7, 2025

I pushed a simple fix that just reflects the change in the PR (i.e. that client.TimeSeriesChunk is wrapped by client.CustomTimeSeriesChunk). The problem is that client.CustomTimeSeriesChunk isn't valid by itself (unlike client.TimeSeriesChunk); you need to initialize it with a non-nil TimeSeriesChunk field.
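
As an illustration only, reusing the stand-in wrapper and inner types from the earlier sketch (the real Mimir type and field names may differ), the "initialize it with a non-nil TimeSeriesChunk field" requirement could look like this hypothetical helper:

// upsert initializes the wrapper with a non-nil inner value when the key is
// missing, so writes through the promoted fields are safe.
func upsert(m map[string]wrapper, key string, labels []string) {
	existing, ok := m[key]
	if !ok {
		existing = wrapper{inner: &inner{}}
	}
	existing.Labels = labels
	m[key] = existing
}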

@aknuds1 (Contributor, Author) commented

Do you still think there's a bug not getting caught?

@aknuds1 force-pushed the arve/upgrade-grpc branch 3 times, most recently from e92b7e1 to 13c99fe, on January 7, 2025 at 07:59
@aknuds1 force-pushed the arve/upgrade-grpc branch from 13c99fe to 87c6282 on January 7, 2025 at 08:23