-
Notifications
You must be signed in to change notification settings - Fork 544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zstd compression support between distributors and ingesters #8522
zstd compression support between distributors and ingesters #8522
Comments
Are you specifically asking for the |
I'm specifically looking for compression support on the ingester-client, yes. That's our most expensive client. Though, judging by how the config is laid out, I suspect it is a shared compressor for all the gRPC clients? |
In effect yes. I just needed more clarification, as your description was relatively vague. I'm trying to see whether the team has any opinion on supporting zstd. |
So, I found out that the Mimir team already has considered zstd support, but the zstd implementation we considered is apparently not very performant. Is that the same zstd implementation as you're referring to? |
Thanks for checking in. I'm looking at https://github.com/klauspost/compress/tree/master/zstd as a possible zstd implementation. It looks like what you linked is a wrapper on top of klauspost/compress. What were the reasons why it was not performant? I'm surprised to hear it was not compared to the other offering, gzip. Surely it strikes in the middle of gzip and snappy but with a much higher compression ratio. |
@aknuds1, checking in - are you able to share more on the reasons why that implementation was not performant? |
I just know that @bboreham considered the aforementioned zstd implementation, and found performance problems with it. When we discussed trying another compression algorithm for cheaper AZ traffic internally, the other point made was it wasn't worth all the testing that comes along with it. |
I have some new context to share to shed hopefully enough light on why the Mimir team doesn't think the investment into a new compression algorithm for distributor -> ingester traffic is going to be worthwhile. I don't know if you're perhaps already familiar with this, but Mimir is moving towards a fundamentally revamped architecture, where distributors will no longer be writing directly to ingesters, but instead to a Kafka compatible back-end, which ingesters again read from. Please see [FEATURE] Experimental Kafka-based ingest storage in the changelog for reference. |
I remarked on the amount of memory allocations it does. Someone else commented mostynb/go-grpc-compression#25 |
I decided to experiment with zstd and s2 GRPC compression from https://github.com/mostynb/go-grpc-compression. Then I ran this patched version in our test cluster, first with zstd compression, then s2. Please note, that before test test cluster already run Mimir 2.13.0 with Snappy GRPC compression enabled, so, comparison below is to snappy and NOT uncopressed: Results: Zstd results comparing to Snappy:
So, results are good, but memory usage is tremendous. I checked heap, majority of memory consumed by But I also have a good news too, when I tried to switch to s2 compression, which is better version of snappy. S2 results comparing to Snappy:
(left is Snappy, middle - zstd, right part - S2) So, after that I decided to drop zstd (first I wanted to test Datadog or another wrapper but then I realized that Mimir has no CGO enabled) and S2 looks really promising. I know that Mimir architecture is revamped and will be migrated to Kafka, but it will take some time to implement that properly - and S2 patch can shave 50% of cross AZ data cost right now. And I don't like run patched version and prefer to port my changes to upstream. I can clean up and submit PR for experimental S2 support in dskit. Should I or no sense? |
@deniszh, thanks so much for trying it out and posting your results. The memory increase seems to match up with what @bboreham said earlier in his findings too. Thanks for also trying out s2. The results seem much more favorable and it would be something I can also help drive in order to get this ported upstream. Do you have a PR available on the dskit side for s2? |
@aallawala : grafana/dskit#582 But please take note that latest mimir is not ready to latest dskit. So, if you want to test you can build mimir from this branch https://github.com/deniszh/mimir/tree/2.13.0-grpc-s2 |
Tried S2 on bigger cluster - it's less impressive but I see 70MB/s instead of 105MB/s - 30% decrease. |
Thanks for your testing @deniszh! The results look very promising. I'm asking the Mimir team what they think. |
@deniszh the Mimir team is positive to using S2 for gRPC compression in Mimir. I guess we should start out by reviewing your dskit PR. |
👍 |
Will be S2 better than gzip 🤔 compression? |
I didn't compare to gzip. Maybe by ratio it is, but not by resources and speed. |
FWIW I am considering a change to the zstd wrapper in mostynb/go-grpc-compression#34 |
Ok, I decided to tried it one more time. First, I applied @mostynb patch above, but for some reason I got panic somewhere in ztsd.go when calling close() in my tests.
(65MB S2 vs 54MB ZSTD - 17% better)
(300% worse for ZSTD, similar effect is for mimir-backend) Conclusion:
|
Is your feature request related to a problem? Please describe.
Are there plans for zstd support between distributors and ingesters using the
grpc_compression
flag?Enabling zstd compression would help significantly reduce the amount of cross-AZ traffic our org pays for running Mimir. gzip gets some wins, at the expense of CPU.
Now that there is a pure Go zstd implementation, are there any major blockers for introducing it to Mimir?
Describe the solution you'd like
Support for
zstd
when using thegrpc_compression
.A clear and concise description of what you want to happen.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.
The text was updated successfully, but these errors were encountered: