Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nhcb: store custom buckets in WAL, WBL #14730

Open
Tracked by #13485
krajorama opened this issue Aug 26, 2024 · 15 comments
Open
Tracked by #13485

nhcb: store custom buckets in WAL, WBL #14730

krajorama opened this issue Aug 26, 2024 · 15 comments
Assignees

Comments

@krajorama
Copy link
Member

krajorama commented Aug 26, 2024

Proposal

Related to prometheus/proposals#31

WAL, WBL records need to be able to encode the native histogram custom values.

Alternative 1:

Encode custom values into every histogram record.

Alternative 2:

This is the preferred method for implementation. #14730 (comment)

Introduce new record type that has a series reference and the custom values. Only record once every (segment?).

@bwplotka
Copy link
Member

Can't we use native histogram chunk format as a first step? I guess that's the alternative 1, right?

@krajorama
Copy link
Member Author

Can't we use native histogram chunk format as a first step? I guess that's the alternative 1, right?

Yes. I don't know if it's a good idea, haven't thought it through. I'm worried that storing the custom values for every histogram individually is a huge waste of space. Similar to how we don't store the labels in every sample, but once per series:

Labels labels.Labels

On the other hand adding a new RefCustomValues record type like

type RefCustomValues struct {
	Ref chunks.HeadSeriesRef
	V  []float64
}

is probably more complicated?

@krajorama
Copy link
Member Author

@bwplotka We've discussed this question in the histograms sync meeting. It was pointed out that we already keep the bucket definitions separate from the values, since the bucket definition is a label le in the classic case and labels have their own record, separate from the value. I'm convinced now that we should have a separate record for the custom values. Updating the issue.

@bwplotka
Copy link
Member

That's separate discussion than chunk format, or is it similar decision?

@krajorama
Copy link
Member Author

That's separate discussion than chunk format, or is it similar decision?

Separate. Reusing the same formatting as in the chunks should be fine. We can always add new kind of record if we need to.

@bwplotka
Copy link
Member

@bwplotka We've discussed this question in the histograms sync meeting. It was pointed out that we already keep the bucket definitions separate from the values, since the bucket definition is a label le in the classic case and labels have their own record, separate from the value. I'm convinced now that we should have a separate record for the custom values. Updating the issue.

Hm... I would challenge this. It's true we move bucketing from label + sample to complex sample. But:

  1. We do the same for exponential native histograms, yet we didn't decide to have a separate histogram sample record and separate histogram spans/buckets/schema record.
  2. So far looking on the @carrieedwards first attempt adding another record adds a lot of complexity to code and format vs existing histogram types referenced by refs literally have custom buckets.
  3. AFAIK, there is currently no logic for NOT adding recording if it does not change or something, so I don't see how separation is cheaper here no both storage and encoding/decoding. I would argue it's much more expensive actually to process separate buckets.

To sum up, I would not try to prematurely optimize with a new pattern, especially as it's not easy to add/maintain - but instead do same as native histograms are doing for now, especially as NH literally have CustomValues slice in it. Then if profile and benchmarks suggest, we can try to be more smart and split/only add new record when it's needed or something.

Started some attempt on top of @carrieedwards code here: main...bwplotka/nhcb-wal-wbl

The main "counter argument" is backward compatibility. I don't see any versioning for WAL records (?), so this will require a new Histogram and float histogram record (with versioning added ideally for future use). This however can and should be FULLY abstracted by inside Re(Float)HistogramSample so little complexity for appenders and head.

@bwplotka
Copy link
Member

Related #15200

@bwplotka
Copy link
Member

bwplotka commented Oct 23, 2024

To progress here:

A) We could start by adding a single different record type for RefHistogram to replace float and normal histogram, but also supports custom buckets -- all with updated histogram scheme that:

  • supports NHCB
  • ONLY encodes certain things based on schema and if it's float or native (another bit for that)?
  • I would even try by using capnproto to see how it looks like (only for new records) to enable unknown fields in future.

Funny enough, even without tackling versioning a lot here, we could encode ONLY NHCB in this new record type, and then switch to it if successful for other histograms in a few later releases.

B) We could continue separate bucketing record. It's inconsistent and might cause pain in future, but it does not break compatibility, I think. I wonder only as compared to @carrieedwards approach here is there a room to encapsulate reading/writing a single package or abstraction, adding new records even is spreaded around the codebase with pools everywhere. Technically head/wal code does not need to be aware about RefHistogramCustomValues record, it could be all passed by and updated onto corresponding RefHistogramSample literally per segment. Wonder if this new pattern of splitting information to multiple records only to avoid compatibility problems is worth it, where we might want to tackle compatibility/versioning right away and keep schema clean.

I think I sill lean on A), but personally will have more time helping on this after KubeCon NA

@krajorama
Copy link
Member Author

Working through this myself to see if I understand:

RefHistogramSample and RefFloatHistogramSample (when did we start dropping Sample from the name? lol) is used in other places as well like head appender Commit(), remote write queue manager AppendHistograms(). Etc. So it would make sense to keep it - the data it contains is compatible with NHCB and exponential histograms, it's "just" the byte encoding that's the question.

As a baseline I think you need to absolutely support the current HistogramSamples encoding, since you wouldn't be able to upgrade with an existing WAL.

So an idea "A"+"B" could be to keep the current RefHistogramSample , but during encoding/decoding do something different based on the schema. So encode exponential histograms with record type HistogramSamples , but custom buckets with CustomBucketHistogramSamples and same for the float version. Then on decoding, we'd decode both HistogramSamples and CustomBucketHistogramSamples into RefHistogramSample.

The super annoying thing about the WAL/WBL is that it orders records based on type , not on append order, so it has series first, then floats, then histograms and float histograms etc. This would mean going back end forth between the types. No idea if that's bad or not? But your idea "A" would take care of that.

Another consequence of the type based ordering in WAL/WBL is that it's kind of hard to implement #15177 and keep the appended order over Commit. So I'm actually tempted to say that we should fold even RefSample into the new format eventually. Not at first, but eventually - so design with that in mind?

cc @bwplotka @carrieedwards also @codesome

@krajorama
Copy link
Member Author

krajorama commented Oct 24, 2024

I have another idea based on "B" to optimize away storing custom values multiple times: have a marker that says: "use the custom value from the previous sample for the same series reference". And use the lastHistogramValue/lastFloatHistogramValue in memSeries to keep track on WAL reload.

@bwplotka
Copy link
Member

bwplotka commented Oct 25, 2024

Agree with the ideas in #14730 (comment)

As per later comment on optimizations based on what changed I think it's def powerful but it feels we have to first DRY the record here, by switching into single, more complex one (and version by record types?), then we can iterate with "unchanged bucketing" logic 👍

It feels like a lot of work, but if we team up perhaps it's eatable this year

@krajorama
Copy link
Member Author

As per later comment on optimizations based on what changed I think it's def powerful but it feels we have to first DRY the record here, by switching into single, more complex one (and version by record types?), then we can iterate with "unchanged bucketing" logic 👍

@carrieedwards and @krajorama will check if the "single" part of this is easily feasible as the interface of WAL currently handles integer and float histograms separately all over the place. If not, we'll have to go for two new record types and make more substantial changes together with other things. I think we're getting to the point where we'll need a design doc->proposal on how we want to reform the code around this.

@krajorama
Copy link
Member Author

We've paired up and came to the conclusion that if we're not going to optimize and/or fix commit ordering then it makes sense to simply reuse the existing record types (HistogramSamples=7 and FloatHistogramSamples=8).

One nice thing we could do is to release a version of Prometheus that can read and ignore NHCB encoded in WAL, but ignores it and make the next version use it for write as well. So that users have something to fall back on.

@krajorama
Copy link
Member Author

We've paired up and came to the conclusion that if we're not going to optimize and/or fix #15177 then it makes sense to simply reuse the existing record types (HistogramSamples=7 and FloatHistogramSamples=8).

This turned out to be wrong as you would be able to load the WAL anymore on a rollback. Opting to add new record types and when writing histogram samples to WAL we write two records: one containing the non NHCB samples and one with the NHCB samples. The old code will ignore the second one, enabling rollback without NHCB.

@krajorama
Copy link
Member Author

Design doc: https://docs.google.com/document/d/1oYmvK7rrRFNrkM4Hrze8OsaK4z0GGj4pXl6VT6S_ef0/edit?tab=t.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants