nhcb: store custom buckets in WAL, WBL #14730

krajorama · 2024-08-26T06:39:11Z

Proposal

Related to prometheus/proposals#31

WAL, WBL records need to be able to encode the native histogram custom values.

Alternative 1:

Encode custom values into every histogram record.

Alternative 2:

This is the preferred method for implementation. #14730 (comment)

Introduce new record type that has a series reference and the custom values. Only record once every (segment?).

bwplotka · 2024-08-29T09:41:51Z

Can't we use native histogram chunk format as a first step? I guess that's the alternative 1, right?

krajorama · 2024-08-29T16:43:29Z

Can't we use native histogram chunk format as a first step? I guess that's the alternative 1, right?

Yes. I don't know if it's a good idea, haven't thought it through. I'm worried that storing the custom values for every histogram individually is a huge waste of space. Similar to how we don't store the labels in every sample, but once per series:

prometheus/tsdb/record/record.go

Line 147 in 8c7bf39

Labels labels.Labels

On the other hand adding a new RefCustomValues record type like

type RefCustomValues struct {
	Ref chunks.HeadSeriesRef
	V  []float64
}

is probably more complicated?

krajorama · 2024-09-17T09:45:39Z

@bwplotka We've discussed this question in the histograms sync meeting. It was pointed out that we already keep the bucket definitions separate from the values, since the bucket definition is a label le in the classic case and labels have their own record, separate from the value. I'm convinced now that we should have a separate record for the custom values. Updating the issue.

bwplotka · 2024-09-18T18:14:53Z

That's separate discussion than chunk format, or is it similar decision?

krajorama · 2024-09-20T09:09:30Z

That's separate discussion than chunk format, or is it similar decision?

Separate. Reusing the same formatting as in the chunks should be fine. We can always add new kind of record if we need to.

bwplotka · 2024-10-22T13:32:23Z

@bwplotka We've discussed this question in the histograms sync meeting. It was pointed out that we already keep the bucket definitions separate from the values, since the bucket definition is a label le in the classic case and labels have their own record, separate from the value. I'm convinced now that we should have a separate record for the custom values. Updating the issue.

Hm... I would challenge this. It's true we move bucketing from label + sample to complex sample. But:

We do the same for exponential native histograms, yet we didn't decide to have a separate histogram sample record and separate histogram spans/buckets/schema record.
So far looking on the @carrieedwards first attempt adding another record adds a lot of complexity to code and format vs existing histogram types referenced by refs literally have custom buckets.
AFAIK, there is currently no logic for NOT adding recording if it does not change or something, so I don't see how separation is cheaper here no both storage and encoding/decoding. I would argue it's much more expensive actually to process separate buckets.

To sum up, I would not try to prematurely optimize with a new pattern, especially as it's not easy to add/maintain - but instead do same as native histograms are doing for now, especially as NH literally have CustomValues slice in it. Then if profile and benchmarks suggest, we can try to be more smart and split/only add new record when it's needed or something.

Started some attempt on top of @carrieedwards code here: main...bwplotka/nhcb-wal-wbl

The main "counter argument" is backward compatibility. I don't see any versioning for WAL records (?), so this will require a new Histogram and float histogram record (with versioning added ideally for future use). This however can and should be FULLY abstracted by inside Re(Float)HistogramSample so little complexity for appenders and head.

bwplotka · 2024-10-23T08:36:39Z

Related #15200

bwplotka · 2024-10-23T08:57:38Z

To progress here:

A) We could start by adding a single different record type for RefHistogram to replace float and normal histogram, but also supports custom buckets -- all with updated histogram scheme that:

supports NHCB
ONLY encodes certain things based on schema and if it's float or native (another bit for that)?
I would even try by using capnproto to see how it looks like (only for new records) to enable unknown fields in future.

Funny enough, even without tackling versioning a lot here, we could encode ONLY NHCB in this new record type, and then switch to it if successful for other histograms in a few later releases.

B) We could continue separate bucketing record. It's inconsistent and might cause pain in future, but it does not break compatibility, I think. I wonder only as compared to @carrieedwards approach here is there a room to encapsulate reading/writing a single package or abstraction, adding new records even is spreaded around the codebase with pools everywhere. Technically head/wal code does not need to be aware about RefHistogramCustomValues record, it could be all passed by and updated onto corresponding RefHistogramSample literally per segment. Wonder if this new pattern of splitting information to multiple records only to avoid compatibility problems is worth it, where we might want to tackle compatibility/versioning right away and keep schema clean.

I think I sill lean on A), but personally will have more time helping on this after KubeCon NA

krajorama · 2024-10-24T12:04:00Z

Working through this myself to see if I understand:

RefHistogramSample and RefFloatHistogramSample (when did we start dropping Sample from the name? lol) is used in other places as well like head appender Commit(), remote write queue manager AppendHistograms(). Etc. So it would make sense to keep it - the data it contains is compatible with NHCB and exponential histograms, it's "just" the byte encoding that's the question.

As a baseline I think you need to absolutely support the current HistogramSamples encoding, since you wouldn't be able to upgrade with an existing WAL.

So an idea "A"+"B" could be to keep the current RefHistogramSample , but during encoding/decoding do something different based on the schema. So encode exponential histograms with record type HistogramSamples , but custom buckets with CustomBucketHistogramSamples and same for the float version. Then on decoding, we'd decode both HistogramSamples and CustomBucketHistogramSamples into RefHistogramSample.

The super annoying thing about the WAL/WBL is that it orders records based on type , not on append order, so it has series first, then floats, then histograms and float histograms etc. This would mean going back end forth between the types. No idea if that's bad or not? But your idea "A" would take care of that.

Another consequence of the type based ordering in WAL/WBL is that it's kind of hard to implement #15177 and keep the appended order over Commit. So I'm actually tempted to say that we should fold even RefSample into the new format eventually. Not at first, but eventually - so design with that in mind?

cc @bwplotka @carrieedwards also @codesome

krajorama · 2024-10-24T12:07:28Z

I have another idea based on "B" to optimize away storing custom values multiple times: have a marker that says: "use the custom value from the previous sample for the same series reference". And use the lastHistogramValue/lastFloatHistogramValue in memSeries to keep track on WAL reload.

bwplotka · 2024-10-25T10:15:54Z

Agree with the ideas in #14730 (comment)

As per later comment on optimizations based on what changed I think it's def powerful but it feels we have to first DRY the record here, by switching into single, more complex one (and version by record types?), then we can iterate with "unchanged bucketing" logic 👍

It feels like a lot of work, but if we team up perhaps it's eatable this year

krajorama · 2024-10-25T10:42:15Z

As per later comment on optimizations based on what changed I think it's def powerful but it feels we have to first DRY the record here, by switching into single, more complex one (and version by record types?), then we can iterate with "unchanged bucketing" logic 👍

@carrieedwards and @krajorama will check if the "single" part of this is easily feasible as the interface of WAL currently handles integer and float histograms separately all over the place. If not, we'll have to go for two new record types and make more substantial changes together with other things. I think we're getting to the point where we'll need a design doc->proposal on how we want to reform the code around this.

krajorama · 2024-10-25T17:41:49Z

We've paired up and came to the conclusion that if we're not going to optimize and/or fix commit ordering then it makes sense to simply reuse the existing record types (HistogramSamples=7 and FloatHistogramSamples=8).

One nice thing we could do is to release a version of Prometheus that can read and ignore NHCB encoded in WAL, but ignores it and make the next version use it for write as well. So that users have something to fall back on.

krajorama · 2024-11-06T08:25:07Z

We've paired up and came to the conclusion that if we're not going to optimize and/or fix #15177 then it makes sense to simply reuse the existing record types (HistogramSamples=7 and FloatHistogramSamples=8).

This turned out to be wrong as you would be able to load the WAL anymore on a rollback. Opting to add new record types and when writing histogram samples to WAL we write two records: one containing the non NHCB samples and one with the NHCB samples. The old code will ignore the second one, enabling rollback without NHCB.

krajorama · 2024-11-25T09:31:39Z

Design doc: https://docs.google.com/document/d/1oYmvK7rrRFNrkM4Hrze8OsaK4z0GGj4pXl6VT6S_ef0/edit?tab=t.0

krajorama mentioned this issue Aug 26, 2024

[meta] Native Histograms Custom Buckets #13485

Open

12 tasks

github-staff deleted a comment Aug 26, 2024

krajorama mentioned this issue Oct 1, 2024

nhcb: send native histograms with custom buckets through remote write 2.0 #15021

Open

This was referenced Oct 9, 2024

feat: NHCB: convert classic histograms to nhcb in scrape MVP #14978

Merged

docs: Update chunk layot for NHCB #15135

Merged

krajorama assigned carrieedwards Oct 14, 2024

bwplotka mentioned this issue Oct 23, 2024

WAL/WBL: Make iterating on format schema easier; consider versioning & forward compatibility #15200

Open

bwplotka added the component/wal label Nov 25, 2024

carrieedwards mentioned this issue Nov 26, 2024

feat(nhcb): support custom buckets in native histograms in the WAL/WBL #15467

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nhcb: store custom buckets in WAL, WBL #14730

nhcb: store custom buckets in WAL, WBL #14730

krajorama commented Aug 26, 2024 •

edited

Loading

bwplotka commented Aug 29, 2024

krajorama commented Aug 29, 2024

krajorama commented Sep 17, 2024

bwplotka commented Sep 18, 2024

krajorama commented Sep 20, 2024

bwplotka commented Oct 22, 2024

bwplotka commented Oct 23, 2024

bwplotka commented Oct 23, 2024 •

edited

Loading

krajorama commented Oct 24, 2024

krajorama commented Oct 24, 2024 •

edited

Loading

bwplotka commented Oct 25, 2024 •

edited

Loading

krajorama commented Oct 25, 2024

krajorama commented Oct 25, 2024

krajorama commented Nov 6, 2024

krajorama commented Nov 25, 2024

nhcb: store custom buckets in WAL, WBL #14730

nhcb: store custom buckets in WAL, WBL #14730

Comments

krajorama commented Aug 26, 2024 • edited Loading

Proposal

Alternative 1:

Alternative 2:

bwplotka commented Aug 29, 2024

krajorama commented Aug 29, 2024

krajorama commented Sep 17, 2024

bwplotka commented Sep 18, 2024

krajorama commented Sep 20, 2024

bwplotka commented Oct 22, 2024

bwplotka commented Oct 23, 2024

bwplotka commented Oct 23, 2024 • edited Loading

krajorama commented Oct 24, 2024

krajorama commented Oct 24, 2024 • edited Loading

bwplotka commented Oct 25, 2024 • edited Loading

krajorama commented Oct 25, 2024

krajorama commented Oct 25, 2024

krajorama commented Nov 6, 2024

krajorama commented Nov 25, 2024

krajorama commented Aug 26, 2024 •

edited

Loading

bwplotka commented Oct 23, 2024 •

edited

Loading

krajorama commented Oct 24, 2024 •

edited

Loading

bwplotka commented Oct 25, 2024 •

edited

Loading