
feat(v2): metadata label query #3749

Open · kolesnikovae wants to merge 10 commits into main from feat/metadata-label-query

Conversation

kolesnikovae
Collaborator

@kolesnikovae kolesnikovae commented Dec 9, 2024

The PR introduces metadata labels and a query interface. Metadata labels are attached by the segment writer and are utilized during the read path to filter out blocks irrelevant to the query without needing to access the blocks' TSDB index (acting as a form of skip index).

Currently, the primary use case for these labels is profile-type queries. Instead of fetching all entries in the query frontend for post-filtering, labels such as profile_type and service_name are queried directly through the new interface. This approach avoids heavy processing and only retrieves a minimal set of data.
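
As a rough illustration of the skip-index behaviour described above (a sketch only: BlockMeta, LabelPair, and the functions below are hypothetical stand-ins, not the actual Pyroscope types):

```go
// Hypothetical sketch: prune block metadata entries by a label matcher
// before ever opening the blocks' TSDB indexes.
type LabelPair struct{ Name, Value string }

type BlockMeta struct {
    ID     string
    Labels []LabelPair // attached by the segment writer
}

// hasLabel reports whether a block carries the requested label pair.
func hasLabel(b BlockMeta, name, value string) bool {
    for _, l := range b.Labels {
        if l.Name == name && l.Value == value {
            return true
        }
    }
    return false
}

// pruneBlocks keeps only the blocks relevant to the query; the discarded
// blocks never have their TSDB index read.
func pruneBlocks(blocks []BlockMeta, name, value string) []BlockMeta {
    var kept []BlockMeta
    for _, b := range blocks {
        if hasLabel(b, name, value) {
            kept = append(kept, b)
        }
    }
    return kept
}
```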

In the future, we may consider implementing an in-memory inverted index at the metadata shard level to accelerate metadata queries and optimize memory usage.

@kolesnikovae kolesnikovae marked this pull request as ready for review December 9, 2024 12:59
@kolesnikovae kolesnikovae requested a review from a team as a code owner December 9, 2024 12:59
Base automatically changed from feat/metadata-string-interning to main December 10, 2024 10:48
@kolesnikovae kolesnikovae marked this pull request as draft December 10, 2024 12:53
@kolesnikovae kolesnikovae force-pushed the feat/metadata-label-query branch from a8dfdb7 to 43910d3 on December 11, 2024 04:00
@kolesnikovae kolesnikovae force-pushed the feat/metadata-label-query branch from 43910d3 to 5b2893c on December 11, 2024 04:03
@kolesnikovae kolesnikovae marked this pull request as ready for review December 11, 2024 09:11
@kolesnikovae kolesnikovae requested a review from aleks-p December 12, 2024 12:02
@aleks-p
Contributor

aleks-p commented Dec 12, 2024

I took a quick look and I think I am lacking some context.

The service name is already assigned to metastorev1.Dataset, so we filter datasets based on it today. The same goes for profile types: we store them as a compact array, but we are introducing a larger one containing combinations of service names (always 1) and profile types (a variable number), plus repeated label pair counts for each combination.

Is the idea that this becomes a more flexible system for storing and querying other labels in the future? If yes, have we thought about how the proposed storage will scale (aka, do we need to store all combinations, can we skip service name)? If not, it feels like these queries are already answered efficiently today and perhaps we can postpone doing this.

One benefit I see is moving some of the computation from the query frontend to the metastore, as well as sending less data around; however, I feel like this could be done without changing the storage.

Again, I am probably missing something so take this with a grain of salt :)

@kolesnikovae
Collaborator Author

kolesnikovae commented Dec 13, 2024

The service name is already assigned to metastorev1.Dataset, so we filter datasets based on it today. The same goes for profile types: we store them as a compact array

The implementation was intended for the prototype stage, not for production use. It was quick and dirty, designed just to make things work (somehow).

Is the idea that this becomes a more flexible system for storing and querying other labels in the future?

The change resolves a major performance issue that prevents using the metastore in production deployments:

Instead of fetching all entries in the query frontend for post-filtering, labels such as profile_type and service_name are queried directly through the new interface. This approach avoids heavy processing and only retrieves a minimal set of data.

This is the first PR in a series of improvements. Among other things, I plan to:

  • Implement follower reads. It is less critical if we block a follower's index (and raft log replication). We have everything in place, but the client does not support such a scenario yet.
  • Limit the metadata queries (something we discussed recently).
  • Implement fine-grained locking at the metadata index level.
  • Consider custom binary formats for metadata entries to avoid wasteful allocations.

If yes, have we thought about how the proposed storage will scale (aka, do we need to store all combinations, can we skip service name)?

It's not the primary goal of the change. Regarding the data model and the storage layer:

In the future, we may consider implementing an in-memory inverted index at the metadata shard level to accelerate metadata queries and optimize memory usage.

This will allow us to eliminate the full scan we currently have to do: we will only need to read a very limited piece of information on dimensions (like our TSDB index), instead of accessing each block metadata entry.
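
As a rough sketch of that direction (the names and shapes below are assumptions, not the planned implementation), a per-shard in-memory inverted index could map each label pair to the set of blocks carrying it:

```go
// Hypothetical per-shard inverted index: label pair -> block IDs.
type labelKey struct{ Name, Value string }

type invertedIndex struct {
    postings map[labelKey]map[string]struct{}
}

func newInvertedIndex() *invertedIndex {
    return &invertedIndex{postings: make(map[labelKey]map[string]struct{})}
}

// Add registers a block under each of its label pairs.
func (ix *invertedIndex) Add(blockID string, labels map[string]string) {
    for name, value := range labels {
        k := labelKey{name, value}
        if ix.postings[k] == nil {
            ix.postings[k] = make(map[string]struct{})
        }
        ix.postings[k][blockID] = struct{}{}
    }
}

// Lookup returns the blocks carrying the given label pair, replacing
// the full scan over every metadata entry in the shard.
func (ix *invertedIndex) Lookup(name, value string) []string {
    ids := make([]string, 0, len(ix.postings[labelKey{name, value}]))
    for id := range ix.postings[labelKey{name, value}] {
        ids = append(ids, id)
    }
    return ids
}
```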

The current implementation does not introduce any noticeable memory or disk overhead. Looking at the snapshot size chart, I can't even tell exactly when I deployed the new version.

[image: snapshot size chart]

If not, it feels like these queries are already answered efficiently today and perhaps we can postpone doing this.

I haven't found traces of the queries (actually, there were at least two queries), but the profiles are quite self-explanatory.

[images: profiles]

The recently implemented string interning has mitigated the issue partially, but it was still very inefficient.

The Pyroscope UI needs the list of services and their profile types to build the main view. Currently, to get this information, the query-frontend service fetches all the metadata entries of the tenant within the time range. What's worse, we preserve the full metadata entry body, because we're querying for all services. Even in a moderately small deployment, we'd need to transfer and process hundreds of megabytes of protobuf data from metastore to query frontend.

Note that the issue may not arise immediately but surfaces as more data accumulates. The problem gets much worse when compaction is delayed.

One benefit I see is moving some of the computation from the query frontend to the metastore, as well as sending less data around, however I feel like this could be done without changing the storage.

This is the only benefit the change is aiming at. The technique is called "predicate pushdown": it's less effective and efficient without the index I mentioned above, but it solves the problem for now and for the near future.

Of course, it could be implemented in a different way. I decided to implement it this way for a number of reasons. Among them:

  • Postponing the change will complicate the implementation due to compatibility guarantees.
  • The concept of labels/dimensions aligns perfectly with the inverted index model and can also be leveraged for skip indexes. This approach is natural for document-oriented stores, which is essentially what the metastore is.
  • The existing APIs and the data model we adhere to suggest using labels: our main client – the UI – accesses this data via the Series API, which is all about labels (ProfileTypes is still used in the Grafana Explore view, though; I hope we deprecate it one day).
  • It is extendable, indeed:
    • One option I see is service collocation: having a separate dataset for each individual service might not always be a) efficient or b) possible. For example, currently, OTel knows nothing about services – I hope this will be addressed, but I can see how a similar issue might arise in a different context.
    • Another example: storing build identifiers of mappings as service labels. This way, we can pre-fetch symbolic information for a query before accessing the data blocks to avoid synchronous symbolication.

@aleks-p
Contributor

aleks-p commented Jan 2, 2025

I should have been clearer; my doubts are about something else.

I understand that the purpose is to discard as many blocks as possible as early as possible. I have no doubt this is more efficient than the filtering we are doing currently in the query frontend. My point was that this idea seems doable without a change to the block meta storage format, and I am trying to gauge whether the added complexity in how labels are stored is worth it at this time. I expect the answer to be "yes", but I was hoping for some more reasoning around this.

Comment on lines +103 to +104
if i == skip {
    skip += int(v)*2 + 1
Contributor

This wasn't obvious to me on my first pass. I think it has to do with storing the same labels multiple times, but I am unsure why we do that. A comment here could help. Alternatively, an explanation here or elsewhere on why labels are stored in this particular way would also be helpful.

Collaborator Author

@kolesnikovae kolesnikovae Jan 3, 2025

I'll add a comment.

This is how multiple dimensions are handled: when an entity has multiple relations and the representation is denormalized, there is some redundancy (it should be at least 1NF – but this is not a relational DB, therefore we have a lot of freedom here).

For example, for now, each dataset is described by a collection of label sets:

service_name=A, profile_type=cpu
service_name=A, profile_type=memory

If we store multiple services in the same dataset, or extend the list attributes available at the query time, we can represent it as follows:

service_name=A, profile_type=cpu
service_name=A, profile_type=memory
service_name=B, namespace=N, profile_type=cpu
service_name=B, namespace=N, profile_type=memory
service_name=C, namespace=N, profile_type=cpu
service_name=C, namespace=N, profile_type=memory

Or, if we want to add arbitrary service attributes (they don't have to be present in the block's TSDB index):

service_name=A, attr_1=Foo
service_name=B, attr_1=Bar

For example, the approach above can be used for cost attribution or any similar breakdown analysis.

The length-prefixed encoding of KV pairs is the simplest way to represent that:

len(2) | k1 | v1 | k2 | v2 | len(3) | k1 | v3 | k2 | v4 | k3 | v5

It's not meant to be ultra-efficient in terms of space because, ultimately, we will be using an inverted index to store and query them (e.g., we could probably reuse the TSDB index, but we wouldn't). In addition, keep two things in mind: 1) varint encoding in protobuf and 2) page organization in BoltDB – these are the reasons why the impact on encoding/decoding and snapshot size is barely noticeable.
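
To make the skip arithmetic from the diff above concrete, here is a minimal decoding sketch based on the layout described here; the function name is hypothetical:

```go
// pairs is the flat encoding described above:
//   count | k1 | v1 | ... | kN | vN | count | ...
// where keys and values are indices into the string table.
// Skipping a whole set means advancing past its count plus 2*count
// references — hence skip += int(v)*2 + 1 in the diff.
func forEachLabelSet(pairs []int32, fn func(set []int32)) {
    for i := 0; i < len(pairs); {
        n := int(pairs[i]) // number of pairs in this set
        end := i + 1 + n*2 // the count itself + n key/value references
        if end > len(pairs) {
            return // malformed input; a real implementation would return an error
        }
        fn(pairs[i+1 : end]) // k1, v1, ..., kN, vN
        i = end
    }
}
```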

Contributor

Thanks for expanding on it. What are the advantages of this storage, compared to storing unique label pairs?

This:

service_name=A, profile_type=cpu
service_name=A, profile_type=memory
service_name=B, namespace=N, profile_type=cpu
service_name=B, namespace=N, profile_type=memory
service_name=C, namespace=N, profile_type=cpu
service_name=C, namespace=N, profile_type=memory

Becomes:

service_name=A, service_name=B, service_name=C, profile_type=cpu, profile_type=memory, ...

The only reason I am asking is the current usage, where multi-dimensionality doesn't seem required and queries are simple. Correct me if I am wrong, but we are currently either looking for a single service name or using an empty query expression (for profile types). From my understanding, however, we plan to use this differently later on, so let's keep it, with a comment providing some context.

type LabelBuilder struct {
    strings  *StringTable
    labels   []int32
    constant []int32
Contributor

nit: should this be constants?

Collaborator Author

@kolesnikovae kolesnikovae Jan 3, 2025

Can you explain why? This is the constant part of the label sets produced with the builder – see WithConstantPairs.
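
To illustrate what the constant part means here (a toy sketch only, not the actual LabelBuilder, which stores string-table references rather than raw strings):

```go
// Toy illustration of the idea only: constPairs (e.g. service_name=A) are
// shared by every emitted label set, while the varying part changes per set.
type toyBuilder struct {
    constPairs []string // k1, v1, k2, v2, ...
    sets       [][]string
}

func (b *toyBuilder) put(pairs ...string) {
    set := append(append([]string{}, b.constPairs...), pairs...)
    b.sets = append(b.sets, set)
}

// Usage:
//   b := &toyBuilder{constPairs: []string{"service_name", "A"}}
//   b.put("profile_type", "cpu")    // -> service_name=A, profile_type=cpu
//   b.put("profile_type", "memory") // -> service_name=A, profile_type=memory
```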

Contributor

We are using a singular form for something representing multiple values (label pairs). It is a confusing name for me, but if you think it is fine we can keep it.

    return resp, err
}

func (svc *MetadataQueryService) queryMetadataLabels(
Contributor

Not specific to this PR, and we don't have to do it now, but we should add tracing to the metastore APIs.

@kolesnikovae
Collaborator Author

Thanks for the review, Aleks!

My point was that this idea seems doable without a change to the block meta storage format

I believe you can find answers in:

  Of course, it could be implemented in a different way. I decided to implement it this way for a number of reasons. Among them:

  • Postponing the change will complicate the implementation due to compatibility guarantees.
  • The concept of labels/dimensions aligns perfectly with the inverted index model and can also be leveraged for skip indexes. This approach is natural for document-oriented stores, which is essentially what the metastore is.
  • The existing APIs and the data model we adhere to suggest using labels: our main client – the UI – accesses this data via the Series API, which is all about labels (ProfileTypes is still used in the Grafana Explore view, though; I hope we deprecate it one day).
  • It is extendable, indeed:
    • One option I see is service collocation: having a separate dataset for each individual service might not always be a) efficient or b) possible. For example, currently, OTel knows nothing about services – I hope this will be addressed, but I can see how a similar issue might arise in a different context.
    • Another example: storing build identifiers of mappings as service labels. This way, we can pre-fetch symbolic information for a query before accessing the data blocks to avoid synchronous symbolication.

If you disagree with any of the above or don't find it justified, I'm open to your input, but I ask that you be specific:

I am trying to gauge whether the added complexity in how labels are stored is worth it at this time. I expect the answer to be "yes", but I was hoping for some more reasoning around this.

This does not help me understand your concerns. Could you please clarify your expectations?

Contributor

@aleks-p aleks-p left a comment

This does not help me understand your concerns. Could you please clarify your expectations?

My concern was adding complexity for something we don't need right now but might need in the future. Future-proofing can be good, especially if we have a plan to take a certain direction or if we know the cost of doing the work later will be too high. It can also be bad if we never follow up on this and end up with code that can't be reasoned about.

From this conversation it looks like you are convinced both of those points are true, which is enough to wrap this discussion :)
