Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RequestMetadata: Add an additional_handlers field for additional tool details #306

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ColdrickSotK
Copy link

@ColdrickSotK ColdrickSotK commented Aug 6, 2024

It is possible for more than one tool to be involved in the sending of a gRPC request, for example when a client-side proxy is used it is interesting to know the name and version of both the proxy and the original client tool.

To facilitate this in RequestMetadata, this PR adds a repeated additional_handlers field containing ToolDetails messages, allowing proxies and other related tools to also be recorded in the metadata sent with requests without modifying the original values.

As a concrete use-case example, we have a thin client tool which currently populates tool_details with its own name and version info, along with an optional extra tool name and version info passed to it by the caller to give some extra context about where the request came from.

We also have a client-side caching proxy which those requests are initially sent to, and currently track details of that by appending to the strings in the tool_details of the request being proxied.

I'm not particularly interested in tracing the call order of those tools using this metadata; I just want to be able to express this data in a well-defined way rather than depending on how the individual tools construct and modify the tool_details strings.

Copy link

google-cla bot commented Aug 6, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Copy link
Collaborator

@EdSchouten EdSchouten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though Protobuf does support recursion, I think we should avoid it if we can, especially in this case where the recursion is linear.

Please make the existing tool details field repeated.

@ColdrickSotK
Copy link
Author

Even though Protobuf does support recursion, I think we should avoid it if we can, especially in this case where the recursion is linear.

Please make the existing tool details field repeated.

I had a memory of some discussion about potentially having a tree structure of tool details in the June monthly meeting, but just using a simple repeated field is fine for my use case.

Is a repeated field wire-compatible with a non-optional scalar field like tool_details, or do we need to deprecate the old field in favour of a repeated one?

@EdSchouten
Copy link
Collaborator

EdSchouten commented Aug 6, 2024

For Protobuf’s binary format it is 100% safe to upgrade a singular field to a repeated one. In fact, a repeated field that only has a single element is encoded exactly the same way as a singular field. When an old client unmarshals a new message that contains multiple elements, the last one is retained. See:

https://protobuf.dev/programming-guides/encoding/#last-one-wins

The only annoyance is Protobuf’s JSON format. At least the Go protojson package is too stubborn to permit parsing scalars as lists or vice versa. But especially for RequestMetadata this should never be an issue. The protocol only documents circumstances in which this message is encoded in binary form, base64 encoded, and then attached to a HTTP header. So especially for that message I’d be far less concerned. Even if gRPC-Web is used, this message in particular is never JSON encoded.

If we want to be super conservative, we could add a separate repeated field and at the same time deprecate the old one. This is also what we did for the multiple digest functions for remote execution in the capabilities response. But again, I don’t think that is worth the hassle.

@ColdrickSotK ColdrickSotK force-pushed the sotk/request-metadata/tool-details-tree branch from c11dd37 to fbf86d2 Compare August 6, 2024 16:03
Copy link
Collaborator

@EdSchouten EdSchouten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please run the git hook script to regenerate the .pb.go files.

@ColdrickSotK ColdrickSotK force-pushed the sotk/request-metadata/tool-details-tree branch from fbf86d2 to 2de9d44 Compare August 7, 2024 10:52
@ColdrickSotK ColdrickSotK changed the title RequestMetadata: Make ToolDetails a recursive structure RequestMetadata: Make tool_details repeated Aug 7, 2024
@EdSchouten
Copy link
Collaborator

EdSchouten commented Aug 7, 2024

I think it looks good to go. But let’s give it a final round of discussion during the monthly gathering.

Copy link
Contributor

@sluongng sluongng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a breaking change.

Instead of changing the existing field, could you please add a newer field and mark the old one as deprecated?

To clarify, any client/server implementations would need to be backward compatible because their counterpart could be using multiple different versions of remote apis. For example: our RBE server supports a wide range of Bazel versions: from 5.0 to the latest 8.x on HEAD. So we cannot have tool_details being a singular struct in some requests and an array in others.

@EdSchouten
Copy link
Collaborator

So we cannot have tool_details being a singular struct in some requests and an array in others.

Please see the discussion we had above, which specifically discusses the compatibility aspects of Protobuf when upgrading singular fields to repeated fields.

@sluongng
Copy link
Contributor

sluongng commented Aug 7, 2024

Right. But the generated Go code would still flip around between a struct and a slice and thus, is a breaking change for the Go module.

@EdSchouten
Copy link
Collaborator

Right. But the generated Go code would still flip around between a struct and a slice and thus, is a breaking change for the Go module.

Which is completely acceptable, because that's what go mod is supposed to cover. Existing code will continue to build. If people do a go get -u, then they can fix those compilation issues at the same time.

@bergsieker
Copy link
Collaborator

In fact, a repeated field that only has a single element is encoded exactly the same way as a singular field. When an old client unmarshals a new message that contains multiple elements, the last one is retained. See:

https://protobuf.dev/programming-guides/encoding/#last-one-wins

As I read that link, the "last one wins" behavior is only true for primitive types--strings and numbers. For nested messages, as is the case here, the subfields will actually be merged. Merging will almost certainly break whatever metrics/dashboards people have built around this data.

@bergsieker
Copy link
Collaborator

Should we specify any concept of ordering in this repeated field? For example, if we're talking about Bazel plus one proxy, I'd probably want Bazel's information to be in position 0 as the "originator" of the request. I'm guessing that the easiest implementation will just have successive layers of handlers append to the end, but it may be useful to codify that in the API itself.

Combined with the issues of merging fields for legacy clients, I'm wondering if we should keep the existing scalar field as-is, and add a "repeated ToolDetails additional_handlers" that can be populated with anything that proxied/modified the request along the way.

@ColdrickSotK
Copy link
Author

Combined with the issues of merging fields for legacy clients, I'm wondering if we should keep the existing scalar field as-is, and add a "repeated ToolDetails additional_handlers" that can be populated with anything that proxied/modified the request along the way.

It sounds like this might be the most frictionless way forward with this change, I'll update the MR.

@ColdrickSotK ColdrickSotK force-pushed the sotk/request-metadata/tool-details-tree branch 2 times, most recently from 0b190b7 to c47c044 Compare October 3, 2024 10:55
@ColdrickSotK ColdrickSotK changed the title RequestMetadata: Make tool_details repeated RequestMetadata: Add an additional_handlers field for additional tool details Oct 3, 2024
@EdSchouten
Copy link
Collaborator

As I read that link, the "last one wins" behavior is only true for primitive types--strings and numbers. For nested messages, as is the case here, the subfields will actually be merged. Merging will almost certainly break whatever metrics/dashboards people have built around this data.

What's wrong with merging in this specific case? ToolDetails contains two simple string fields. In this case both of them will simply be overwritten.

@bergsieker
Copy link
Collaborator

As I read that link, the "last one wins" behavior is only true for primitive types--strings and numbers. For nested messages, as is the case here, the subfields will actually be merged. Merging will almost certainly break whatever metrics/dashboards people have built around this data.

What's wrong with merging in this specific case? ToolDetails contains two simple string fields. In this case both of them will simply be overwritten.

Ah, you are right. I had misread that documentation as "the string fields will be concatenated." On a second reading that's clearly not the case.

I think there's still some awkwardness with ordering--IIRC there's no guarantee of the serialization order, so servers expecting a singular field could technically end up with either entry 0 or entry 1. It would also be possible to end up with a tool_name from one tool and a tool_version from the other.

@EdSchouten
Copy link
Collaborator

EdSchouten commented Oct 3, 2024

I think there's still some awkwardness with ordering--IIRC there's no guarantee of the serialization order, so servers expecting a singular field could technically end up with either entry 0 or entry 1.

I don't think that's the case. It's literally "the last one wins". It is impossible to serialize Protobuf messages in such a way that list entries are stored in a different order in the byte stream as they were listed in the original array. So a legacy server will have tool_details.${field} be equal to a client's tool_details[n].${field}, where n is the highest value where the resulting string is non-empty.

It would also be possible to end up with a tool_name from one tool and a tool_version from the other.

Yes, but this only happens if the last entry/entries leave one of tool or tool_version unset. Because then it will use the one from the entry before, or before that one, or [...]. Not ideal, but arguably also not the end of the world. In my opinion it's a better tradeoff than needing to hold on to historical baggage.

@ColdrickSotK ColdrickSotK force-pushed the sotk/request-metadata/tool-details-tree branch from c47c044 to 43408cc Compare October 4, 2024 12:44
@ColdrickSotK
Copy link
Author

It would also be possible to end up with a tool_name from one tool and a tool_version from the other.

Yes, but this only happens if the last entry/entries leave one of tool or tool_version unset. Because then it will use the one from the entry before, or before that one, or [...]. Not ideal, but arguably also not the end of the world. In my opinion it's a better tradeoff than needing to hold on to historical baggage.

It's not the end of the world, but it seems like the kind of bug that makes another bug harder to investigate. And this merging behaviour means that when clients using the repeated field are talking to a server using the old proto, the most useful information in the chain (ie. the original invoker) gets lost. Granted I doubt that many use-cases will ever end up in that usage pattern.

I like the idea of having a separate field for additional related tools; it keeps this as strictly an additional feature, without impacting existing workflows. It also seems to map well onto the intended usage to me, the tool_details field specifies details of the most relevant tool to the actual request, with the new field being additional supplementary metadata to track exactly where the request has been.


// The details of other tools involved in handling the request, eg. client-side proxies.
// This field MUST NOT include the details of the tool which initially invoked the gRPC
// request (ie. it MUST NOT duplicate `RequestMetadata.tool_details`). The order of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "i.e., it MUST NOT" (note punctuation of "i.e.,")

// client -> proxy1 -> proxy2 -> server
//
// `RequestMetadata.tool_details` SHOULD contain a ToolDetails message about `client`,
// and `RequestMetadata.additional_handlers` MAY contain two ToolDetails messages,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that both tool_details and additional_handlers probably both SHOULD (not MAY) contain the required information. Technically all of this is optional, in an optional header, but within that scope I don't see a good rationale for making one SHOULD and the other MAY.

Copy link
Collaborator

@bergsieker bergsieker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments and then I'm good with this, pending approval from other reviewers.

It is possible for more than one tool to be involved in the sending of a
gRPC request, for example when a client-side proxy is used it is
interesting to know the name and version of both the proxy and the
original client tool.

To facilitate this in RequestMetadata, this commit adds a new repeated
`additional_handlers` field to the message which can contain details of
other tools involved in the handling of a request.
@ColdrickSotK ColdrickSotK force-pushed the sotk/request-metadata/tool-details-tree branch from 43408cc to e3db82c Compare October 17, 2024 11:18
@bergsieker
Copy link
Collaborator

I'd like to get a thumbs-up from Ed and Son before merging, if possible.

Copy link
Contributor

@sluongng sluongng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in favor of adding some guidance about ordering to this field.
We want to avoid cases where some implementations would implement this as ASC while others implement it as DESC, and the only way to figure out which was correct is to peek into the implementation.

If it's the current state, then I suspect folks will start cloning the last tool details in this field and append to tool_name with a separator to determine ordering. I.e. something like

additional_handlers = [
  {tool_name = "proxy1"},
  {tool_name = "proxy1/proxy2"},
]

which can be quite ugly.

@bergsieker
Copy link
Collaborator

I am in favor of adding some guidance about ordering to this field.

I am OK with either having guidance, or (as is currently the case) explicitly saying the ordering is meaningless. I suspect that "did tool X handle this request" is likely useful, but "did tool Y handle this request before tools X" is much less useful, which is why I proposed the "order is meaningless" guidance.

One question: are there realistic workflows that are not linear? For example, maybe a proxy merges BatchUpdateBlobs requests before passing them on, in which case we'd have to worry about how to merge the additional_handlers. This is easy with an unordered set, and difficult-to-impossible with an ordered one.

@sluongng
Copy link
Contributor

Initially, I think that we can use this PR as a tracing mechanism and use the data to visualize how the request propagates through the system. To build such a visualization, we need order to be defined.

But as I take a step back, I think most implementations (proxies and servers) have already implemented their own distributed tracing mechanism, often compatible with a more widely adopted spec. So perhaps my original thinking was wrong and that's not how this new field should be used. I tried to re-read the comments in this PR and it was not clear to me how the server should use this information if not for tracing requests. Perhaps if you are routing requests through multiple layers of proxies, this could help you implement a check for recursive routing (due to potential miss-config)? So @ColdrickSotK, I would appreciate if you could write a bit more (either in the PR description, or the field's comment) on the motivation behind this change and why ordering does/doesn't matter for your use case.


One question: are there realistic workflows that are not linear?

Most workflows I can think of are linear.

For example, maybe a proxy merges BatchUpdateBlobs requests before passing them on, in which case we'd have to worry about how to merge the additional_handlers. This is easy with an unordered set, and difficult-to-impossible with an ordered one.

I think if the proxy were to merge multiple client requests into one big batch request, then the proxy should be tool_details and all clients should go into additional_handlers right?

If you have an L0 -> L1 proxy merging situation, then merging additional_handlers while preserving orders would be quite messy. Luckily, this problem is already solved via the tracing spec in Open Telemetry. So I would circle back and ask: In a setup like this

Clients --> L0 Proxies (merged) --> L1 Proxy (merged) --> Server

What should the L1 proxy and server do with additional_handlers field? How useful is it to know which clients and proxies have handled the request?

@ColdrickSotK
Copy link
Author

I tried to re-read the comments in this PR and it was not clear to me how the server should use this information if not for tracing requests. Perhaps if you are routing requests through multiple layers of proxies, this could help you implement a check for recursive routing (due to potential miss-config)? So @ColdrickSotK, I would appreciate if you could write a bit more (either in the PR description, or the field's comment) on the motivation behind this change and why ordering does/doesn't matter for your use case.

I'll update the PR description; for me the use-case for this is simply to surface version information about multiple tools involved in a request in a more structured manner.

As a concrete example, we have a thin client tool which currently populates tool_details with its own name and version info, along with an optional extra tool name and version info passed to it by the caller to give some extra context about where the request came from.

We also have a client-side caching proxy which those requests are initially sent to, and currently track details of that by appending to the strings in the tool_details of the request being proxied.

We're not particularly interested in tracing the call order of those tools using this metadata; I just want to be able to express this data in a well-defined way rather than depending on how the individual tools construct and modify the tool_details strings.

I think if the proxy were to merge multiple client requests into one big batch request, then the proxy should be tool_details and all clients should go into additional_handlers right?

That would make sense for this scenario to me, yeah.

In a setup like this

Clients --> L0 Proxies (merged) --> L1 Proxy (merged) --> Server

What should the L1 proxy and server do with additional_handlers field? How useful is it to know which clients and proxies have handled the request?

For my use-case, with no ordering guarantees, the L1 proxy can populate additional_handlers with the set of all additional_handlers from the merged requests. This would provide sufficient (structured) information about the various tools involved in the request for my purposes.

I'm not sure how well this RequestMetadata message maps to the merging case in general though; what would you expect the metadata on a request sent from such a proxy to look like even before this PR?

Maybe its fine to have a merging proxy like that swallow the original client's metadata completely, and just populate a fresh message for it's own request?

@bergsieker
Copy link
Collaborator

I think this is ready to merge, but it's been long enough that I can't remember if there are any outstanding AIs. @ColdrickSotK are you aware of any changes that need to be made here?

@ColdrickSotK
Copy link
Author

I think this is ready to merge, but it's been long enough that I can't remember if there are any outstanding AIs. @ColdrickSotK are you aware of any changes that need to be made here?

I'm not aware of anything outstanding if we're happy to go ahead without ordering guarantees.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants