Sending large binary data (Or doing custom marshalling/framing) #7930
Comments
In my scenario above, the server is acting upon the entire binary data immediately anyway, so it doesn't matter that the server loads all the data at once. But obviously, if it could do similar io.Reader streaming of binary data out of the protobuf message, that would be more ideal for them, I think. In the general case it is not always possible to stream parts of the data: say I were sending RGB image data and pushing it into an AI model, where the model uses all 3 layers of data as a single sample of input. I really think that gRPC (for golang anyway) would benefit from this; it would remove the only constraint on TRUE streaming.
@tjad thanks for the question. Any specific reason why you are using PreparedMsg? It is a special message type that represents a pre-serialized Protobuf message: it essentially allows you to bypass gRPC's internal marshaling step if you've already serialized your message manually. I think in your case you probably just need to implement your own codec, since you don't want a dependency on proto: https://github.com/grpc/grpc-go/blob/master/Documentation/encoding.md#implementing-a-codec Then you can create a client stream and use SendMsg.
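For reference, the codec route suggested here can be sketched as follows. This is a minimal sketch, not grpc-go code: the `Codec` interface is reproduced locally to match the one in grpc-go's `encoding` package so the example is self-contained, and `rawCodec` is a hypothetical pass-through codec that treats the message as already-serialized bytes.

```go
package main

import "fmt"

// Codec mirrors the interface from google.golang.org/grpc/encoding,
// reproduced here so the sketch compiles on its own.
type Codec interface {
	Marshal(v any) ([]byte, error)
	Unmarshal(data []byte, v any) error
	Name() string
}

// rawCodec is a pass-through codec: it assumes the application has
// already produced the wire bytes and hands them to the transport as-is.
type rawCodec struct{}

func (rawCodec) Marshal(v any) ([]byte, error) {
	b, ok := v.([]byte)
	if !ok {
		return nil, fmt.Errorf("rawCodec: expected []byte, got %T", v)
	}
	return b, nil
}

func (rawCodec) Unmarshal(data []byte, v any) error {
	p, ok := v.(*[]byte)
	if !ok {
		return fmt.Errorf("rawCodec: expected *[]byte, got %T", v)
	}
	*p = data
	return nil
}

func (rawCodec) Name() string { return "raw" }
```

In real code you would register it with `encoding.RegisterCodec(rawCodec{})` and select it per call with `grpc.CallContentSubtype("raw")`, or force it on a call with `grpc.ForceCodec`.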
Thank you for responding @purnesh42H. Does this sound doable? I would only call stream.Send() once, as it is a single gRPC request for the method. The API does not permit sending "multiple requests" with separate payloads and having them treated as a single request for processing.

I can't serialize all the data at once into a single protobuf binary, as it would consume too much memory, resulting in an application that needs to be scaled by memory; the memory would be depleted quickly given the size of data I'm dealing with (1MB up to 200MB or greater; we haven't started processing larger data yet). The binary data is originally streamed from another data source (I/O bound).
This does not actually work in the current state of the code, as per my indication above. Even though I serialized the protobuf already and then sent it as a PreparedMsg, I got an error telling me it was expecting a protobuf message (unserialized).
You will notice that encode is just encoding, and not doing any checks currently. Using a codec would get around this problem.
If you want to do manual encoding, there is no other way but to define a custom codec. You need to implement the codec interface; you can refer to gRPC's own implementation of the proto codec for reference.
One thing to call out here is that codecs provide the wire format of the request data. It is not recommended to send more than a few KBs of data on the wire in a single message.
After starting the client stream, you will have to chunk at the application level, and the custom codec will be used for each chunk sent using SendMsg.
If you have already serialized your message, you don't need the codec to marshal it again; that is the main benefit of using PreparedMsg.
I have to integrate with a gRPC endpoint that takes large binary data. I have no control over the endpoint - I don't define how it works.
My solution is to do my own framing - almost.
I noticed there is a PreparedMsg, and I want to make use of it so that I can do the encoding step manually: I encode the protobuf request myself and modify the part where the binary data should be.
Basically I would:
1.) encode a stubbed protobuf request (which includes a placeholder for binary input)
2.) split the encoded protobuf into 2 parts (before and after where the binary data should be)
3.) modify the size indication within the protobuf (so that it matches the size of my binary data)
4.) prepare a message with PreparedMsg.Encode() for the first part of the protobuf (everything before binary data)
5.) SendMsg with the clientstream
6.) using an io.Reader, loop, populating a mem.BufferSlice at a fixed size (and yes, respecting that I may not modify this slice, as per the internals), and SendMsg for each mem.BufferSlice containing a portion of the binary data
7.) prepare the final part of the protobuf in a PreparedMsg and send it
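For step 3, the "size indication" inside an encoded protobuf is the varint length prefix of the length-delimited field that holds the bytes. A self-contained sketch of building that tag-plus-length prefix by hand (the field number and helper names here are hypothetical, not from any gRPC or protobuf API):

```go
package main

// appendVarint appends v encoded as a protobuf base-128 varint.
func appendVarint(b []byte, v uint64) []byte {
	for v >= 0x80 {
		b = append(b, byte(v)|0x80)
		v >>= 7
	}
	return append(b, byte(v))
}

// bytesFieldHeader builds the tag + length prefix for a length-delimited
// (wire type 2) protobuf field such as a bytes field. fieldNum is the
// field number from the .proto definition; size is the payload length
// that the stubbed request's placeholder must be patched to.
func bytesFieldHeader(fieldNum int, size uint64) []byte {
	tag := uint64(fieldNum)<<3 | 2 // wire type 2 = length-delimited
	return appendVarint(appendVarint(nil, tag), size)
}
```

Patching the placeholder then amounts to replacing the stub's prefix with `bytesFieldHeader(n, realSize)` before sending the first part of the message.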
Currently the problem is that there is no type check here for mem.BufferSlice, so it returns an error telling me the encoder is expecting a proto.Message (i.e. it still tries to encode the already-encoded message):
https://github.com/grpc/grpc-go/blob/master/rpc_util.go#L690
Something like this
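The snippet itself did not survive here, but a simplified, self-contained analogue of the check being described might look like the following. It uses `[]byte` in place of `mem.BufferSlice` and a plain function in place of the codec, so this is a sketch of the idea, not the actual rpc_util.go code:

```go
package main

// marshalFunc stands in for a codec's Marshal method.
type marshalFunc func(v any) ([]byte, error)

// encode mirrors the shape of the proposed check: if the message is
// already encoded bytes, pass it straight through; otherwise fall back
// to the codec. (In grpc-go the pre-encoded type would be
// mem.BufferSlice rather than []byte.)
func encode(marshal marshalFunc, msg any) ([]byte, error) {
	if raw, ok := msg.([]byte); ok {
		return raw, nil // already serialized: skip the codec entirely
	}
	return marshal(msg)
}
```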
I have tested this: when adding this check, the rest of the process seems to go normally. The server receives my data and responds correctly.
I am certain there would be a better way of doing this, but currently there is no indication of support for streaming data from an io.Reader. This seems like a big flaw in the design, or at least it is a massive limitation.
I need to do this in order to not have to load the entire data into memory, encode it as protobuf, and then have gRPC do its thing. The size of my data can be as small as 1MB and span all the way up to 200MB (or more). This is a massive constraint on the scalability of my client service, where it is constrained by memory.
I think ideally there should be a way to "StreamMsg" (like SendMsg), where I can pass in an io.Reader and the transport layer handles it appropriately. The io.Reader would output marshalled protobuf.
This way we would be able to effectively write our own custom data processing, and the gRPC transport (and http2 framing) could still be harnessed via gRPC.