gRPC fails with inferred f16 numpy array #1522

Open
sauerburger opened this issue Dec 21, 2023 · 1 comment
@sauerburger (Contributor)

I think I discovered a bug in the current gRPC code in mlserver. I have a model that returns float16 arrays, and I tried to get predictions via gRPC. I was able to narrow the issue down to the following example, without any client-server complexity.

Reproduce error

import numpy as np
from mlserver.codecs.decorator import SignatureCodec
import mlserver.grpc.converters as converters


def a() -> np.ndarray:
    # A small float16 array; the dtype is what triggers the bug.
    return np.array([[1.123, 4], [1, 3], [1, 2]], dtype=np.float16)


# Encode the array into a V2 inference response via the signature codec.
codec = SignatureCodec(a)
r = codec.encode_response(payload=a(), model_name="x")

# Converting the response to its gRPC protobuf message raises a TypeError.
converters.ModelInferResponseConverter.from_types(r)

The last line yields

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/site-packages/mlserver/grpc/converters.py", line 380, in from_types
    InferOutputTensorConverter.from_types(output)
  File "/usr/local/lib/python3.12/site-packages/mlserver/grpc/converters.py", line 425, in from_types
    contents=InferTensorContentsConverter.from_types(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/mlserver/grpc/converters.py", line 335, in from_types
    return pb.InferTensorContents(**contents)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected bytes, float found

Root cause

I think the root cause is in the gRPC type-to-field mapping:

_FIELDS = {
    ...
    "FP16": "bytes_contents",
    "FP32": "fp32_contents",
    "FP64": "fp64_contents",
    "BYTES": "bytes_contents",
}

The code uses bytes in the dataplane for FP16 inputs. The dataplane doesn't even offer an fp16_contents field that could be used for this purpose. (Is it because protobuf doesn't support fp16 by default?)
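
For illustration, here is a minimal sketch of why this mapping fails. This is my reading of the traceback, not the actual converter code: the FP16 tensor gets flattened into Python floats, but bytes_contents is a repeated bytes field in the protobuf schema, so anything that isn't a bytes object is rejected.

import numpy as np

# Sketch only: mimics what the converter appears to do for FP16 tensors,
# based on the traceback above; this is not the real mlserver code.
arr = np.array([1.123, 4.0], dtype=np.float16)
flattened = arr.flatten().tolist()  # a list of Python floats

for item in flattened:
    # A 'repeated bytes' protobuf field only accepts bytes-like values,
    # hence the "expected bytes, float found" error.
    if not isinstance(item, (bytes, bytearray)):
        raise TypeError(f"expected bytes, {type(item).__name__} found")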

Potential fix

I think in this case, fp32_contents should be used in the gRPC type-to-field mapping, although this wastes half of the bandwidth (each 16-bit value is transmitted as 32 bits).
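
As a sketch of what that mapping change would mean in practice (to_fp32_contents is a hypothetical helper, not part of mlserver):

import numpy as np

# Hypothetical helper sketching the proposed fix: upcast FP16 to FP32 and
# fill the repeated-float fp32_contents field. The upcast is lossless,
# since every float16 value is exactly representable as float32, so only
# bandwidth is doubled; no precision is lost.
def to_fp32_contents(arr: np.ndarray) -> list:
    return arr.astype(np.float32).flatten().tolist()

values = to_fp32_contents(np.array([[1.123, 4], [1, 3]], dtype=np.float16))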

@sauerburger (Contributor, Author)

I just discovered the following in the Open Inference Protocol:

message ModelInferResponse
{
  // ...

  // The output tensors holding inference results.
  repeated InferOutputTensor outputs = 5;

  // The data contained in an output tensor can be represented in
  // "raw" bytes form or in the repeated type that matches the
  // tensor's data type. To use the raw representation 'raw_output_contents'
  // must be initialized with data for each tensor in the same order as
  // 'outputs'. For each tensor, the size of this content must match
  // what is expected by the tensor's shape and data type. The raw
  // data must be the flattened, one-dimensional, row-major order of
  // the tensor elements without any stride or padding between the
  // elements. Note that the FP16 and BF16 data types must be represented as
  // raw content as there is no specific data type for a 16-bit float type.
  //
  // If this field is specified then InferOutputTensor::contents must
  // not be specified for any output tensor.
  repeated bytes raw_output_contents = 6;
}

So 16-bit floats should actually go into raw_output_contents. I'm not sure why this didn't work in my case.
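
For reference, here is a sketch of the raw representation the protocol comment describes, using a plain numpy round-trip. This is my reading of the spec, not mlserver code:

import numpy as np

# Flatten the FP16 tensor in row-major order and serialize it as raw bytes,
# one bytes blob per output tensor, as the protocol comment requires.
arr = np.array([[1.123, 4], [1, 3], [1, 2]], dtype=np.float16)
raw = arr.flatten(order="C").tobytes()  # 6 elements * 2 bytes = 12 bytes
assert len(raw) == arr.size * arr.itemsize

# The receiver reconstructs the tensor from the advertised shape and datatype.
restored = np.frombuffer(raw, dtype=np.float16).reshape(arr.shape)
assert (restored == arr).all()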
