Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow copy for scalar and nested sequences when converting data to numpy arrays #95

Merged
merged 4 commits into from
Aug 26, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion h5grove/content.py
Original file line number Diff line number Diff line change
Expand Up @@ -190,7 +190,7 @@ def data_stats(
return get_array_stats(data)

def _get_finite_data(self, selection: Selection) -> np.ndarray:
data = np.array(self.data(selection), copy=False) # So it works with scalars
data = np.asarray(self.data(selection)) # So it works with scalars

if not np.issubdtype(data.dtype, np.floating):
return data
Expand Down
2 changes: 1 addition & 1 deletion h5grove/encoders.py
Original file line number Diff line number Diff line change
Expand Up @@ -111,7 +111,7 @@ def encode(content: Any, encoding: Optional[str] = "json") -> Response:
headers={"Content-Type": "application/json"},
)

content_array = np.array(content, copy=False)
content_array = np.asarray(content)

if encoding == "bin":
return Response(
Expand Down
23 changes: 22 additions & 1 deletion test/base_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,27 @@ def test_data_on_array_with_format(self, server, format_arg):

assert np.array_equal(retrieved_data, data)

# TODO: What should we do for csv, tiff
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@t20100 ☝️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷 What does it currently returns? It doesn't sounds useful anyway

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The server encounters an error:

  • csv:
  File ".../h5grove/h5grove/fastapi_utils.py", line 113, in get_data
    h5grove_response = encode(data, format)
  File ".../h5grove/h5grove/encoders.py", line 131, in encode
    csv_encode(content_array),
  File ".../h5grove/h5grove/encoders.py", line 58, in csv_encode
    np.savetxt(buffer, data, delimiter=",")
  File ".../h5grove/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 1573, in savetxt
    raise ValueError(
ValueError: Expected 1D or 2D array, got 0D array instead
  • tiff:
  File ".../h5grove/h5grove/encoders.py", line 149, in encode
    tiff_encode(content_array),
  File ".../h5grove/h5grove/encoders.py", line 78, in tiff_encode
    tifffile.imwrite(buffer, data, photometric="minisblack")
  File ".../h5grove/lib/python3.10/site-packages/tifffile/tifffile.py", line 966, in imwrite
    result = tif.write(data, shape, dtype, **kwargs)
  File ".../h5grove/lib/python3.10/site-packages/tifffile/tifffile.py", line 2020, in write
    raise RuntimeError(
RuntimeError: length of storedshape 4 not in (5, 6)

In both cases, the server returns an HTTP Error 500 to the user. I wondered if we could not make it more explicit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered if we could not make it more explicit.

+1

What would be the best error code? 415 Unsupported Media Type ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just 400 - Bad Request, I think. Those format values just aren't supported by the /data/ endpoint for scalar datasets. Clients are expected to know about this (ideally from h5grove's documentation) and not try to make the request.

Copy link
Member Author

@loichuder loichuder Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already stuff put in place to raise 422 Unprocessable Content for some encodings if the data is not numeric:

if not is_numeric_data(content_array):
and
except QueryArgumentError as e:

We could reuse it to raise a 422 error in case of unsupported encoding for scalars.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

@pytest.mark.parametrize("format_arg", ("json", "bin", "npy"))
def test_data_on_scalar_with_format(self, server, format_arg):
"""Test /data/ endpoint on scalar dataset"""
# Test condition
tested_h5entity_path = "/entry/scalar"
data = 5

filename = "test.h5"
with h5py.File(server.served_directory / filename, mode="w") as h5file:
dset = h5file.create_dataset(tested_h5entity_path, data=data)
dtype = dset.dtype
shape = dset.shape

response = server.get(
f"/data/?{urlencode({'file': filename, 'path': tested_h5entity_path, 'format': format_arg})}"
)
retrieved_data = decode_array_response(response, format_arg, dtype.str, shape)

assert np.array_equal(retrieved_data, data)

@pytest.mark.parametrize("format_arg", ("npy", "bin"))
def test_data_on_array_with_dtype_safe(
self,
Expand Down Expand Up @@ -114,7 +135,7 @@ def test_data_on_slice_with_format_and_flatten(self, server, format_arg):
response = server.get(
f"/data/?{urlencode({'file': filename, 'path': tested_h5entity_path, 'selection': '100,0', 'format': format_arg, 'flatten': True})}"
)
retrieved_data = np.array(decode_response(response, format_arg))
retrieved_data = np.asarray(decode_response(response, format_arg))

assert retrieved_data - data[100, 0] < 1e-8

Expand Down
2 changes: 1 addition & 1 deletion test/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@ def decode_array_response(
assert content_type == "application/octet-stream"
return np.frombuffer(response.content, dtype=dtype).reshape(shape)

return np.array(decode_response(response, format), copy=False)
return np.asarray(decode_response(response, format))


def assert_error_response(response: Response, error_code: int):
Expand Down
Loading