Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow copy for scalar and nested sequences when converting data to numpy arrays #95

Merged
merged 4 commits into from
Aug 26, 2024

Conversation

loichuder
Copy link
Member

Fix #94

Also brings support for Numpy v2. See the discussion at #94 for more info:

As a rule of thumb, using np.asarray should be fine. Also, we have to remove the copy=False arg since it is visibly impossible to avoid copying when the argument is a scalar or a nested sequence (lists): https://numpy.org/devdocs/reference/generated/numpy.asarray.html#numpy.asarray

By using copy=None (the default behaviour), copy is avoided when possible (ex: when the argument is a numpy array) but allowed when there is no other choice (as I said, scalars and nested sequences)

@loichuder loichuder requested a review from t20100 August 2, 2024 09:10
@loichuder loichuder self-assigned this Aug 2, 2024
@@ -77,6 +77,27 @@ def test_data_on_array_with_format(self, server, format_arg):

assert np.array_equal(retrieved_data, data)

# TODO: What should we do for csv, tiff
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@t20100 ☝️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷 What does it currently returns? It doesn't sounds useful anyway

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The server encounters an error:

  • csv:
  File ".../h5grove/h5grove/fastapi_utils.py", line 113, in get_data
    h5grove_response = encode(data, format)
  File ".../h5grove/h5grove/encoders.py", line 131, in encode
    csv_encode(content_array),
  File ".../h5grove/h5grove/encoders.py", line 58, in csv_encode
    np.savetxt(buffer, data, delimiter=",")
  File ".../h5grove/lib/python3.10/site-packages/numpy/lib/_npyio_impl.py", line 1573, in savetxt
    raise ValueError(
ValueError: Expected 1D or 2D array, got 0D array instead
  • tiff:
  File ".../h5grove/h5grove/encoders.py", line 149, in encode
    tiff_encode(content_array),
  File ".../h5grove/h5grove/encoders.py", line 78, in tiff_encode
    tifffile.imwrite(buffer, data, photometric="minisblack")
  File ".../h5grove/lib/python3.10/site-packages/tifffile/tifffile.py", line 966, in imwrite
    result = tif.write(data, shape, dtype, **kwargs)
  File ".../h5grove/lib/python3.10/site-packages/tifffile/tifffile.py", line 2020, in write
    raise RuntimeError(
RuntimeError: length of storedshape 4 not in (5, 6)

In both cases, the server returns an HTTP Error 500 to the user. I wondered if we could not make it more explicit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered if we could not make it more explicit.

+1

What would be the best error code? 415 Unsupported Media Type ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just 400 - Bad Request, I think. Those format values just aren't supported by the /data/ endpoint for scalar datasets. Clients are expected to know about this (ideally from h5grove's documentation) and not try to make the request.

Copy link
Member Author

@loichuder loichuder Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already stuff put in place to raise 422 Unprocessable Content for some encodings if the data is not numeric:

if not is_numeric_data(content_array):
and
except QueryArgumentError as e:

We could reuse it to raise a 422 error in case of unsupported encoding for scalars.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

@loichuder loichuder merged commit af4597d into main Aug 26, 2024
1 check passed
@loichuder loichuder deleted the as-array branch August 26, 2024 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Numpy 2 breaks reading scalar datasets as binary
3 participants