Provide more metadata for HDF5 types #85

axelboc · 2024-01-24T13:50:38Z

The new snapshot test of H5GroveApi in H5Web reveals that a lot of advanced HDF5 types are incorrectly parsed or simply marked as unknown.

This is because h5grove's /meta/ endpoint currently provides only a very basic dtype string (or a dict of dtype strings for compound datasets), which is insufficient and sometimes inaccurate (e.g. |O instead of |S for variable-length ASCII strings).

Following h5py's low-level API and taking inspiration from h5wasm, I've modified the dataset/attribute metadata response to include a type dictionary with all the relevant information about the dataset's or attribute's HDF5 type.

Unfortunately, the type property was already used for the entity kind ("dataset", "group", etc.). I've renamed the existing property to kind to match what we do in H5Web. I've also moved the previous dtype property inside the new type dictionary (I think it's still worth keeping, as some applications may prefer to avoid parsing the full type metadata). Of course, these are both breaking changes for h5grove, which is not ideal. I'm open to suggestions if that's a no-go.
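
For clients that need to cope with both response shapes during the transition, a minimal sketch could look like the following. It assumes only what is described above: the entity kind moves from `type` to `kind`, and `dtype` moves inside the new `type` dictionary. The helper names `entity_kind` and `entity_dtype` and the sample payloads are hypothetical, and the real `type` dictionary is expected to carry more fields than the single `dtype` key shown here.

```python
from typing import Any, Dict, Union

# A dtype is either a string or, for compound datasets, a dict of dtype strings
Dtype = Union[str, Dict[str, Any]]


def entity_kind(meta: Dict[str, Any]) -> str:
    """Return the entity kind ("dataset", "group", etc.) from either shape."""
    if "kind" in meta:
        # New shape: the kind has its own property
        return meta["kind"]
    # Old shape: the kind was stored under "type"
    return meta["type"]


def entity_dtype(meta: Dict[str, Any]) -> Dtype:
    """Return the dtype of a dataset/attribute from either shape."""
    type_meta = meta.get("type")
    if isinstance(type_meta, dict):
        # New shape: dtype lives inside the type dictionary
        return type_meta["dtype"]
    # Old shape: dtype was a top-level property
    return meta["dtype"]


# Hypothetical payloads, reduced to the properties discussed in this PR
old_meta = {"type": "dataset", "dtype": "<f8"}
new_meta = {"kind": "dataset", "type": {"dtype": "<f8"}}
```

Applications that only need the dtype string can keep reading it through such a helper without parsing the rest of the type metadata.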

axelboc · 2024-01-24T13:53:10Z

h5grove/utils.py

@@ -123,7 +123,74 @@ def parse_slice_member(slice_member: str) -> Union[slice, int]:


 def sorted_dict(*args: Tuple[str, Any]):
-    return dict(sorted(args))
+    return dict(sorted(args, key=lambda entry: entry[0]))


Sorting on the first item of each tuple only (i.e. the future dictionary keys) to avoid errors with nested dictionaries.
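
To illustrate the difference, here is a small sketch. Python compares tuples element by element, so plain `sorted(args)` falls back to comparing the values whenever two entries share the same key, and dicts do not support ordering. Whether this is exactly the situation hit in h5grove is an assumption; the entries below and the name `sorted_dict_old` are made up for the demonstration.

```python
from typing import Any, Tuple


def sorted_dict_old(*args: Tuple[str, Any]):
    # Previous behaviour: whole tuples are compared, values included on key ties
    return dict(sorted(args))


def sorted_dict(*args: Tuple[str, Any]):
    # Behaviour from the diff: only the first item (the future key) is compared
    return dict(sorted(args, key=lambda entry: entry[0]))


# Hypothetical entries: two share the key "type" and hold dictionaries
entries = [("type", {"class": 1}), ("kind", "dataset"), ("type", {"class": 3})]

try:
    sorted_dict_old(*entries)
    old_error = None
except TypeError as err:
    # Raised because dicts cannot be ordered with "<"
    old_error = err

# The key-based sort never looks at the values, so it succeeds;
# on duplicate keys, dict() keeps the last value
result = sorted_dict(*entries)
```

Since the sort is stable, entries with equal keys keep their original relative order before being turned into a dictionary.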

loichuder

Looks all right to me!

t20100

LGTM!

No issue for me with breaking the REST API, since jupyterlab-h5web pins the h5grove version it uses.
We just need to make a major release next time.

axelboc · 2024-01-25T10:33:21Z

I've added a test that covers all the advanced types (vlen, array, ref, enum, etc.) and I've updated the API schema.

Provide more HDF5 type metadata

0f21787

axelboc requested review from loichuder and t20100 January 24, 2024 13:54

loichuder approved these changes Jan 24, 2024

t20100 approved these changes Jan 24, 2024

axelboc force-pushed the dtype branch from cd7fdfc to 275d06d Compare January 24, 2024 16:03

Test new type metadata

de5ceee

axelboc force-pushed the dtype branch from 275d06d to 219c34f Compare January 25, 2024 10:28

axelboc marked this pull request as ready for review January 25, 2024 10:32

axelboc requested a review from loichuder January 25, 2024 10:33

axelboc commented on setup.cfg Jan 25, 2024 (outdated, resolved)

axelboc commented on tasks.py Jan 25, 2024 (outdated, resolved)

axelboc force-pushed the dtype branch 3 times, most recently from 47c65de to 69ac664 Compare January 29, 2024 09:55

Update API schema

f06ba3e

axelboc force-pushed the dtype branch from 69ac664 to f06ba3e Compare January 29, 2024 09:58

axelboc merged commit eb734be into main Jan 29, 2024
1 check passed

axelboc deleted the dtype branch January 29, 2024 10:34

axelboc mentioned this pull request Jan 29, 2024

Support [email protected] and improve dtype parsing silx-kit/h5web#1557

Merged
