Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide more metadata for HDF5 types #85

Merged
merged 3 commits into from
Jan 29, 2024
Merged

Provide more metadata for HDF5 types #85

merged 3 commits into from
Jan 29, 2024

Conversation

axelboc
Copy link
Contributor

@axelboc axelboc commented Jan 24, 2024

The new snapshot test of H5GroveApi in H5Web reveals that a lot of advanced HDF5 types are incorrectly parsed or simply marked as unknown.

This is due to the fact that h5grove's /meta/ endpoint currently provides only a very basic dtype string (or dict of dtype strings for compound datasets), which is not sufficient and sometimes inaccurate (e.g. |O for variable-length ASCII strings instead of |S).

Following h5py's low-level API and taking inspiration from h5wasm, I modify the dataset/attribute metadata response to include a type dictionary with all the interesting information about the dataset/attribute's HDF5 type.

Unfortunately, the type property was already used for the entity kind ("dataset", "group", etc.). I've renamed the existing property to kind to match what we do in H5Web. I've also moved the previous dtype property inside the new type dictionary (I think it's still worth keeping, as some applications may prefer to avoid parsing the full type metadata). Of course, these are both breaking changes for h5grove, which is not ideal. I'm open to suggestions if that's a no-go.

@@ -123,7 +123,74 @@ def parse_slice_member(slice_member: str) -> Union[slice, int]:


def sorted_dict(*args: Tuple[str, Any]):
return dict(sorted(args))
return dict(sorted(args, key=lambda entry: entry[0]))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorting on the first item of each tuple only (i.e. the future dictionary keys) to avoid errors with nested dictionaries.

@axelboc axelboc requested review from loichuder and t20100 January 24, 2024 13:54
Copy link
Member

@loichuder loichuder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks all right to me !

Copy link
Member

@t20100 t20100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

No issue for me to break the REST API, since jupyterlab-h5web pinpoints h5grove version it's using.
Just need to make a major release next time.

@axelboc axelboc marked this pull request as ready for review January 25, 2024 10:32
@axelboc
Copy link
Contributor Author

axelboc commented Jan 25, 2024

I've added a test that covers all the advanced types (vlen, array, ref, enum, etc.) and I've updated the API schema.

@axelboc axelboc requested a review from loichuder January 25, 2024 10:33
setup.cfg Outdated Show resolved Hide resolved
tasks.py Outdated Show resolved Hide resolved
@axelboc axelboc force-pushed the dtype branch 3 times, most recently from 47c65de to 69ac664 Compare January 29, 2024 09:55
@axelboc axelboc merged commit eb734be into main Jan 29, 2024
1 check passed
@axelboc axelboc deleted the dtype branch January 29, 2024 10:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants