This repository has been archived by the owner on Mar 6, 2023. It is now read-only.

Introducing Versioned HDF5 | Quansight Labs #305

Open
utterances-bot opened this issue Feb 12, 2022 · 2 comments
Labels
utterances Label that needs to be named "utterances" for the Utterances commenting system

Comments

@utterances-bot

Introducing Versioned HDF5 | Quansight Labs

https://labs.quansight.org/blog/2020/08/introducing-versioned-hdf5/

@NickCrews

Hi! This looks really interesting. Does this library just provide syntactic sugar, or does it actually do something in the background for efficiency or other reasons? For instance, if `mydataset` is the same in both v1 and v2, are there two copies actually present on disk? Or is there just one copy, with two pointers going to the same data? (In my head I'm imagining something along the lines of how git works.) Thanks!

@asmeurer
Member

@NickCrews it does reuse data, using a design that is very similar to git's. This post goes over the details: https://labs.quansight.org/blog/2020/09/design-of-the-versioned-hdf5-library/ Basically, if two versions of the same dataset have exactly the same data in a given HDF5 chunk, that chunk is stored in the file only once.
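
To make the "one copy, several pointers" idea concrete, here is a toy sketch of chunk-level deduplication in plain Python. It is not the Versioned HDF5 implementation or its API: the `ChunkStore` class, its methods, the `CHUNK_SIZE` value, and the choice of SHA-256 as the chunk key are all assumptions made for illustration. Each version is recorded as a list of chunk digests, and a chunk's data is stored only the first time its digest is seen.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical chunk length, in elements


class ChunkStore:
    """Toy content-addressed store: each distinct chunk is kept once."""

    def __init__(self):
        self.chunks = {}    # digest -> chunk data (the single stored copy)
        self.versions = {}  # version name -> list of digests ("pointers")

    def commit(self, name, data):
        """Record `data` as version `name`, reusing chunks already stored."""
        digests = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = tuple(data[start:start + CHUNK_SIZE])
            digest = hashlib.sha256(repr(chunk).encode()).hexdigest()
            # Store the chunk only if this exact content is new.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.versions[name] = digests

    def read(self, name):
        """Reassemble a version by following its chunk digests."""
        return [x for d in self.versions[name] for x in self.chunks[d]]


store = ChunkStore()
v1 = list(range(12))   # three chunks of four elements
v2 = list(v1)
v2[9] = 99             # change one element, which touches only the last chunk

store.commit("v1", v1)
store.commit("v2", v2)

# Two versions of three chunks each, but only four distinct chunks are stored.
print(len(store.chunks))
```

In this sketch the first two chunks are shared between `v1` and `v2`, so only the modified third chunk costs extra storage, which is the same trade-off git makes with its object store.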

@trallard trallard added the utterances Label that needs to be named "utterances" for the Utterances commenting system label May 13, 2022