Review performance issues when writing shards #3
Note too that the build time approximately triples with #2 merged (https://github.com/glencoesoftware/zarr2zarr/actions/runs/10542609120 vs https://github.com/glencoesoftware/zarr2zarr/actions/runs/10562307818) due to the additional shard-writing tests. Using artificial data constructed similarly to what was used in #2 (comment), but with fewer planes so we can test in less than a day:
and then converting to v3 with defaults vs the worst case shard:
I can definitely confirm that's not good. Taking a few intermediate stack traces, I see a lot of:
which suggests that a lot of time is being spent reading the partially-written shards, so that's a place to continue investigating.
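The cost implied by those stack traces can be illustrated with a small, hypothetical cost model (this is not the zarr2zarr code itself): if every chunk write first re-reads the partially written shard, the total volume of data re-read grows quadratically with the number of chunks per shard.

```python
# Hypothetical cost model, not the zarr2zarr implementation: writing k chunks
# into a shard where each chunk write first re-reads the chunks already
# present in the partially written shard.
def bytes_reread(chunks_per_shard: int, chunk_bytes: int) -> int:
    total = 0
    for i in range(chunks_per_shard):
        total += i * chunk_bytes  # re-read the i chunks already in the shard
    return total

# Doubling the number of chunks per shard roughly quadruples the re-read
# volume, which matches the observation that conversion time grows with
# shard size.
small = bytes_reread(64, 1 << 20)   # 64 chunks of 1 MiB each
large = bytes_reread(128, 1 << 20)  # 128 chunks of 1 MiB each
print(large / small)  # roughly 4
```

Under this model the re-read volume is about k(k-1)/2 chunks for k chunks per shard, so larger shards get disproportionately slower.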
See glencoesoftware#3. This dramatically reduces conversion time when sharding is used.
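One plausible shape for such a fix, sketched below purely as an illustration (the `ShardBuffer` class is hypothetical and may differ from what #6 actually does), is to accumulate chunks for a shard in memory and write the complete shard in a single pass, eliminating the per-chunk read-modify-write cycle. This assumes one shard fits comfortably in memory.

```python
# Sketch: buffer a whole shard in memory and flush it with one sequential
# write, instead of re-reading the partially written shard on every chunk.
# ShardBuffer is a hypothetical helper, not part of zarr2zarr.
import io


class ShardBuffer:
    def __init__(self):
        self._chunks = []

    def add_chunk(self, data: bytes) -> None:
        # No re-read of previously written chunks: just accumulate in memory.
        self._chunks.append(data)

    def flush(self, stream) -> int:
        # Single sequential write of the complete shard.
        payload = b"".join(self._chunks)
        stream.write(payload)
        return len(payload)


buf = ShardBuffer()
for i in range(4):
    buf.add_chunk(bytes([i]) * 8)  # four 8-byte chunks
out = io.BytesIO()
print(buf.flush(out))  # 32 bytes written in one pass
```

With this approach the write cost per shard is linear in the shard size, at the price of holding one shard's worth of data in memory.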
With #6 merged, should this be closed or are there additional investigations we want to make?
I don't think there is anything else to investigate here.
See #2 (comment)
Enabling sharding was found to increase the Zarr v2 to v3 conversion time by up to a factor of ten. For the same dataset, the conversion time depends on the specified shard size and increases with the shard size, i.e. the number of chunks per shard.
While sharding was always expected to slow conversion noticeably, due to the constraint of writing chunks in a specific order as well as the overhead of writing the shard index, the timings reported above feel unreasonable and probably warrant some investigation.