-
Notifications
You must be signed in to change notification settings - Fork 30
On dataset upload, attempt exploring also as zarr3 #9015
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
MichaelBuessemeyer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for enabling this. I think I tried this already locally once or twice and each time it failed 🙈.
During the code review I found that you introduced ZARR3_MULTILAYER but this is missing in 2 locations I think. Please have a look
...-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
Show resolved
Hide resolved
📝 WalkthroughWalkthroughAdds a changelog entry and implements ZARR3 upload detection/exploration: new Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Potential review focal points:
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧬 Code graph analysis (1)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
🔇 Additional comments (11)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
797-806: Good refactoring! Parameter rename improves clarity.Renaming
headerWkwPathstoheaderFilePathsbetter reflects that these functions now work with multiple format types (WKW, ZARR, ZARR3), not just WKW. This improves code readability and maintainability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
unreleased_changes/9015.md(1 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala(9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3ArrayHeader.scala (1)
FILENAME_ZARR_JSON(168-168)util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
PathUtils(14-14)listDirectories(92-93)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: frontend-tests
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
🔇 Additional comments (9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (8)
17-17: LGTM! Import correctly added for ZARR3 support.The import of
FILENAME_ZARR_JSONis necessary for detecting zarr3 format datasets and is correctly sourced from the Zarr3ArrayHeader object.
485-492: Excellent! Pattern matching now handles all ZARR3 variants.The pattern matching correctly routes both
ZARR3andZARR3_MULTILAYERto their respective exploration paths, addressing the previous review concern about missing ZARR3_MULTILAYER handling.
513-514: LGTM! ZARR3 exploration implemented consistently.The ZARR3 exploration logic correctly mirrors the ZARR implementation, using the appropriate
FILENAME_ZARR_JSONheader file for zarr3 format detection and directory structure adjustment.
523-527: Great! Multi-layer ZARR3 handling complete.The
tryExploringMultipleLayersfunction now correctly handlesZARR3_MULTILAYERby calling the appropriategetZarr3LayerDirectoriesfunction, addressing the previous review concern.
687-690: LGTM! ZARR3 format detection properly integrated.The format detection logic correctly identifies both single-layer and multi-layer zarr3 datasets using appropriate depth checks that mirror the ZARR detection pattern.
716-717: LGTM! ZARR3 detection function correctly implemented.The
looksLikeZarr3Arrayfunction appropriately searches for thezarr.jsonfile that characterizes the zarr3 format, correctly mirroring the pattern used for zarr2 detection.
771-775: LGTM! Layer directory detection for ZARR3 correctly implemented.The
getZarr3LayerDirectoriesfunction appropriately identifies zarr3 layer directories by filtering with thelooksLikeZarr3Arraypredicate, maintaining consistency with the zarr2 implementation pattern.
916-918: LGTM! Enum correctly extended for ZARR3 support.The
UploadedDataSourceTypeenum now includes bothZARR3andZARR3_MULTILAYERvalues, providing the foundation for complete zarr3 dataset handling throughout the upload service.unreleased_changes/9015.md (1)
1-2: LGTM! Changelog entry accurately documents the fix.The changelog entry clearly describes the fix for uploading zarr3 datasets without prior webknossos metadata, which aligns perfectly with the PR objectives and the code changes implemented.
MichaelBuessemeyer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat, thanks for applying the feedback
Sadly during testing I got
[Server Time 2025-10-23T15:16:23.156Z] Invalid dataset uploaded: Not imported yet.
I got this while uploading a wkw dataset without any datasource-properties.json.
The same goes for zarr. Both datasets had only a single volume layer.
The server logs only state
backend: 2025-10-27 10:30:23,456 [warn] com.scalableminds.webknossos.datastore.controllers.DataSourceController - Answering 400 at /data/datasets/finishUpload?token=4yIOG7KWxNMbAyugf60s9A: {"messages":[{"error":"Failed to finalize upload for dataset 68ff3baf7c0100d3e5192056."},{"chain":"[Server Time 2025-10-27T09:30:23.455Z] Invalid dataset uploaded: Not imported yet."}]}
So no real additional information here :/
|
Ah interesting, your zarr test dataset had another edge case that wasn’t accounted for. There is a heuristic to insert a layer directory into the path hierarchy if the uploaded content looks like it’s already directly a layer dir. There was a false positive here, because WKW is not supported without datasource-properties.json, this is not added with this PR. But this is also not something that we’ve seen a lot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
797-804: Use Path API for depth instead of counting '/'.
path.toString.count(_ == '/')is OS-dependent. Preferpath.getNameCount - 1on relative paths.- private def pathExistsWithDepth(pathDepth: Int, paths: Seq[Path]) = - paths.exists(getPathDepth(_) == pathDepth) + private def pathExistsWithDepth(pathDepth: Int, paths: Seq[Path]) = + paths.exists(getPathDepth(_) == pathDepth) - private def getPathDepth(path: Path) = - path.toString.count(_ == '/') + private def getPathDepth(path: Path): Int = { + // listingRelative is a relative path; depth == number of parent segments + math.max(0, path.getNameCount - 1) + }Also applies to: 805-810
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3ArrayHeader.scala(1 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala(9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3ArrayHeader.scala (2)
Zarr3ArrayHeader(24-104)Zarr3ArrayHeader(166-325)util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)
runIf(169-178)util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (3)
PathUtils(14-14)listDirectories(92-93)listFilesRecursive(104-108)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
- GitHub Check: backend-tests
🔇 Additional comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3ArrayHeader.scala (1)
168-168: Constant asvalis fine.Makes intent explicit; no behavioral risk.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)
485-488: Zarr3 single-layer integration looks good.Correctly routes ZARR3 through layer/mag fix-up then exploration.
Also applies to: 514-517
491-494: Multilayer Zarr3 path added correctly.Coverage for ZARR3_MULTILAYER in both dispatcher and helper is complete.
Also applies to: 524-526
688-691: Zarr3 detection logic is consistent with existing patterns.Presence-based checks for
zarr.jsonat depths 2/3 match Zarr v2 approach.Also applies to: 716-718
781-795: Depth increase to 3: add tests to guard heuristics and perf.Good for deeper Zarr3 layouts; please add tests with representative structures to ensure mag/layer relocation still behaves as expected and doesn’t regress Zarr v2.
Do you want me to scaffold sample directory trees and an fs-backed test?
918-921: The review comment concern about ordinal/id shifts is invalid—remove or clarify.The
UploadedDataSourceTypeenum is never persisted numerically. It is used solely for local control flow during upload processing (lines 483–494), determined fresh each time from file structure viaguessTypeOfUploadedDataSource(), and never serialized to a database or API. Standard Scala Enumeration serializes by name by default, not by ordinal. PlacingZARR3andZARR3_MULTILAYERbefore existing values poses no breaking risk.
...-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good and 2nd time testing worked. Thanks a lot 🎉 🟢
Steps to test:
Issues:
$PR_NUMBER.mdfile inunreleased_changesor use./tools/create-changelog-entry.py)