Skip to content

Conversation

@fm3
Copy link
Member

@fm3 fm3 commented Oct 23, 2025

Steps to test:

  • Upload zarr3 data without datasource-properties.json
  • Should be explored and become viewable

Issues:


  • Added changelog entry (create a $PR_NUMBER.md file in unreleased_changes or use ./tools/create-changelog-entry.py)
  • Considered common edge cases
  • Needs datastore update after deployment

Copy link
Contributor

@MichaelBuessemeyer MichaelBuessemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for enabling this. I think I tried this already locally once or twice and each time it failed 🙈.

During the code review I found that you introduced ZARR3_MULTILAYER but this is missing in 2 locations I think. Please have a look

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 23, 2025

📝 Walkthrough

Walkthrough

Adds a changelog entry and implements ZARR3 upload detection/exploration: new ZARR3/ZARR3_MULTILAYER types, ZARR3-aware exploration/path resolution and helpers in UploadService, and changes FILENAME_ZARR_JSON from def to val in Zarr3ArrayHeader.

Changes

Cohort / File(s) Summary
Changelog Entry
unreleased_changes/9015.md
Added changelog entry documenting fix for uploading ZARR3 datasets without prior webknossos metadata.
Upload service (detection & exploration)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
Added ZARR3 and ZARR3_MULTILAYER to UploadedDataSourceType; extended detection and dispatch logic to recognize ZARR3 (including header FILENAME_ZARR_JSON), added ZARR3 layer-directory discovery and looksLikeZarr3Array checks, increased scan depth (2→3) and introduced depth/path helper utilities.
Zarr3 header constant
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3ArrayHeader.scala
Changed FILENAME_ZARR_JSON from a parameterless method (def) to a value (val).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Potential review focal points:

  • UploadService: branching, dispatch additions for ZARR3/ZARR3_MULTILAYER and integration points where datasource-properties are generated.
  • Directory scanning and path-depth helpers (ensure depth increase and traversal logic are safe and correct).
  • Change from def to val in Zarr3ArrayHeader (initialization timing / binary-compat considerations).

Possibly related PRs

Suggested labels

zarr

Suggested reviewers

  • MichaelBuessemeyer
  • normanrz

Poem

🐰 I hopped through folders, sniffed each file,
Found a tiny zarr and paused a while.
I nudged a header, wrote the trace,
Hopped back home with metadata in place. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title Check ✅ Passed The pull request title "On dataset upload, attempt exploring also as zarr3" directly and clearly summarizes the main change introduced in this PR. The title accurately reflects the primary objective of adding ZARR3 format support to the dataset upload exploration flow in UploadService.scala, which aligns with the code changes that extend the exploration logic to recognize and process ZARR3 datasets. The title is concise, specific, and uses clear language that would enable a teammate to quickly understand the changeset when scanning the git history.
Linked Issues Check ✅ Passed The code changes successfully address the primary objectives from issue #9014. The UploadService.scala has been extended with ZARR3 and ZARR3_MULTILAYER support in the UploadedDataSourceType enum, detection logic now recognizes ZARR3 format, and exploration pathways have been updated to handle ZARR3 sources with new helper methods like getZarr3LayerDirectories and looksLikeZarr3Array. The supporting change to Zarr3ArrayHeader.scala (converting FILENAME_ZARR_JSON from a method to a value) enables proper ZARR3 header recognition. These changes collectively ensure that uploaded zarr3 datasets without datasource-properties.json are now detected, explored, and become viewable by the backend, resolving the original issue where the frontend allowed zarr3 uploads that the backend couldn't process.
Out of Scope Changes Check ✅ Passed All code changes in this PR are directly scoped to implementing ZARR3 support for the upload exploration flow. The modifications to UploadService.scala add ZARR3-specific detection and exploration logic, changes to Zarr3ArrayHeader.scala provide necessary infrastructure for ZARR3 header recognition, and the changelog entry documents the fix. The increase of maxDepth from 2 to 3 in addLayerAndMagDirIfMissing, while touching tangential logic, is a necessary adjustment to support deeper directory structure scanning for ZARR3 layer detection. The PR author explicitly notes that WKW format remains unsupported and is not added, demonstrating awareness of scope boundaries.
Description Check ✅ Passed The pull request description is directly related to the changeset and provides relevant context about the changes. It explains the testing steps (uploading zarr3 data without datasource-properties.json to verify it becomes viewable), references the fixed issue (#9014), and documents that a changelog entry was added and common edge cases were considered. This description adequately conveys the intent and scope of the code changes implemented in the PR.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch upload-explore-zarr3

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3674943 and c935105.

📒 Files selected for processing (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3ArrayHeader.scala (2)
  • Zarr3ArrayHeader (24-104)
  • Zarr3ArrayHeader (166-325)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2)
  • runIf (169-178)
  • toFox (14-14)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (3)
  • PathUtils (14-14)
  • listDirectories (92-93)
  • listFilesRecursive (104-108)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
🔇 Additional comments (11)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (11)

17-17: LGTM - Correct import for ZARR3 support.

The import of FILENAME_ZARR_JSON is necessary for ZARR3 header detection and is consistent with other format constant imports already present in the file.


485-493: LGTM - Match cases correctly updated for ZARR3.

The addition of ZARR3 and ZARR3_MULTILAYER to the match cases properly routes these upload types to the appropriate exploration methods, consistent with the existing ZARR handling.


514-514: LGTM - Correct ZARR3 exploration handling.

The conditional exploration for ZARR3 correctly uses FILENAME_ZARR_JSON and mirrors the existing ZARR handling pattern at line 513.


524-525: LGTM - Correct multi-layer exploration routing.

The match cases correctly route ZARR3_MULTILAYER to the new getZarr3LayerDirectories method, consistent with the existing ZARR_MULTILAYER handling.


687-690: LGTM - Correct ZARR3 type detection.

The detection logic correctly identifies ZARR3 uploads at depth 2 (single-layer) and depth 3 (multi-layer), mirroring the existing ZARR detection pattern. The detection order is appropriate, checking single-layer formats before multi-layer variants.


716-717: LGTM - Well-implemented ZARR3 detection method.

The looksLikeZarr3Array method correctly detects ZARR3 arrays by checking for FILENAME_ZARR_JSON and follows the same pattern as the existing looksLikeZarrArray method.


765-769: LGTM - Critical bug fix applied correctly.

The use of .getOrElse(false) correctly addresses the previous critical review comment. This ensures that only directories where looksLikeZarrArray returns Full(true) are included in layerDirs, preventing false positives from Full(false) being treated as truthy.


771-775: LGTM - Correct ZARR3 layer directory detection.

The getZarr3LayerDirectories method is well-implemented, consistent with getZarrLayerDirectories, and correctly uses .getOrElse(false) to unwrap the Box[Boolean] result.


781-781: LGTM - MaxDepth increase addresses edge case.

Increasing maxDepth from 2 to 3 allows the method to find header files deeper in the directory tree, which addresses the edge case mentioned in the PR where zarr.json can appear at multiple levels. The depth-classification logic at lines 797-809 appropriately handles this by checking specific depth patterns rather than all depths.


797-809: LGTM - Clean refactoring of depth checking logic.

The refactoring to use pathExistsWithDepth and getPathDepth helper methods improves code clarity and maintainability without changing the underlying logic. The implementation is straightforward and correct.


919-920: LGTM - Enum correctly updated for ZARR3 support.

The UploadedDataSourceType enum correctly includes the new ZARR3 and ZARR3_MULTILAYER values, completing the ZARR3 upload support feature. Note that this is a public API change.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)

797-806: Good refactoring! Parameter rename improves clarity.

Renaming headerWkwPaths to headerFilePaths better reflects that these functions now work with multiple format types (WKW, ZARR, ZARR3), not just WKW. This improves code readability and maintainability.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 155e22a and 16797a7.

📒 Files selected for processing (2)
  • unreleased_changes/9015.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3ArrayHeader.scala (1)
  • FILENAME_ZARR_JSON (168-168)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
  • PathUtils (14-14)
  • listDirectories (92-93)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: frontend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
🔇 Additional comments (9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (8)

17-17: LGTM! Import correctly added for ZARR3 support.

The import of FILENAME_ZARR_JSON is necessary for detecting zarr3 format datasets and is correctly sourced from the Zarr3ArrayHeader object.


485-492: Excellent! Pattern matching now handles all ZARR3 variants.

The pattern matching correctly routes both ZARR3 and ZARR3_MULTILAYER to their respective exploration paths, addressing the previous review concern about missing ZARR3_MULTILAYER handling.


513-514: LGTM! ZARR3 exploration implemented consistently.

The ZARR3 exploration logic correctly mirrors the ZARR implementation, using the appropriate FILENAME_ZARR_JSON header file for zarr3 format detection and directory structure adjustment.


523-527: Great! Multi-layer ZARR3 handling complete.

The tryExploringMultipleLayers function now correctly handles ZARR3_MULTILAYER by calling the appropriate getZarr3LayerDirectories function, addressing the previous review concern.


687-690: LGTM! ZARR3 format detection properly integrated.

The format detection logic correctly identifies both single-layer and multi-layer zarr3 datasets using appropriate depth checks that mirror the ZARR detection pattern.


716-717: LGTM! ZARR3 detection function correctly implemented.

The looksLikeZarr3Array function appropriately searches for the zarr.json file that characterizes the zarr3 format, correctly mirroring the pattern used for zarr2 detection.


771-775: LGTM! Layer directory detection for ZARR3 correctly implemented.

The getZarr3LayerDirectories function appropriately identifies zarr3 layer directories by filtering with the looksLikeZarr3Array predicate, maintaining consistency with the zarr2 implementation pattern.


916-918: LGTM! Enum correctly extended for ZARR3 support.

The UploadedDataSourceType enum now includes both ZARR3 and ZARR3_MULTILAYER values, providing the foundation for complete zarr3 dataset handling throughout the upload service.

unreleased_changes/9015.md (1)

1-2: LGTM! Changelog entry accurately documents the fix.

The changelog entry clearly describes the fix for uploading zarr3 datasets without prior webknossos metadata, which aligns perfectly with the PR objectives and the code changes implemented.

Copy link
Contributor

@MichaelBuessemeyer MichaelBuessemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat, thanks for applying the feedback

Sadly during testing I got

Image

[Server Time 2025-10-23T15:16:23.156Z] Invalid dataset uploaded: Not imported yet.

I got this while uploading a wkw dataset without any datasource-properties.json.

The same goes for zarr. Both datasets had only a single volume layer.
The server logs only state

backend: 2025-10-27 10:30:23,456 [warn] com.scalableminds.webknossos.datastore.controllers.DataSourceController - Answering 400 at /data/datasets/finishUpload?token=4yIOG7KWxNMbAyugf60s9A: {"messages":[{"error":"Failed to finalize upload for dataset 68ff3baf7c0100d3e5192056."},{"chain":"[Server Time 2025-10-27T09:30:23.455Z] Invalid dataset uploaded: Not imported yet."}]}

So no real additional information here :/

@fm3
Copy link
Member Author

fm3 commented Oct 27, 2025

Ah interesting, your zarr test dataset had another edge case that wasn’t accounted for. There is a heuristic to insert a layer directory into the path hierarchy if the uploaded content looks like it’s already directly a layer dir. There was a false positive here, because zarr.json can exist on all levels, not just in mag dirs. I adapted the detection and would hope that this works now.

WKW is not supported without datasource-properties.json, this is not added with this PR. But this is also not something that we’ve seen a lot.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)

797-804: Use Path API for depth instead of counting '/'.

path.toString.count(_ == '/') is OS-dependent. Prefer path.getNameCount - 1 on relative paths.

- private def pathExistsWithDepth(pathDepth: Int, paths: Seq[Path]) =
-   paths.exists(getPathDepth(_) == pathDepth)
+ private def pathExistsWithDepth(pathDepth: Int, paths: Seq[Path]) =
+   paths.exists(getPathDepth(_) == pathDepth)

- private def getPathDepth(path: Path) =
-   path.toString.count(_ == '/')
+ private def getPathDepth(path: Path): Int = {
+   // listingRelative is a relative path; depth == number of parent segments
+   math.max(0, path.getNameCount - 1)
+ }

Also applies to: 805-810

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16797a7 and 3674943.

📒 Files selected for processing (2)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3ArrayHeader.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3ArrayHeader.scala (2)
  • Zarr3ArrayHeader (24-104)
  • Zarr3ArrayHeader (166-325)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)
  • runIf (169-178)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (3)
  • PathUtils (14-14)
  • listDirectories (92-93)
  • listFilesRecursive (104-108)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
  • GitHub Check: backend-tests
🔇 Additional comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3ArrayHeader.scala (1)

168-168: Constant as val is fine.

Makes intent explicit; no behavioral risk.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)

485-488: Zarr3 single-layer integration looks good.

Correctly routes ZARR3 through layer/mag fix-up then exploration.

Also applies to: 514-517


491-494: Multilayer Zarr3 path added correctly.

Coverage for ZARR3_MULTILAYER in both dispatcher and helper is complete.

Also applies to: 524-526


688-691: Zarr3 detection logic is consistent with existing patterns.

Presence-based checks for zarr.json at depths 2/3 match Zarr v2 approach.

Also applies to: 716-718


781-795: Depth increase to 3: add tests to guard heuristics and perf.

Good for deeper Zarr3 layouts; please add tests with representative structures to ensure mag/layer relocation still behaves as expected and doesn’t regress Zarr v2.

Do you want me to scaffold sample directory trees and an fs-backed test?


918-921: The review comment concern about ordinal/id shifts is invalid—remove or clarify.

The UploadedDataSourceType enum is never persisted numerically. It is used solely for local control flow during upload processing (lines 483–494), determined fresh each time from file structure via guessTypeOfUploadedDataSource(), and never serialized to a database or API. Standard Scala Enumeration serializes by name by default, not by ordinal. Placing ZARR3 and ZARR3_MULTILAYER before existing values poses no breaking risk.

Copy link
Contributor

@MichaelBuessemeyer MichaelBuessemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and 2nd time testing worked. Thanks a lot 🎉 🟢

@fm3 fm3 merged commit 8361f14 into master Oct 27, 2025
5 checks passed
@fm3 fm3 deleted the upload-explore-zarr3 branch October 27, 2025 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

explore uploaded datasource also as zarr3

3 participants