On dataset upload, attempt exploring also as zarr3 #9015

fm3 · 2025-10-23T09:36:59Z

Steps to test:

Upload zarr3 data without datasource-properties.json
Should be explored and become viewable

Issues:

fixes explore uploaded datasource also as zarr3 #9014

Added changelog entry (create a $PR_NUMBER.md file in unreleased_changes or use ./tools/create-changelog-entry.py)
Considered common edge cases
Needs datastore update after deployment

MichaelBuessemeyer

Thanks for enabling this. I think I tried this already locally once or twice and each time it failed 🙈.

During the code review I found that you introduced ZARR3_MULTILAYER but this is missing in 2 locations I think. Please have a look

...-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala

coderabbitai · 2025-10-23T15:01:25Z

📝 Walkthrough

Walkthrough

Adds a changelog entry and implements ZARR3 upload detection/exploration: new ZARR3/ZARR3_MULTILAYER types, ZARR3-aware exploration/path resolution and helpers in UploadService, and changes FILENAME_ZARR_JSON from def to val in Zarr3ArrayHeader.

Changes

Cohort / File(s)	Summary
Changelog Entry `unreleased_changes/9015.md`	Added changelog entry documenting fix for uploading ZARR3 datasets without prior webknossos metadata.
Upload service (detection & exploration) `webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala`	Added `ZARR3` and `ZARR3_MULTILAYER` to `UploadedDataSourceType`; extended detection and dispatch logic to recognize ZARR3 (including header FILENAME_ZARR_JSON), added ZARR3 layer-directory discovery and looksLikeZarr3Array checks, increased scan depth (2→3) and introduced depth/path helper utilities.
Zarr3 header constant `webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3ArrayHeader.scala`	Changed `FILENAME_ZARR_JSON` from a parameterless method (`def`) to a value (`val`).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Potential review focal points:

UploadService: branching, dispatch additions for ZARR3/ZARR3_MULTILAYER and integration points where datasource-properties are generated.
Directory scanning and path-depth helpers (ensure depth increase and traversal logic are safe and correct).
Change from def to val in Zarr3ArrayHeader (initialization timing / binary-compat considerations).

Possibly related PRs

Fix edge cases in NGFF dataset upload #8341 — Modifies upload/exploration logic in UploadService.scala; likely touches same detection/path-resolution code.
Fix format in volume annotation download zarr.json #8800 — Changes Zarr3ArrayHeader and related zarr3 codec/metadata handling; related to the FILENAME_ZARR_JSON change.
Fix already upload of zarr datasets with already existing datasource-properties.json file #8268 — Adjusts upload detection logic for Zarr datasets and handling of datasource-properties; overlaps in detection behavior.

Suggested labels

zarr

Suggested reviewers

MichaelBuessemeyer
normanrz

Poem

🐰 I hopped through folders, sniffed each file,
Found a tiny zarr and paused a while.
I nudged a header, wrote the trace,
Hopped back home with metadata in place. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The pull request title "On dataset upload, attempt exploring also as zarr3" directly and clearly summarizes the main change introduced in this PR. The title accurately reflects the primary objective of adding ZARR3 format support to the dataset upload exploration flow in UploadService.scala, which aligns with the code changes that extend the exploration logic to recognize and process ZARR3 datasets. The title is concise, specific, and uses clear language that would enable a teammate to quickly understand the changeset when scanning the git history.
Linked Issues Check	✅ Passed	The code changes successfully address the primary objectives from issue #9014. The UploadService.scala has been extended with ZARR3 and ZARR3_MULTILAYER support in the UploadedDataSourceType enum, detection logic now recognizes ZARR3 format, and exploration pathways have been updated to handle ZARR3 sources with new helper methods like getZarr3LayerDirectories and looksLikeZarr3Array. The supporting change to Zarr3ArrayHeader.scala (converting FILENAME_ZARR_JSON from a method to a value) enables proper ZARR3 header recognition. These changes collectively ensure that uploaded zarr3 datasets without datasource-properties.json are now detected, explored, and become viewable by the backend, resolving the original issue where the frontend allowed zarr3 uploads that the backend couldn't process.
Out of Scope Changes Check	✅ Passed	All code changes in this PR are directly scoped to implementing ZARR3 support for the upload exploration flow. The modifications to UploadService.scala add ZARR3-specific detection and exploration logic, changes to Zarr3ArrayHeader.scala provide necessary infrastructure for ZARR3 header recognition, and the changelog entry documents the fix. The increase of maxDepth from 2 to 3 in addLayerAndMagDirIfMissing, while touching tangential logic, is a necessary adjustment to support deeper directory structure scanning for ZARR3 layer detection. The PR author explicitly notes that WKW format remains unsupported and is not added, demonstrating awareness of scope boundaries.
Description Check	✅ Passed	The pull request description is directly related to the changeset and provides relevant context about the changes. It explains the testing steps (uploading zarr3 data without datasource-properties.json to verify it becomes viewable), references the fixed issue (#9014), and documents that a changelog entry was added and common edge cases were considered. This description adequately conveys the intent and scope of the code changes implemented in the PR.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch upload-explore-zarr3

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3674943 and c935105.

📒 Files selected for processing (1)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (9 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3ArrayHeader.scala (2)

Zarr3ArrayHeader (24-104)

Zarr3ArrayHeader (166-325)

util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2)

runIf (169-178)

toFox (14-14)

util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (3)

PathUtils (14-14)

listDirectories (92-93)

listFilesRecursive (104-108)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: backend-tests
GitHub Check: build-smoketest-push
GitHub Check: frontend-tests

🔇 Additional comments (11)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (11)

17-17: LGTM - Correct import for ZARR3 support.

The import of FILENAME_ZARR_JSON is necessary for ZARR3 header detection and is consistent with other format constant imports already present in the file.

485-493: LGTM - Match cases correctly updated for ZARR3.

The addition of ZARR3 and ZARR3_MULTILAYER to the match cases properly routes these upload types to the appropriate exploration methods, consistent with the existing ZARR handling.

514-514: LGTM - Correct ZARR3 exploration handling.

The conditional exploration for ZARR3 correctly uses FILENAME_ZARR_JSON and mirrors the existing ZARR handling pattern at line 513.

524-525: LGTM - Correct multi-layer exploration routing.

The match cases correctly route ZARR3_MULTILAYER to the new getZarr3LayerDirectories method, consistent with the existing ZARR_MULTILAYER handling.

687-690: LGTM - Correct ZARR3 type detection.

The detection logic correctly identifies ZARR3 uploads at depth 2 (single-layer) and depth 3 (multi-layer), mirroring the existing ZARR detection pattern. The detection order is appropriate, checking single-layer formats before multi-layer variants.

716-717: LGTM - Well-implemented ZARR3 detection method.

The looksLikeZarr3Array method correctly detects ZARR3 arrays by checking for FILENAME_ZARR_JSON and follows the same pattern as the existing looksLikeZarrArray method.

765-769: LGTM - Critical bug fix applied correctly.

The use of .getOrElse(false) correctly addresses the previous critical review comment. This ensures that only directories where looksLikeZarrArray returns Full(true) are included in layerDirs, preventing false positives from Full(false) being treated as truthy.

771-775: LGTM - Correct ZARR3 layer directory detection.

The getZarr3LayerDirectories method is well-implemented, consistent with getZarrLayerDirectories, and correctly uses .getOrElse(false) to unwrap the Box[Boolean] result.

781-781: LGTM - MaxDepth increase addresses edge case.

Increasing maxDepth from 2 to 3 allows the method to find header files deeper in the directory tree, which addresses the edge case mentioned in the PR where zarr.json can appear at multiple levels. The depth-classification logic at lines 797-809 appropriately handles this by checking specific depth patterns rather than all depths.

797-809: LGTM - Clean refactoring of depth checking logic.

The refactoring to use pathExistsWithDepth and getPathDepth helper methods improves code clarity and maintainability without changing the underlying logic. The implementation is straightforward and correct.

919-920: LGTM - Enum correctly updated for ZARR3 support.

The UploadedDataSourceType enum correctly includes the new ZARR3 and ZARR3_MULTILAYER values, completing the ZARR3 upload support feature. Note that this is a public API change.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)

797-806: Good refactoring! Parameter rename improves clarity.

Renaming headerWkwPaths to headerFilePaths better reflects that these functions now work with multiple format types (WKW, ZARR, ZARR3), not just WKW. This improves code readability and maintainability.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 155e22a and 16797a7.

📒 Files selected for processing (2)

unreleased_changes/9015.md (1 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (9 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3ArrayHeader.scala (1)

FILENAME_ZARR_JSON (168-168)

util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)

PathUtils (14-14)

listDirectories (92-93)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: frontend-tests
GitHub Check: build-smoketest-push
GitHub Check: backend-tests

🔇 Additional comments (9)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (8)

17-17: LGTM! Import correctly added for ZARR3 support.

The import of FILENAME_ZARR_JSON is necessary for detecting zarr3 format datasets and is correctly sourced from the Zarr3ArrayHeader object.

485-492: Excellent! Pattern matching now handles all ZARR3 variants.

The pattern matching correctly routes both ZARR3 and ZARR3_MULTILAYER to their respective exploration paths, addressing the previous review concern about missing ZARR3_MULTILAYER handling.

513-514: LGTM! ZARR3 exploration implemented consistently.

The ZARR3 exploration logic correctly mirrors the ZARR implementation, using the appropriate FILENAME_ZARR_JSON header file for zarr3 format detection and directory structure adjustment.

523-527: Great! Multi-layer ZARR3 handling complete.

The tryExploringMultipleLayers function now correctly handles ZARR3_MULTILAYER by calling the appropriate getZarr3LayerDirectories function, addressing the previous review concern.

687-690: LGTM! ZARR3 format detection properly integrated.

The format detection logic correctly identifies both single-layer and multi-layer zarr3 datasets using appropriate depth checks that mirror the ZARR detection pattern.

716-717: LGTM! ZARR3 detection function correctly implemented.

The looksLikeZarr3Array function appropriately searches for the zarr.json file that characterizes the zarr3 format, correctly mirroring the pattern used for zarr2 detection.

771-775: LGTM! Layer directory detection for ZARR3 correctly implemented.

The getZarr3LayerDirectories function appropriately identifies zarr3 layer directories by filtering with the looksLikeZarr3Array predicate, maintaining consistency with the zarr2 implementation pattern.

916-918: LGTM! Enum correctly extended for ZARR3 support.

The UploadedDataSourceType enum now includes both ZARR3 and ZARR3_MULTILAYER values, providing the foundation for complete zarr3 dataset handling throughout the upload service.

unreleased_changes/9015.md (1)

1-2: LGTM! Changelog entry accurately documents the fix.

The changelog entry clearly describes the fix for uploading zarr3 datasets without prior webknossos metadata, which aligns perfectly with the PR objectives and the code changes implemented.

MichaelBuessemeyer

Neat, thanks for applying the feedback

Sadly during testing I got

[Server Time 2025-10-23T15:16:23.156Z] Invalid dataset uploaded: Not imported yet.

I got this while uploading a wkw dataset without any datasource-properties.json.

The same goes for zarr. Both datasets had only a single volume layer.
The server logs only state

backend: 2025-10-27 10:30:23,456 [warn] com.scalableminds.webknossos.datastore.controllers.DataSourceController - Answering 400 at /data/datasets/finishUpload?token=4yIOG7KWxNMbAyugf60s9A: {"messages":[{"error":"Failed to finalize upload for dataset 68ff3baf7c0100d3e5192056."},{"chain":"[Server Time 2025-10-27T09:30:23.455Z] Invalid dataset uploaded: Not imported yet."}]}

So no real additional information here :/

fm3 · 2025-10-27T10:44:30Z

Ah interesting, your zarr test dataset had another edge case that wasn’t accounted for. There is a heuristic to insert a layer directory into the path hierarchy if the uploaded content looks like it’s already directly a layer dir. There was a false positive here, because zarr.json can exist on all levels, not just in mag dirs. I adapted the detection and would hope that this works now.

WKW is not supported without datasource-properties.json, this is not added with this PR. But this is also not something that we’ve seen a lot.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
797-804: Use Path API for depth instead of counting '/'.

path.toString.count(_ == '/') is OS-dependent. Prefer path.getNameCount - 1 on relative paths.
- private def pathExistsWithDepth(pathDepth: Int, paths: Seq[Path]) =
-   paths.exists(getPathDepth(_) == pathDepth)
+ private def pathExistsWithDepth(pathDepth: Int, paths: Seq[Path]) =
+   paths.exists(getPathDepth(_) == pathDepth)

- private def getPathDepth(path: Path) =
-   path.toString.count(_ == '/')
+ private def getPathDepth(path: Path): Int = {
+   // listingRelative is a relative path; depth == number of parent segments
+   math.max(0, path.getNameCount - 1)
+ }
Also applies to: 805-810

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16797a7 and 3674943.

📒 Files selected for processing (2)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3ArrayHeader.scala (1 hunks)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (9 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3ArrayHeader.scala (2)

Zarr3ArrayHeader (24-104)

Zarr3ArrayHeader (166-325)

util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)

runIf (169-178)

util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (3)

PathUtils (14-14)

listDirectories (92-93)

listFilesRecursive (104-108)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: build-smoketest-push
GitHub Check: frontend-tests
GitHub Check: backend-tests

🔇 Additional comments (6)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datareaders/zarr3/Zarr3ArrayHeader.scala (1)

168-168: Constant as val is fine.

Makes intent explicit; no behavioral risk.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)

485-488: Zarr3 single-layer integration looks good.

Correctly routes ZARR3 through layer/mag fix-up then exploration.

Also applies to: 514-517

491-494: Multilayer Zarr3 path added correctly.

Coverage for ZARR3_MULTILAYER in both dispatcher and helper is complete.

Also applies to: 524-526

688-691: Zarr3 detection logic is consistent with existing patterns.

Presence-based checks for zarr.json at depths 2/3 match Zarr v2 approach.

Also applies to: 716-718

781-795: Depth increase to 3: add tests to guard heuristics and perf.

Good for deeper Zarr3 layouts; please add tests with representative structures to ensure mag/layer relocation still behaves as expected and doesn’t regress Zarr v2.

Do you want me to scaffold sample directory trees and an fs-backed test?

918-921: The review comment concern about ordinal/id shifts is invalid—remove or clarify.

The UploadedDataSourceType enum is never persisted numerically. It is used solely for local control flow during upload processing (lines 483–494), determined fresh each time from file structure via guessTypeOfUploadedDataSource(), and never serialized to a database or API. Standard Scala Enumeration serializes by name by default, not by ordinal. Placing ZARR3 and ZARR3_MULTILAYER before existing values poses no breaking risk.

...-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala

MichaelBuessemeyer

Looks good and 2nd time testing worked. Thanks a lot 🎉 🟢

On dataset upload, attempt exploring also as zarr3

1d8c653

fm3 self-assigned this Oct 23, 2025

fm3 added bug enhancement backend labels Oct 23, 2025

changelog

0abe8e6

fm3 requested a review from MichaelBuessemeyer October 23, 2025 09:37

unused import

94acc14

MichaelBuessemeyer reviewed Oct 23, 2025

View reviewed changes

...-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala Show resolved Hide resolved

pr feedback: add missing multilayer cases

16797a7

fm3 requested a review from MichaelBuessemeyer October 23, 2025 15:01

coderabbitai bot reviewed Oct 23, 2025

View reviewed changes

MichaelBuessemeyer reviewed Oct 27, 2025

View reviewed changes

fm3 added 2 commits October 27, 2025 11:27

Merge branch 'master' into upload-explore-zarr3

4a65b91

improve heuristic for looksLikeLayerDir and looksLikeMagDir

3674943

coderabbitai bot reviewed Oct 27, 2025

View reviewed changes

...-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala Show resolved Hide resolved

fix box value check

c935105

MichaelBuessemeyer reviewed Oct 27, 2025

View reviewed changes

MichaelBuessemeyer approved these changes Oct 27, 2025

View reviewed changes

fm3 merged commit 8361f14 into master Oct 27, 2025
5 checks passed

fm3 deleted the upload-explore-zarr3 branch October 27, 2025 12:55

On dataset upload, attempt exploring also as zarr3 #9015

On dataset upload, attempt exploring also as zarr3 #9015

Uh oh!

Conversation

fm3 commented Oct 23, 2025

Steps to test:

Issues:

Uh oh!

MichaelBuessemeyer left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

MichaelBuessemeyer left a comment

Choose a reason for hiding this comment

Uh oh!

fm3 commented Oct 27, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

MichaelBuessemeyer left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

coderabbitai bot commented Oct 23, 2025 •

edited

Loading

MichaelBuessemeyer left a comment •

edited

Loading