Skip to content

Conversation

@fm3
Copy link
Member

@fm3 fm3 commented Nov 4, 2025

We can have multiple managed s3 stores. Only one is selected for upload but dataset deletion should work for all of them. This PR refactors the ManagedS3Service a bit and selects the correct client with endpoint/credential for each path.

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

Steps to test:

(I already tested locally, no need to re-test)

  • In application.conf, insert multiple dataVaults.credentials for s3 endpoints. Set s3Upload.credentialName to the first of them
  • Upload a small dataset
  • set the s3Upload.credentialName to another of them
  • make sure both datasets can be loaded
  • delete both datasets from their respective settings views.
  • backend logging should show they were deleted from their respective s3 buckets

Issues:


  • Added changelog entry (create a $PR_NUMBER.md file in unreleased_changes or use ./tools/create-changelog-entry.py)
  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases
  • Needs datastore update after deployment

@fm3 fm3 self-assigned this Nov 4, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 4, 2025

📝 Walkthrough

Walkthrough

Refactors credential handling and Managed S3 support across datastore services: adds per-path credential discovery, builds S3 clients per-credential/endpoint, wires ManagedS3Service into vault/upload/storage services, and fixes dataset deletion for buckets with global credentials. Adds changelog entry.

Changes

Cohort / File(s) Summary
Changelog
unreleased_changes/9047.md
Added changelog entry documenting a bug fix: dataset deletion now works for data on managed S3 buckets different from the configured upload bucket.
Credential helper
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala
Added toCredentialsProvider on S3AccessKeyCredential to produce a StaticCredentialsProvider from stored keys.
S3 data vault
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala
Replaced direct static credential construction with credential object's provider (toCredentialsProvider) where S3 clients are created.
Managed S3 service
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala
Introduced per-path credential lookup (pathIsInManagedS3, findGlobalCredentialFor), global credentials parsing, per-credential client/transfer-manager builders, endpoint derivation, and revised delete flow to group by bucket and use per-call clients with timing logs. Constructor param renamed to config.
Data vault service
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala
Injected ManagedS3Service; delegated credential resolution and path checks to ManagedS3Service; removed internal global-credential caching and helpers.
Used storage service
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSUsedStorageService.scala
Injected ManagedS3Service; replaced vault-credential-based remote-path checks with managedS3Service.pathIsInManagedS3(...).
Upload service
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
Switched directory upload to use managedS3Service.s3UploadTransferManagerBox (per-credential transfer manager) for uploads.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas needing extra attention:
    • ManagedS3Service: credential parsing, endpoint derivation, per-call client lifecycle, and concurrency/resource cleanup.
    • Delete flow: correctness of bucket grouping, credential selection per-bucket, and error handling for mixed-credential buckets.
    • Dependency wiring: constructor changes across services (injection compatibility and tests).
    • S3 credential provider creation: ensure toCredentialsProvider usage matches AWS SDK expectations.

Possibly related PRs

Suggested reviewers

  • frcroth
  • MichaelBuessemeyer

Poem

🐇 I hopped through buckets, keys in tow,
Found credentials where wild endpoints grow,
Per-path I sniffed, each secret and seam,
Deleted the ghosts and polished the stream,
A rabbit's small hop — datasets gleam! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main fix: enabling S3 deletion across multiple buckets/endpoints rather than just the one configured for upload.
Description check ✅ Passed The description clearly explains the problem (multiple managed S3 stores exist but only one is configured for upload), the solution (refactor to select correct client per path), and provides concrete test steps.
Linked Issues check ✅ Passed The PR successfully addresses issue #9043 by refactoring ManagedS3Service to enable per-path credential/endpoint discovery (pathIsInManagedS3, findGlobalCredentialFor) and updating deletion flow to use correct credentials for each bucket.
Out of Scope Changes check ✅ Passed All changes align with the stated objective: credential provider refactoring in S3DataVault, DSUsedStorageService integration with managed S3, ManagedS3Service enhancements for per-path credential lookup, and UploadService client reference updates are all necessary to support multi-bucket S3 deletion.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch s3-delete-client

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 53496ed and b6ba6c6.

📒 Files selected for processing (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3)
  • DataStoreConfig (11-75)
  • DataVaults (58-60)
  • S3Upload (61-65)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (3)
  • S3UriUtils (7-52)
  • hostBucketFromUri (9-22)
  • objectKeyFromUri (40-49)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (10)
  • UPath (54-96)
  • getScheme (119-119)
  • getScheme (186-186)
  • fromString (59-59)
  • startsWith (138-142)
  • startsWith (198-205)
  • toString (106-109)
  • toString (174-174)
  • toRemoteUriUnsafe (121-121)
  • toRemoteUriUnsafe (188-188)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala (5)
  • DataVaultCredential (19-21)
  • S3AccessKeyCredential (36-47)
  • S3AccessKeyCredential (49-51)
  • name (80-80)
  • toCredentialsProvider (44-46)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (2)
  • now (48-48)
  • logSince (72-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push
🔇 Additional comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)

124-128: Good addition: Timing instrumentation for S3 deletions.

The timing instrumentation added here provides valuable observability for S3 deletion operations. Using Instant.now and Instant.logSince to measure and log duration is a clean approach that will help monitor performance in production.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
unreleased_changes/9047.md (1)

2-2: Capitalize "S3" for consistency.

The entry uses lowercase "s3" when referring to Amazon S3, which should be capitalized as "S3" per AWS naming conventions and standard casing for acronyms.

Apply this diff:

- Fixed dataset deletion for data stored on managed s3 buckets different from the one currently selected for dataset upload.
+ Fixed dataset deletion for data stored on managed S3 buckets different from the one currently selected for dataset upload.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)

41-47: Consider logging credential parsing failures.

The flatMap on line 43 silently ignores credentials that fail to parse. While this provides resilience, it could make debugging misconfigured credentials difficult. Consider logging warnings for failed parses so operators can identify configuration issues.

Apply this diff to add warning logs for parse failures:

   private lazy val globalCredentials: Seq[DataVaultCredential] = {
-    val res = config.Datastore.DataVaults.credentials.flatMap { credentialConfig =>
-      new CredentialConfigReader(credentialConfig).getCredential
+    val res = config.Datastore.DataVaults.credentials.flatMap { credentialConfig =>
+      val credentialOpt = new CredentialConfigReader(credentialConfig).getCredential
+      if (credentialOpt.isEmpty) {
+        logger.warn(s"Failed to parse credential from config: $credentialConfig")
+      }
+      credentialOpt
     }
     logger.info(s"Parsed ${res.length} global data vault credentials from datastore config.")
     res
   }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4deb0db and 53496ed.

📒 Files selected for processing (7)
  • unreleased_changes/9047.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSUsedStorageService.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (4 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (4 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala
📚 Learning: 2025-01-27T12:06:42.865Z
Learnt from: MichaelBuessemeyer
Repo: scalableminds/webknossos PR: 8352
File: app/models/organization/CreditTransactionService.scala:0-0
Timestamp: 2025-01-27T12:06:42.865Z
Learning: In Scala's for-comprehension with Fox (Future-like type), the `<-` operator ensures sequential execution. If any step fails, the entire chain short-circuits and returns early, preventing subsequent operations from executing. This makes it safe to perform validation checks before database operations.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala
📚 Learning: 2025-04-28T14:18:04.368Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshFileService.scala:161-166
Timestamp: 2025-04-28T14:18:04.368Z
Learning: In Scala for-comprehensions with the Fox error handling monad, `Fox.fromBool()` expressions should use the `<-` binding operator instead of the `=` assignment operator to properly propagate error conditions. Using `=` will cause validation failures to be silently ignored.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala
🧬 Code graph analysis (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1)
  • create (183-190)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSUsedStorageService.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
  • isLocal (29-29)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
  • pathIsInManagedS3 (34-38)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)
  • toFox (12-12)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala (1)
  • toCredentialsProvider (44-46)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (4)
  • ManagedS3Service (32-175)
  • findGlobalCredentialFor (38-41)
  • findGlobalCredentialFor (55-60)
  • pathIsInManagedS3 (34-38)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)
  • toFox (12-12)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
  • isLocal (29-29)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (8)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (9)
  • Fox (28-228)
  • Fox (230-303)
  • map (279-282)
  • map (375-375)
  • s (234-238)
  • s (238-248)
  • s (248-257)
  • serialCombined (93-97)
  • serialCombined (97-109)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3)
  • DataStoreConfig (11-75)
  • DataVaults (58-60)
  • S3Upload (61-65)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/PathSchemes.scala (1)
  • PathSchemes (3-9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (3)
  • S3UriUtils (7-52)
  • hostBucketFromUri (9-22)
  • objectKeyFromUri (40-49)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (6)
  • UPath (54-96)
  • fromString (59-59)
  • toString (106-109)
  • toString (174-174)
  • toRemoteUriUnsafe (121-121)
  • toRemoteUriUnsafe (188-188)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala (5)
  • DataVaultCredential (19-21)
  • S3AccessKeyCredential (36-47)
  • S3AccessKeyCredential (49-51)
  • name (80-80)
  • toCredentialsProvider (44-46)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (2)
  • now (48-48)
  • logSince (72-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
  • GitHub Check: backend-tests
🔇 Additional comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala (1)

44-46: LGTM! Clean encapsulation of credentials provider creation.

The new toCredentialsProvider method properly delegates AWS credentials provider construction to the credential object itself, improving cohesion and reusability across the codebase.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)

86-97: LGTM! Clean credential provider integration.

The buildClient method properly uses the new credential.toCredentialsProvider method, centralizing AWS client construction with per-credential configuration.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)

540-540: LGTM! Consistent with ManagedS3Service refactor.

The rename to s3UploadTransferManagerBox aligns with the updated naming in ManagedS3Service.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1)

192-199: LGTM! Improved encapsulation of credential provider creation.

Delegating to s3AccessKeyCredential.toCredentialsProvider removes duplication and centralizes the AWS credentials provider construction logic in the credential class itself.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSUsedStorageService.scala (1)

34-36: LGTM! Consistent with centralized credential management.

The integration with ManagedS3Service for path validation aligns with the PR's goal of centralizing S3 credential discovery and enables proper handling of multiple managed S3 stores.

Also applies to: 60-61

webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (1)

28-31: LGTM! Successfully centralizes credential discovery.

The refactor properly delegates all S3 credential resolution to ManagedS3Service, removing duplicate credential discovery logic. Note that line 136's logic change (!managedS3Service.pathIsInManagedS3) now disallows paths in managed S3 from being added directly, which aligns with the PR's objective to support deletion across multiple managed S3 stores.

Also applies to: 39-40, 113-122, 124-130, 132-137

Copy link
Contributor

@MichaelBuessemeyer MichaelBuessemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 mini comments for the code. Need to do testing now

UPath.fromString(c.name).map(path.startsWith).getOrElse(false))

def findGlobalCredentialFor(pathOpt: Option[UPath]): Option[S3AccessKeyCredential] =
pathOpt.flatMap(findGlobalCredentialFor)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a suggestion: Maybe move this next to the other findGlobalCredentialFor implementation?

firstPath <- paths.headOption.toFox // groupBy of the caller guarantees that this is not called with empty list.
credential <- findGlobalCredentialFor(firstPath).toFox // Convention is that the credentials are per bucket, so we can reuse this for all paths
endpoint = endpointForCredentialName(credential.name)
s3client <- buildClient(endpoint, credential).toFox
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mini typo: s3client -> s3Client. Or was this on purpose? At least the old code used s3Client

Copy link
Contributor

@MichaelBuessemeyer MichaelBuessemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing went well 👍

  • In application.conf, insert multiple dataVaults.credentials for s3 endpoints. Set s3Upload.credentialName to the first of them
  • Upload a small dataset
  • set the s3Upload.credentialName to another of them
  • make sure both datasets can be loaded
  • delete both datasets from their respective settings views.
  • backend logging should show they were deleted from their respective s3 buckets

regarding:

  • delete both datasets from their respective settings views.

I didn't delete on the second bucket but made this manually as this wasn't a testing bucket. I felt safer this way. But deleting on the not aas upload configured bucket worked. So imo the whole PR works 👍

Thanks for fixing this :)

@fm3 fm3 enabled auto-merge (squash) November 5, 2025 12:56
@fm3 fm3 merged commit 40db62a into master Nov 5, 2025
5 checks passed
@fm3 fm3 deleted the s3-delete-client branch November 5, 2025 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

s3 dataset delete should work on all buckets with global credentials

3 participants