Fix S3 deletion for buckets/endpoints other than what’s configured for upload #9047
Conversation
📝 Walkthrough
Refactors credential handling and Managed S3 support across datastore services: adds per-path credential discovery, builds S3 clients per credential/endpoint, wires ManagedS3Service into the vault/upload/storage services, and fixes dataset deletion for buckets with global credentials. Adds a changelog entry.
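For orientation, here is a minimal sketch of the per-path credential discovery the walkthrough mentions, assuming the convention (visible in the review excerpts below) that a global credential's name is the URI prefix of the bucket it covers. The type and helper names are illustrative stand-ins, not the repo's actual classes.

```scala
// Simplified stand-in for a globally configured S3 credential; field names are illustrative only.
final case class GlobalS3CredentialSketch(name: String, accessKeyId: String, secretAccessKey: String)

object CredentialDiscoverySketch {
  // Per-path credential discovery: pick the credential whose name is a prefix of the dataset path.
  // Assumes the convention that a credential's name is the URI prefix of the bucket it covers.
  def findGlobalCredentialForPath(path: String,
                                  globalCredentials: Seq[GlobalS3CredentialSketch]): Option[GlobalS3CredentialSketch] =
    globalCredentials.find(credential => path.startsWith(credential.name))
}
```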
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (4 passed)
Actionable comments posted: 2
🧹 Nitpick comments (2)
unreleased_changes/9047.md (1)
2-2: Capitalize "S3" for consistency.
The entry uses lowercase "s3" when referring to Amazon S3, which should be capitalized as "S3" per AWS naming conventions and standard casing for acronyms.
Apply this diff:

```diff
- Fixed dataset deletion for data stored on managed s3 buckets different from the one currently selected for dataset upload.
+ Fixed dataset deletion for data stored on managed S3 buckets different from the one currently selected for dataset upload.
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
41-47: Consider logging credential parsing failures.
The `flatMap` on line 43 silently ignores credentials that fail to parse. While this provides resilience, it could make debugging misconfigured credentials difficult. Consider logging warnings for failed parses so operators can identify configuration issues.

Apply this diff to add warning logs for parse failures:

```diff
   private lazy val globalCredentials: Seq[DataVaultCredential] = {
-    val res = config.Datastore.DataVaults.credentials.flatMap { credentialConfig =>
-      new CredentialConfigReader(credentialConfig).getCredential
+    val res = config.Datastore.DataVaults.credentials.flatMap { credentialConfig =>
+      val credentialOpt = new CredentialConfigReader(credentialConfig).getCredential
+      if (credentialOpt.isEmpty) {
+        logger.warn(s"Failed to parse credential from config: $credentialConfig")
+      }
+      credentialOpt
     }
     logger.info(s"Parsed ${res.length} global data vault credentials from datastore config.")
     res
   }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- unreleased_changes/9047.md (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSUsedStorageService.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (4 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (4 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala
📚 Learning: 2025-01-27T12:06:42.865Z
Learnt from: MichaelBuessemeyer
Repo: scalableminds/webknossos PR: 8352
File: app/models/organization/CreditTransactionService.scala:0-0
Timestamp: 2025-01-27T12:06:42.865Z
Learning: In Scala's for-comprehension with Fox (Future-like type), the `<-` operator ensures sequential execution. If any step fails, the entire chain short-circuits and returns early, preventing subsequent operations from executing. This makes it safe to perform validation checks before database operations.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala
📚 Learning: 2025-04-28T14:18:04.368Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshFileService.scala:161-166
Timestamp: 2025-04-28T14:18:04.368Z
Learning: In Scala for-comprehensions with the Fox error handling monad, `Fox.fromBool()` expressions should use the `<-` binding operator instead of the `=` assignment operator to properly propagate error conditions. Using `=` will cause validation failures to be silently ignored.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala
🧬 Code graph analysis (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1)
create(183-190)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSUsedStorageService.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
isLocal(29-29)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
pathIsInManagedS3(34-38)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)
toFox(12-12)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala (1)
toCredentialsProvider(44-46)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (4)
ManagedS3Service(32-175)
findGlobalCredentialFor(38-41)
findGlobalCredentialFor(55-60)
pathIsInManagedS3(34-38)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)
toFox(12-12)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
isLocal(29-29)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (8)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (9)
Fox(28-228)
Fox(230-303)
map(279-282)
map(375-375)
s(234-238)
s(238-248)
s(248-257)
serialCombined(93-97)
serialCombined(97-109)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3)
DataStoreConfig(11-75)
DataVaults(58-60)
S3Upload(61-65)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/PathSchemes.scala (1)
PathSchemes(3-9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (3)
S3UriUtils(7-52)
hostBucketFromUri(9-22)
objectKeyFromUri(40-49)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (6)
UPath(54-96)
fromString(59-59)
toString(106-109)
toString(174-174)
toRemoteUriUnsafe(121-121)
toRemoteUriUnsafe(188-188)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
CredentialConfigReader(8-64)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala (5)
DataVaultCredential(19-21)
S3AccessKeyCredential(36-47)
S3AccessKeyCredential(49-51)
name(80-80)
toCredentialsProvider(44-46)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (2)
now(48-48)
logSince(72-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
- GitHub Check: backend-tests
🔇 Additional comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultCredentials.scala (1)
44-46: LGTM! Clean encapsulation of credentials provider creation.
The new `toCredentialsProvider` method properly delegates AWS credentials provider construction to the credential object itself, improving cohesion and reusability across the codebase.
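As a rough sketch of that pattern (not the actual repo code; field names are assumptions), a credential type can expose its own AWS SDK v2 provider like this:

```scala
import software.amazon.awssdk.auth.credentials.{AwsBasicCredentials, StaticCredentialsProvider}

// Illustrative stand-in for S3AccessKeyCredential; the real case class has different/more fields.
final case class S3AccessKeyCredentialSketch(name: String, accessKeyId: String, secretAccessKey: String) {
  // The credential builds its own provider, so callers no longer assemble AWS credentials by hand.
  def toCredentialsProvider: StaticCredentialsProvider =
    StaticCredentialsProvider.create(AwsBasicCredentials.create(accessKeyId, secretAccessKey))
}
```

Both S3DataVault and ManagedS3Service can then share the same construction path instead of duplicating it.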
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
86-97: LGTM! Clean credential provider integration.
The `buildClient` method properly uses the new `credential.toCredentialsProvider` method, centralizing AWS client construction with per-credential configuration.
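For context, per-credential/per-endpoint client construction with the AWS SDK v2 could look roughly like the sketch below; this is an assumption-laden illustration, not the actual `buildClient` (which also wires upload configuration and error handling):

```scala
import java.net.URI

import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3AsyncClient

object S3ClientSketch {
  // Builds one client per credential/endpoint pair, so each bucket is addressed with its own endpoint.
  def buildClientSketch(endpoint: URI, credentialsProvider: AwsCredentialsProvider): S3AsyncClient =
    S3AsyncClient
      .builder()
      .credentialsProvider(credentialsProvider)
      .endpointOverride(endpoint) // point the client at the store this credential belongs to
      .region(Region.US_EAST_1) // required by the SDK; largely irrelevant for S3-compatible stores
      .forcePathStyle(true) // many S3-compatible endpoints require path-style addressing
      .build()
}
```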
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
540-540: LGTM! Consistent with ManagedS3Service refactor.
The rename to `s3UploadTransferManagerBox` aligns with the updated naming in ManagedS3Service.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1)
192-199: LGTM! Improved encapsulation of credential provider creation.
Delegating to `s3AccessKeyCredential.toCredentialsProvider` removes duplication and centralizes the AWS credentials provider construction logic in the credential class itself.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSUsedStorageService.scala (1)
34-36: LGTM! Consistent with centralized credential management.
The integration with `ManagedS3Service` for path validation aligns with the PR's goal of centralizing S3 credential discovery and enables proper handling of multiple managed S3 stores.
Also applies to: 60-61
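One plausible shape of such a path check, sketched under the assumption that credential names are bucket URI prefixes (not the actual ManagedS3Service implementation):

```scala
object ManagedS3PathCheckSketch {
  // A path counts as "in managed S3" if it is an s3:// URI covered by one of the
  // globally configured credentials (whose names are assumed to be URI prefixes).
  def pathIsInManagedS3Sketch(path: String, globalCredentialNames: Seq[String]): Boolean =
    path.startsWith("s3://") && globalCredentialNames.exists(name => path.startsWith(name))
}
```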
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (1)
28-31: LGTM! Successfully centralizes credential discovery.
The refactor properly delegates all S3 credential resolution to `ManagedS3Service`, removing duplicate credential discovery logic. Note that line 136's logic change (`!managedS3Service.pathIsInManagedS3`) now disallows paths in managed S3 from being added directly, which aligns with the PR's objective to support deletion across multiple managed S3 stores.
Also applies to: 39-40, 113-122, 124-130, 132-137
MichaelBuessemeyer left a comment
2 mini comments for the code. Need to do testing now
```scala
        UPath.fromString(c.name).map(path.startsWith).getOrElse(false))

  def findGlobalCredentialFor(pathOpt: Option[UPath]): Option[S3AccessKeyCredential] =
    pathOpt.flatMap(findGlobalCredentialFor)
```
just a suggestion: Maybe move this next to the other findGlobalCredentialFor implementation?
```scala
      firstPath <- paths.headOption.toFox // groupBy of the caller guarantees that this is not called with empty list.
      credential <- findGlobalCredentialFor(firstPath).toFox // Convention is that the credentials are per bucket, so we can reuse this for all paths
      endpoint = endpointForCredentialName(credential.name)
      s3client <- buildClient(endpoint, credential).toFox
```
mini typo: s3client -> s3Client. Or was this on purpose? At least the old code used s3Client
MichaelBuessemeyer left a comment
Testing went well 👍
- In application.conf, insert multiple dataVaults.credentials for s3 endpoints. Set s3Upload.credentialName to the first of them
- Upload a small dataset
- set the s3Upload.credentialName to another of them
- make sure both datasets can be loaded
- delete both datasets from their respective settings views.
- backend logging should show they were deleted from their respective s3 buckets
regarding:
- delete both datasets from their respective settings views.
I didn't delete on the second bucket but did this manually, as it wasn't a testing bucket; I felt safer this way. But deleting on the bucket that is not configured for upload worked. So imo the whole PR works 👍
Thanks for fixing this :)
We can have multiple managed S3 stores. Only one is selected for upload, but dataset deletion should work for all of them. This PR refactors the ManagedS3Service a bit and selects the correct client (endpoint/credential) for each path.
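Conceptually, the deletion path now does something like the following sketch; all names here are hypothetical stand-ins for the repo's helpers (the real code works with UPath, Fox and the ManagedS3Service client cache):

```scala
object MultiBucketDeletionSketch {
  // Group the paths to delete by the credential that covers them, resolve one client per group,
  // and run the deletions against the matching endpoint. All parameters are illustrative stand-ins.
  def deleteAcrossManagedS3[Client](paths: Seq[String],
                                    credentialNameFor: String => Option[String],
                                    clientFor: String => Option[Client],
                                    deleteWith: (Client, Seq[String]) => Unit): Unit =
    paths.groupBy(credentialNameFor).foreach {
      case (Some(credentialName), pathsForCredential) =>
        clientFor(credentialName).foreach(client => deleteWith(client, pathsForCredential))
      case (None, _) =>
        () // no global credential covers these paths; the real code reports an error instead
    }
}
```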
URL of deployed dev instance (used for testing):
Steps to test:
(I already tested locally, no need to re-test)
Issues:
`$PR_NUMBER.md` file in `unreleased_changes` or use `./tools/create-changelog-entry.py`)