Delete datasets on S3 #8924
Conversation
📝 WalkthroughWalkthroughCentralizes dataset deletion into a new DatasetService.deleteDataset flow that deletes data from disk and managed S3, adds ManagedS3Service and S3 utilities, refactors path-centric DAOs/clients/routes (including DELETE-with-JSON), updates controllers/routes/upload and datastore services, and tightens UPath behavior and tests. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks: ❌ Failed checks (1 warning), ✅ Passed checks (3 passed)
Actionable comments posted: 5
♻️ Duplicate comments (7)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
`48-48`: **Log the actual deduplicated path.**

Line 48 logs `targetPath`, but when `retryCount > 0`, the actual destination is `deduplicatedTargetPath`. Log the path that was actually used.

Apply this diff:

```diff
- logger.info(s"Successfully moved dataset from $sourcePath to $targetPath.")
+ logger.info(s"Successfully moved dataset from $sourcePath to $deduplicatedTargetPath.")
```

app/models/dataset/Dataset.scala (1)
`871-890`: **Critical: Query returns wrong paths for non-virtual datasets.**

The query selects only `m1.path`, but for non-virtual datasets the physical path to delete is stored in `m1.realPath`. This means:

- Non-virtual datasets would return logical paths instead of physical filesystem paths
- Deletion would target the wrong locations
- Could cause data loss or orphaned data

A past reviewer flagged this exact issue and suggested using `COALESCE(m1.realPath, m1.path)`. Based on learnings.

Apply this diff to select the actual physical path:

```diff
-      pathsStrOpts <- run(q"""
-          SELECT m1.path FROM webknossos.dataset_mags m1
-          WHERE m1._dataset = $datasetId
-          AND m1.path IS NOT NULL
+      pathsStrOpts <- run(q"""
+          SELECT COALESCE(m1.realPath, m1.path) AS path
+          FROM webknossos.dataset_mags m1
+          WHERE m1._dataset = $datasetId
+          AND COALESCE(m1.realPath, m1.path) IS NOT NULL
           AND NOT EXISTS (
             SELECT m2.path
             FROM webknossos.dataset_mags m2
             WHERE m2._dataset != $datasetId
             AND (
-              m2.path = m1.path
+              m2.path = COALESCE(m1.realPath, m1.path)
               OR
-              (
-                m2.realpath IS NOT NULL AND m2.realpath = m1.realpath
-              )
+              m2.realpath = COALESCE(m1.realPath, m1.path)
             )
           )
         """.as[Option[String]])
```

app/models/dataset/DatasetService.scala (3)
`548-557`: **Good fix: sequence deletion and refresh; errors now propagate.**

Replacing `_ = ...` with `<-` for both deleteDataset and refreshStorageReport correctly sequences the Fox chain.
`511-523`: **Critical: "used-only-by-this-dataset" can miss shared data (NULL realpath/symlink).**

Current DAO logic for mags uses realpath equality that fails with NULL and can delete paths still used by other datasets. Use COALESCE to align with index semantics, or conservatively block when NULLs exist. Also consider validating paths before delete.
Suggested DAO fix (app/models/dataset/Dataset.scala):
```diff
- AND NOT EXISTS (
-   SELECT m2.path
-   FROM webknossos.dataset_mags m2
-   WHERE m2._dataset != $datasetId
-   AND (
-     m2.path = m1.path
-     OR (
-       m2.realpath IS NOT NULL AND m2.realpath = m1.realpath
-     )
-   )
- )
+ AND NOT EXISTS (
+   SELECT 1
+   FROM webknossos.dataset_mags m2
+   WHERE m2._dataset != $datasetId
+   AND COALESCE(m2.realpath, m2.path) = COALESCE(m1.realpath, m1.path)
+ )
```

Optional safety here:

```diff
- pathsUsedOnlyByThisDataset = magPathsUsedOnlyByThisDataset ++ attachmentPathsUsedOnlyByThisDataset
+ pathsUsedOnlyByThisDataset = magPathsUsedOnlyByThisDataset ++ attachmentPathsUsedOnlyByThisDataset
+ // extra safety: validate only-local/managed-S3 and existence
+ _ <- validatePaths(pathsUsedOnlyByThisDataset, datastore)
```

Run to confirm DAOs use COALESCE:

```bash
#!/bin/bash
rg -n -C2 'findMagPathsUsedOnlyByThisDataset|COALESCE\(m2\.realpath, m2\.path\)' app/models/dataset/Dataset.scala
```
`525-533`: **Critical: Deletion guard misses datasets with NULL realpath; use COALESCE in DAO.**

findDatasetsUsingDataFromDir relies on the DAO query filtering `m.realpath IS NOT NULL` and `starts_with(m.realpath, ...)`. Legacy NULL realpaths won't be seen → risk of deleting shared data.
Suggested DAO fix (app/models/dataset/Dataset.scala):
```diff
- WHERE m.realpath IS NOT NULL
-   AND starts_with(m.realpath, $absolutePathWithTrailingSlash)
+ WHERE starts_with(COALESCE(m.realpath, m.path), $absolutePathWithTrailingSlash)
```

Optional: add a conservative pre-check to abort if any mags under the directory have realpath IS NULL, until a backfill/migration makes realpath non-NULL.
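Such a conservative pre-check could look roughly like this in the service layer; a hedged sketch only, where the DAO helper `countMagsWithNullRealpathInDir` is hypothetical (it does not exist in this PR) and the `Fox`/`?~>` combinators follow the project's existing error-handling style:

```scala
// Hedged sketch: abort directory deletion while any mag below the directory
// still has a NULL realpath (i.e. before the realpath backfill has run).
// countMagsWithNullRealpathInDir is a hypothetical DAO method.
private def assertNoNullRealpathsUnder(absoluteDirPath: String)(
    implicit ec: ExecutionContext): Fox[Unit] =
  for {
    nullCount <- datasetMagsDAO.countMagsWithNullRealpathInDir(absoluteDirPath) // hypothetical
    _ <- Fox.fromBool(nullCount == 0) ?~>
      s"Refusing to delete $absoluteDirPath: $nullCount mag(s) still have NULL realpath; run the realpath backfill first."
  } yield ()
```

Calling this before `findDatasetsUsingDataFromDir` would make the guard fail closed instead of silently missing legacy rows.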
Also review attachments symmetry: attachments guard uses a.path only; if attachments ever gain realpath, mirror COALESCE there too.
```bash
#!/bin/bash
rg -n -C3 'findDatasetsWithMagsInDir|starts_with\(' app/models/dataset/Dataset.scala
rg -n 'COALESCE\(m\.realpath, m\.path\)' app/models/dataset/Dataset.scala
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (1)
`9-22`: **Harden S3 URI parsing: null-safety, regional endpoints, and key normalization.**

- Avoid NPEs on `uri.getHost`.
- Support regional virtual-hosted endpoints (e.g., `bucket.s3.us-west-2.amazonaws.com`).
- Normalize keys to avoid a leading "/" for AWS S3.
- Make short-style detection scheme-aware (`s3://` only).
- Improve non-AWS host detection.

Apply this diff to the existing functions:
```diff
@@
-  def hostBucketFromUri(uri: URI): Option[String] = {
-    val host = uri.getHost
-    if (host == null) {
-      None
-    } else if (isShortStyle(uri)) { // assume host is omitted from uri, shortcut form s3://bucket/key
-      Some(host)
-    } else if (isVirtualHostedStyle(uri)) {
-      Some(host.substring(0, host.length - ".s3.amazonaws.com".length))
-    } else if (isPathStyle(uri)) {
-      Some(uri.getPath.substring(1).split("/")(0))
-    } else {
-      None
-    }
-  }
+  def hostBucketFromUri(uri: URI): Option[String] = {
+    val hostO = hostOpt(uri)
+    if (isShortStyle(uri)) {
+      hostO
+    } else if (isVirtualHostedStyle(uri)) {
+      hostO.flatMap(bucketFromVirtualHost)
+    } else if (isPathStyle(uri)) {
+      val path = Option(uri.getPath).getOrElse("")
+      path.stripPrefix("/").split("/", 2).headOption.filter(_.nonEmpty)
+    } else None
+  }
@@
-  // https://bucket-name.s3.region-code.amazonaws.com/key-name
-  private def isVirtualHostedStyle(uri: URI): Boolean =
-    uri.getHost.endsWith(".s3.amazonaws.com")
+  // https://bucket-name.s3.amazonaws.com or https://bucket-name.s3.us-west-2.amazonaws.com
+  private def isVirtualHostedStyle(uri: URI): Boolean =
+    hostOpt(uri).exists(h => VirtualHostedPattern.pattern.matcher(h).matches)
@@
-  // https://s3.region-code.amazonaws.com/bucket-name/key-name
-  private def isPathStyle(uri: URI): Boolean =
-    uri.getHost.matches("s3(.[\\w\\-_]+)?.amazonaws.com") ||
-      (!uri.getHost.contains("amazonaws.com") && uri.getHost.contains("."))
+  // https://s3.amazonaws.com/bucket/key, https://s3.us-west-2.amazonaws.com/bucket/key
+  // or non-AWS S3-compatible hosts (MinIO, Ceph, etc.)
+  private def isPathStyle(uri: URI): Boolean =
+    hostOpt(uri).exists { h =>
+      h.matches("^s3([.-][\\w\\-_]+)?\\.amazonaws\\.com$") ||
+        (!h.contains("amazonaws.com") && h.contains("."))
+    }
@@
-  // S3://bucket-name/key-name
-  private def isShortStyle(uri: URI): Boolean =
-    !uri.getHost.contains(".")
+  // s3://bucket-name/key-name (short style)
+  private def isShortStyle(uri: URI): Boolean =
+    "s3".equalsIgnoreCase(uri.getScheme) && hostOpt(uri).exists(h => !h.contains("."))
@@
-  def objectKeyFromUri(uri: URI): Box[String] =
-    if (isVirtualHostedStyle(uri)) {
-      Full(uri.getPath)
-    } else if (isPathStyle(uri)) {
-      Full(uri.getPath.substring(1).split("/").tail.mkString("/"))
-    } else if (isShortStyle(uri)) {
-      Full(uri.getPath.tail)
-    } else Failure(s"Not a valid s3 uri: $uri")
+  def objectKeyFromUri(uri: URI): Box[String] = {
+    val path = Option(uri.getPath).getOrElse("")
+    if (isVirtualHostedStyle(uri) || isShortStyle(uri)) {
+      Full(path.stripPrefix("/"))
+    } else if (isPathStyle(uri)) {
+      val parts = path.stripPrefix("/").split("/", 2)
+      if (parts.length == 2) Full(parts(1)) else Failure(s"No object key in $uri")
+    } else Failure(s"Not a valid s3 uri: $uri")
+  }
@@
-  def isNonAmazonHost(uri: URI): Boolean =
-    (isPathStyle(uri) && !uri.getHost.endsWith(".amazonaws.com")) || uri.getHost == "localhost"
+  def isNonAmazonHost(uri: URI): Boolean =
+    hostOpt(uri).exists(h => (isPathStyle(uri) && !h.endsWith(".amazonaws.com")) || h == "localhost")
```

Add these helpers (append within the object):

```scala
// e.g. bucket.s3.amazonaws.com or bucket.s3.us-west-2.amazonaws.com
private val VirtualHostedPattern = """^(.+?)\.s3([.-][\w-]+)?\.amazonaws\.com$""".r

private def hostOpt(uri: URI): Option[String] =
  Option(uri.getHost).orElse(Option(uri.getAuthority)).map(_.trim).filter(_.nonEmpty)

private def bucketFromVirtualHost(host: String): Option[String] = host match {
  case VirtualHostedPattern(bucket, _) => Some(bucket)
  case _                               => None
}
```

This fixes failures on regional endpoints and ensures keys do not carry a leading slash, which is critical for correct List/Delete behavior. Based on learnings.
Also applies to: 27-39, 40-50
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
`39-55`: **Don't derive bucket/endpoint from `credentialName`; use the matched S3 credential (name/endpoint/region).**

Assuming `credentialName` is a logical identifier, parsing it as a URI yields a wrong bucket and an invalid endpoint (e.g., `https://bucket`). Derive from the selected `S3AccessKeyCredential` instead; use its optional `endpoint`/`region`, and parse `name` only if it is actually a URI.

Apply this refactor:

```diff
@@
-  lazy val s3UploadBucketOpt: Option[String] =
-    // by convention, the credentialName is the S3 URI so we can extract the bucket from it.
-    S3UriUtils.hostBucketFromUri(new URI(dataStoreConfig.Datastore.S3Upload.credentialName))
+  private lazy val s3UploadCredentialOpt: Option[S3AccessKeyCredential] =
+    dataStoreConfig.Datastore.DataVaults.credentials
+      .flatMap(new CredentialConfigReader(_).getCredential)
+      .collectFirst {
+        case c @ S3AccessKeyCredential(name, _, _, _, _)
+            if dataStoreConfig.Datastore.S3Upload.credentialName == name => c
+      }
+
+  private lazy val s3UploadBaseUriOpt: Option[URI] =
+    s3UploadCredentialOpt.flatMap { c =>
+      // name might be a UPath/URI root like s3://bucket/prefix or https://minio:9000/bucket
+      tryo(UPath.fromString(c.name).toRemoteUriUnsafe).toOption
+        .orElse(tryo(new URI(c.name)).toOption)
+    }
+
+  lazy val s3UploadBucketOpt: Option[String] =
+    s3UploadBaseUriOpt.flatMap(S3UriUtils.hostBucketFromUri)
@@
-  private lazy val s3UploadEndpoint: URI = {
-    // by convention, the credentialName is the S3 URI so we can extract the bucket from it.
-    val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-    new URI(
-      "https",
-      null,
-      credentialUri.getHost,
-      -1,
-      null,
-      null,
-      null
-    )
-  }
+  private lazy val s3UploadEndpointOpt: Option[URI] =
+    s3UploadCredentialOpt.flatMap(_.endpoint.map(URI.create))
```

This prevents constructing invalid endpoints and decouples behavior from a brittle config assumption. Based on learnings.
If `credentialName` is indeed a URI in your deployments, please confirm and we can simplify accordingly.

🧹 Nitpick comments (7)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)

`41-41`: **Extract hardcoded retry limit as a named constant.**

The retry limit `15` is hardcoded. Consider extracting it as a private constant (e.g., `private val MaxMoveDedupRetries = 15`) for clarity and maintainability.

```diff
+  private val MaxMoveDedupRetries = 15
+
   private def deleteWithRetry(sourcePath: Path, targetPath: Path, retryCount: Int = 0)(
       implicit ec: ExecutionContext): Fox[Unit] =
-    if (retryCount > 15) {
+    if (retryCount > MaxMoveDedupRetries) {
       Fox.failure(s"Deleting dataset failed: too many retries.")
```

app/models/dataset/Dataset.scala (2)
`892-906`: **Consider checking both `realpath` and `path` for completeness.**

The method currently only checks `m.realpath` to find datasets with mags in a directory. While this is correct for non-virtual datasets (filesystem paths), you might want to also check `m.path` for safety, especially if virtual datasets could have paths under the same prefix.

If needed, apply this diff to check both fields:

```diff
   run(q"""
     SELECT d._id
     FROM webknossos.dataset_mags m
     JOIN webknossos.datasets d ON m._dataset = d._id
-    WHERE m.realpath IS NOT NULL
-    AND starts_with(m.realpath, $absolutePathWithTrailingSlash)
+    WHERE (
+      (m.realpath IS NOT NULL AND starts_with(m.realpath, $absolutePathWithTrailingSlash))
+      OR starts_with(m.path, $absolutePathWithTrailingSlash)
+    )
     AND d._id != $ignoredDataset
     AND d._datastore = ${dataStore.name.trim}
   """.as[ObjectId])
```
`1293-1306`: **Add NULL check for consistency and safety.**

Similar to the mags version, this method should filter out NULL paths to avoid potential issues with the string prefix check.

Apply this diff to add a NULL check:

```diff
   run(q"""
     SELECT d._id
     FROM webknossos.dataset_layer_attachments a
     JOIN webknossos.datasets d ON a._dataset = d._id
-    WHERE starts_with(a.path, $absolutePathWithTrailingSlash)
+    WHERE a.path IS NOT NULL
+    AND starts_with(a.path, $absolutePathWithTrailingSlash)
     AND d._id != $ignoredDataset
     AND d._datastore = ${dataStore.name.trim}
   """.as[ObjectId])
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (1)
`1-7`: **Add tests for core URI forms.**

Cover:
- s3://bucket/key
- https://bucket.s3.amazonaws.com/key
- https://bucket.s3.us-west-2.amazonaws.com/key
- https://s3.us-west-2.amazonaws.com/bucket/key
- http://minio.local:9000/bucket/key
Do you want me to scaffold a ScalaTest suite for this?
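A minimal sketch of such a suite, assuming the hardened `hostBucketFromUri`/`objectKeyFromUri` behavior proposed above; the expected values and import paths (e.g. `com.scalableminds.util.tools.Full` from Box.scala) reflect that proposal, not necessarily the current code:

```scala
import java.net.URI

import com.scalableminds.util.tools.Full
import com.scalableminds.webknossos.datastore.helpers.S3UriUtils
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

class S3UriUtilsTestSuite extends AnyFlatSpec with Matchers {

  "hostBucketFromUri" should "extract the bucket for all supported URI styles" in {
    S3UriUtils.hostBucketFromUri(new URI("s3://bucket/key")) shouldBe Some("bucket")
    S3UriUtils.hostBucketFromUri(new URI("https://bucket.s3.amazonaws.com/key")) shouldBe Some("bucket")
    S3UriUtils.hostBucketFromUri(new URI("https://bucket.s3.us-west-2.amazonaws.com/key")) shouldBe Some("bucket")
    S3UriUtils.hostBucketFromUri(new URI("https://s3.us-west-2.amazonaws.com/bucket/key")) shouldBe Some("bucket")
    S3UriUtils.hostBucketFromUri(new URI("http://minio.local:9000/bucket/key")) shouldBe Some("bucket")
  }

  "objectKeyFromUri" should "return keys without a leading slash" in {
    S3UriUtils.objectKeyFromUri(new URI("s3://bucket/some/key")) shouldBe Full("some/key")
    S3UriUtils.objectKeyFromUri(new URI("https://s3.us-west-2.amazonaws.com/bucket/some/key")) shouldBe Full("some/key")
  }
}
```

The regional virtual-hosted case is the one the current implementation fails, so it is the most valuable regression test here.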
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (3)
`79-86`: **Fail fast on invalid/non-S3 paths; avoid aborting all buckets due to one bad path.**

Currently, grouping by `Option[String]` may include `None`, and the first failure can abort the whole deletion.

```diff
-  def deletePaths(paths: Seq[UPath])(implicit ec: ExecutionContext): Fox[Unit] = {
-    val pathsByBucket: Map[Option[String], Seq[UPath]] = paths.groupBy(S3UriUtils.hostBucketFromUpath)
-    for {
-      _ <- Fox.serialCombined(pathsByBucket.keys) { bucket: Option[String] =>
-        deleteS3PathsOnBucket(bucket, pathsByBucket(bucket))
-      }
-    } yield ()
-  }
+  def deletePaths(paths: Seq[UPath])(implicit ec: ExecutionContext): Fox[Unit] = {
+    val (nonS3, s3Paths) = paths.partition(p => !p.getScheme.contains(PathSchemes.schemeS3))
+    if (nonS3.nonEmpty) return Fox.failure(s"Non-S3 paths passed to ManagedS3Service: ${nonS3.mkString(", ")}")
+
+    val grouped = s3Paths.groupBy(S3UriUtils.hostBucketFromUpath)
+    grouped.get(None).foreach { invalid =>
+      // surface all invalid paths instead of a generic failure
+      return Fox.failure(s"Could not determine S3 bucket for: ${invalid.mkString(", ")}")
+    }
+    val byBucket: Map[String, Seq[UPath]] = grouped.collect { case (Some(b), ps) => (b, ps) }
+    Fox.serialCombined(byBucket.keys.toSeq)(b => deleteS3PathsOnBucket(Some(b), byBucket(b))).map(_ => ())
+  }
```

This keeps valid buckets progressing even if a caller accidentally passes a bad path and also returns actionable error text.
`75-78`: **`transferManagerBox` is unused.**

Remove or use it (e.g., for future multipart/copy ops) to avoid dead code.
`155-158`: **Speed up `pathIsInManagedS3` and limit to S3 credentials.**

Avoid repeatedly parsing all credentials; precompute S3 roots and filter to S3 credentials only.
Example:
```scala
private lazy val managedS3Roots: Seq[UPath] =
  globalCredentials.collect {
    case S3AccessKeyCredential(name, _, _, _, _) if name.startsWith("s3://") =>
      UPath.fromStringUnsafe(name).normalize
  }

def pathIsInManagedS3(path: UPath): Boolean =
  path.getScheme.contains(PathSchemes.schemeS3) && managedS3Roots.exists(path.startsWith)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- app/controllers/WKRemoteDataStoreController.scala (2 hunks)
- app/models/dataset/Dataset.scala (3 hunks)
- app/models/dataset/DatasetService.scala (5 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/S3UriUtils.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- app/controllers/WKRemoteDataStoreController.scala
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Applied to files:
app/models/dataset/Dataset.scala
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: frontend-tests
- GitHub Check: backend-tests
- GitHub Check: build-smoketest-push
🔇 Additional comments (1)
app/models/dataset/Dataset.scala (1)
`10-10`: **LGTM: Clean import simplification.**

The shift from `DataSourceMagInfo` to `UPath` aligns with the path-centric refactoring for dataset deletion.
Actionable comments posted: 1
📒 Files selected for processing (1)
- app/controllers/DatasetController.scala (3 hunks)
🔇 Additional comments (1)
app/controllers/DatasetController.scala (1)
`38-38`: **Slack notifier wiring verified and correct.**

Import and DI of SlackNotificationService in DatasetController are confirmed. The service is properly defined, bound as an eager singleton in `WebknossosModule.scala`, and consistently used across the application (SqlClient, TaskCreationService, DatasetService, JobController, AnnotationController, and others).
Actionable comments posted: 2
🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
`428-436`: **Surface invalid/ignored paths to clients.**

If DataSourceService rejects paths (non-managed S3 or outside baseDir), propagate a 400 with details instead of a generic failure/OK. This avoids silent skips. Consider returning per-path results.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
`439-449`: **Use ManagedS3Service bucket; centralize URI building.**

Switching to `managedS3Service.s3UploadBucketOpt` is correct. Suggest moving endpoint host/URI construction into ManagedS3Service (e.g., a helper returning a UPath base) to avoid duplicating URI parsing here.
📒 Files selected for processing (8)
- app/controllers/WKRemoteDataStoreController.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (6 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (3 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala
🔇 Additional comments (12)
app/controllers/WKRemoteDataStoreController.scala (3)
`6-6`: **LGTM! Clean import removal.**

The `Full` import is no longer needed in this controller since the Box pattern matching has moved to `DatasetService.deleteDatasetFromDB`.
`38-51`: **Excellent refactoring: proper separation of concerns.**

Removing `annotationDAO` from the controller is correct since the annotation-count logic now resides in `DatasetService.deleteDatasetFromDB`. This improves maintainability by centralizing deletion logic in the service layer.
`202-212`: **Well-architected delegation to the service layer.**

The refactoring correctly moves deletion orchestration to `DatasetService.deleteDatasetFromDB`, which now handles dataset lookup, annotation counting, conditional deletion, and storage refresh. The idempotent behavior (returning success for non-existent datasets) aligns with HTTP DELETE semantics.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)
27-32: DI addition looks good.
The ManagedS3Service injection integrates cleanly with existing services.
313-314: Tiny nit: helper is fine.
existsOnDisk is straightforward and used correctly in the refresh path.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2)
90-94: baseDirAbsolute endpoint: OK.
Scoped to webknossos context; returning a path string is acceptable.
415-426: deleteOnDisk flow simplification: OK.
Fetching the data source first and delegating to the service is cleaner; auth via webknossos context matches WK→DS usage.

Confirm no external clients rely on the previous deleteDataset(datasetId) permission here.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)
23-24: Imports updated for ManagedS3Service: OK.
100-103: Constructor DI: OK.
ManagedS3Service centralization improves cohesion and config handling.
146-147: PathUtils.ensureDirectoryBox usage: good.
Better error mapping vs direct ensureDirectory.
541-548: S3 TransferManager via service: good.
Clear failure if S3 not configured; checks failedTransfers. LGTM.
582-594: Failure cleanup argument order fix: OK.
deleteOnDisk now receives datasetId first; matches signature.
MichaelBuessemeyer left a comment
Some feedback :D
Testing results:
Delete non-virtual datasets, should run through if paths are unique.
Other datasets symlinking into this one should block the deletion
Looks good 👍
Data on S3 buckets not listed with global credentials should never be deleted.
Also seemed good
Delete virtual datasets, should run through. Other datasets referencing the data should still have it afterwards.
I am having trouble with setting up a scenario where a s3 mag is shared.
I enabled uploading to s3. Made a successful upload. Used the new s3 path + credentials for explore remote and this also worked. But the dataset is now not rendered 🤔.
The server console says:
```
backend: 2025-10-29 14:27:27,600 [error] com.scalableminds.webknossos.datastore.services.DSDatasetErrorLoggingService - Error while loading bucket for layer 2-2-1 at BucketPosition(voxelMag1 at (32, 0, 160), bucket at (1,0,5), mag(1, 1, 1), additional coordinates: ), cuboid: Cuboid((32, 0, 160) / (1, 1, 1),32,32,32) for <path removed by me>: Could not read header at zarr.json <~ Failed to read from vault path <~ The AWS Access Key Id you provided does not exist in our records. (Service: S3, Status Code: 403, Request ID: 1TR63AYS5G6NZF1Y, Extended Request ID: <truncated by myself>) (SDK Attempt Count: 1)
Stack trace: software.amazon.awssdk.services.s3.model.S3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: S3, Status Code: 403, Request ID: <truncated by myself>) (SDK Attempt Count: 1)
```
The rest of the testing went well
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
...nossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala
Oh and I forgot to mention that not all S3 is cleared: the cumsum.json is still left on the bucket.
Actionable comments posted: 2
♻️ Duplicate comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
39-55: Remove `@tailrec` annotation.

The `@tailrec` annotation cannot be satisfied because the recursive call (line 52) is inside a try-catch block, preventing tail-call optimization. This was already flagged in a previous review.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1)
300-306: Lock down local deletions to the datastore base directory.

Any caller can submit a `file://` UPath pointing outside the datastore root, and we will happily recurse-delete that absolute path. From the new DELETE endpoint this lets a remote user wipe arbitrary directories (even `/`) on the datastore host. We must validate each local path against `dataBaseDir`, reject anything else, and only then delete either the directory tree or single files. Also fail fast on non-managed remote paths instead of silently skipping them.

Please apply a guard like the following:
```diff
 def deletePathsFromDiskOrManagedS3(paths: Seq[UPath]): Fox[Unit] = {
-  val localPaths = paths.filter(_.isLocal).flatMap(_.toLocalPath)
-  val managedS3Paths = paths.filter(managedS3Service.pathIsInManagedS3)
-  for {
-    _ <- Fox.serialCombined(localPaths)(PathUtils.deleteDirectoryRecursively(_).toFox)
-    _ <- managedS3Service.deletePaths(managedS3Paths)
-  } yield ()
+  val (localCandidates, remoteCandidates) = paths.partition(_.isLocal)
+  val managedS3Paths = remoteCandidates.filter(managedS3Service.pathIsInManagedS3)
+  val disallowedRemote = remoteCandidates.diff(managedS3Paths)
+  val baseDir = dataBaseDir.toAbsolutePath.normalize
+
+  for {
+    _ <- Fox.fromBool(disallowedRemote.isEmpty) ?~>
+      s"Refusing to delete non-managed remote paths: ${disallowedRemote.mkString(", ")}"
+    validatedLocal <- Fox.serialCombined(localCandidates) { upath =>
+      for {
+        local <- upath.toLocalPath.toFox
+        abs = local.toAbsolutePath.normalize
+        _ <- Fox.fromBool(abs.startsWith(baseDir)) ?~>
+          s"Refusing to delete outside datastore base directory ($baseDir): $abs"
+      } yield abs
+    }
+    _ <- Fox.serialCombined(validatedLocal) { abs =>
+      if (Files.isDirectory(abs)) PathUtils.deleteDirectoryRecursively(abs).toFox
+      else tryo(Files.deleteIfExists(abs)).map(_ => ()).toFox
+    }
+    _ <- managedS3Service.deletePaths(managedS3Paths)
+  } yield ()
 }
```

This matches the earlier review feedback but is still unresolved, so please fix before release.
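The core of the suggested guard — resolve to an absolute path, normalize, then check containment under the base directory — can be shown in isolation. This is a minimal sketch with invented names (`PathGuard`, `isContainedIn`), not the project's actual code:

```scala
import java.nio.file.{Path, Paths}

object PathGuard {
  // True only if `candidate`, resolved to an absolute normalized form,
  // still lives under `baseDir`. Normalizing first collapses any "../"
  // segments a caller might smuggle into the path.
  def isContainedIn(candidate: Path, baseDir: Path): Boolean = {
    val base = baseDir.toAbsolutePath.normalize
    val abs  = candidate.toAbsolutePath.normalize
    abs.startsWith(base)
  }
}
```

Note that `Path.startsWith` compares whole path components, so `/srv/binaryData2` is correctly rejected as a child of `/srv/binaryData`, which a plain string prefix check would miss.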
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1)
39-72: Stop interpreting `credentialName` as an S3 URI.

`Datastore.S3Upload.credentialName` is configured as a plain credential ID (e.g. `"managed-s3"`). The new code assumes it is an S3 URI, immediately parsing it with `new URI(...)` and forcing an endpoint override based on that host. On existing installations this throws `URISyntaxException`, so `s3ClientBox` becomes a Failure and all managed S3 deletions stop working. Even if the parse succeeded, reinterpreting the name changes semantics and breaks compatibility.

Please derive bucket/endpoint/region from the actual `S3AccessKeyCredential` metadata (the optional `region` and `endpoint` fields) and only override the client when those optional fields are populated. The field should remain a plain identifier. For example:

```diff
- private lazy val s3UploadCredentialsOpt: Option[(String, String)] =
-   dataStoreConfig.Datastore.DataVaults.credentials.flatMap { credentialConfig =>
-     new CredentialConfigReader(credentialConfig).getCredential
-   }.collectFirst {
-     case S3AccessKeyCredential(credentialName, accessKeyId, secretAccessKey, _, _)
-         if dataStoreConfig.Datastore.S3Upload.credentialName == credentialName =>
-       (accessKeyId, secretAccessKey)
-   }
-
- lazy val s3UploadBucketOpt: Option[String] =
-   S3UriUtils.hostBucketFromUri(new URI(dataStoreConfig.Datastore.S3Upload.credentialName))
-
- private lazy val s3UploadEndpoint: URI = {
-   val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-   new URI("https", null, credentialUri.getHost, -1, null, null, null)
- }
+ private lazy val s3UploadCredentialOpt: Option[S3AccessKeyCredential] =
+   dataStoreConfig.Datastore.DataVaults.credentials
+     .flatMap(new CredentialConfigReader(_).getCredential)
+     .collectFirst {
+       case cred @ S3AccessKeyCredential(name, _, _, _, _)
+           if dataStoreConfig.Datastore.S3Upload.credentialName == name => cred
+     }
+
+ private lazy val s3UploadCredentialsOpt: Option[(String, String)] =
+   s3UploadCredentialOpt.map(cred => (cred.identifier, cred.secret))
+
+ private lazy val s3EndpointOverrideOpt: Option[URI] =
+   s3UploadCredentialOpt.flatMap(_.endpoint).flatMap(ep => tryo(new URI(ep)).toOption)
+
+ private lazy val s3Region: Region =
+   s3UploadCredentialOpt.flatMap(_.region).map(Region.of).getOrElse(Region.US_EAST_1)
```

And in the builder only call `endpointOverride`/`forcePathStyle` when `s3EndpointOverrideOpt` is defined:

```diff
- .crossRegionAccessEnabled(true)
- .forcePathStyle(true)
- .endpointOverride(s3UploadEndpoint)
- .region(Region.US_EAST_1)
+ .crossRegionAccessEnabled(true)
+ .region(s3Region)
+ s3EndpointOverrideOpt.foreach { endpoint =>
+   builder.endpointOverride(endpoint)
+   builder.forcePathStyle(true)
+ }
```

This keeps the existing configuration contract intact and avoids runtime failures.
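The "only override when configured" shape recommended here can be sketched without the AWS SDK. `S3Credential` and `ClientSettings` below are invented stand-ins, assuming optional `endpoint`/`region` fields as described above:

```scala
import java.net.URI
import scala.util.Try

// Hypothetical stand-in for S3AccessKeyCredential: endpoint and region
// are optional metadata; the credential name stays a plain identifier.
final case class S3Credential(name: String, endpoint: Option[String], region: Option[String])

object ClientSettings {
  // Parse the endpoint defensively: a malformed value yields None instead
  // of a URISyntaxException that would poison client construction.
  def endpointOverride(cred: S3Credential): Option[URI] =
    cred.endpoint.flatMap(e => Try(new URI(e)).toOption)

  // Fall back to a default region when the credential does not specify one.
  def regionOrDefault(cred: S3Credential, default: String = "us-east-1"): String =
    cred.region.getOrElse(default)
}
```

A caller would apply the override only when `endpointOverride` returns `Some`, leaving the SDK's default endpoint resolution untouched otherwise.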
🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
47-47: Consider using `s` interpolator instead of `f`.

The `f` interpolator is intended for formatted strings with format specifiers (like `%d`, `%s`). Without format specifiers, `s` is more appropriate and avoids unnecessary overhead.

Apply this diff:

```diff
- if (retryCount == 0) targetPath else targetPath.resolveSibling(f"${targetPath.getFileName} ($retryCount)")
+ if (retryCount == 0) targetPath else targetPath.resolveSibling(s"${targetPath.getFileName} ($retryCount)")
```
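To illustrate the distinction: `f` only differs from `s` once a format specifier follows the interpolated value. A minimal sketch (names are invented, not the project's code):

```scala
object InterpolatorDemo {
  // `s` is enough here: plain substitution, no format specifiers.
  def deduplicatedName(name: String, retryCount: Int): String =
    if (retryCount == 0) name else s"$name ($retryCount)"

  // `f` earns its keep once an actual specifier appears; it is also
  // type-checked at compile time against the interpolated value.
  def padded(retryCount: Int): String =
    f"$retryCount%03d"
}
```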
📜 Review details
📒 Files selected for processing (4)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/ManagedS3Service.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-04-28T14:18:04.368Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshFileService.scala:161-166
Timestamp: 2025-04-28T14:18:04.368Z
Learning: In Scala for-comprehensions with the Fox error handling monad, `Fox.fromBool()` expressions should use the `<-` binding operator instead of the `=` assignment operator to properly propagate error conditions. Using `=` will cause validation failures to be silently ignored.
Applied to files:
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
📚 Learning: 2025-01-27T12:06:42.865Z
Learnt from: MichaelBuessemeyer
PR: scalableminds/webknossos#8352
File: app/models/organization/CreditTransactionService.scala:0-0
Timestamp: 2025-01-27T12:06:42.865Z
Learning: In Scala's for-comprehension with Fox (Future-like type), the `<-` operator ensures sequential execution. If any step fails, the entire chain short-circuits and returns early, preventing subsequent operations from executing. This makes it safe to perform validation checks before database operations.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
🔇 Additional comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2)
2-11: LGTM! Clean refactor of trait dependencies.

The trait signature and imports have been simplified appropriately. Removing `DataSourceToDiskWriter` and adding `PathUtils` aligns well with the new trash-based deletion approach.

24-36: LGTM! Idempotent deletion with clear logging.

The existence check and error handling are correct. Returning `Fox.successful(())` when the path doesn't exist makes the deletion idempotent, which is the right behavior. The logging clearly distinguishes between actual deletion and no-op cases.
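The "missing path is a successful no-op" pattern can be sketched with plain `java.nio` outside the Fox monad; `IdempotentDelete` is a hypothetical helper, not the project's code:

```scala
import java.nio.file.{Files, Path}
import scala.util.Try

object IdempotentDelete {
  // Deleting a missing path counts as success, so a retried deletion job
  // cannot fail on its second attempt. Returns whether anything was removed.
  def delete(path: Path): Try[Boolean] =
    Try(Files.deleteIfExists(path))
}
```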
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
I would have expected that cumsum.json would not have been uploaded to s3 in the first place, as it was not referenced in the datasource-properties.json 🤔 But yes, I think it’s acceptable for the moment that it is not deleted. It is also not very large.
Ok, then no need for me to do this again, as this takes some time?
Oh I see, the error states
This means that the file does not exist, which is always what the error says:
Therefore, I think the root cause for this is that the exploration thinks it found a valid layer, but this was actually a mag. The datasource stored in the db is thus buggy, and loading the dataset therefore does not work properly.
But this exploration bug is very likely not related to this PR. So I'd say we can make this green, as I did a little retesting showing that it still works, and your latest changes look good?
Yes, I agree :) Also, no need to re-test the linked mags in virtual datasets, I did that today
So, as this bug is unrelated, I opened #9034.
And as the code is good and doing what it should be doing 🟢
…r upload (#9047)

We can have multiple managed s3 stores. Only one is selected for upload but dataset deletion should work for all of them. This PR refactors the ManagedS3Service a bit and selects the correct client with endpoint/credential for each path.

### URL of deployed dev instance (used for testing):
- https://___.webknossos.xyz

### Steps to test:
(I already tested locally, no need to re-test)
- In application.conf, insert multiple dataVaults.credentials for s3 endpoints. Set s3Upload.credentialName to the first of them
- Upload a small dataset
- set the s3Upload.credentialName to another of them
- make sure both datasets can be loaded
- delete both datasets from their respective settings views.
- backend logging should show they were deleted from their respective s3 buckets

### Issues:
- follow-up for #8924
- fixes #9043

------

- [x] Added changelog entry (create a `$PR_NUMBER.md` file in `unreleased_changes` or use `./tools/create-changelog-entry.py`)
- [x] Removed dev-only changes like prints and application.conf edits
- [x] Considered [common edge cases](../blob/master/.github/common_edge_cases.md)
- [x] Needs datastore update after deployment
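Selecting the correct client with endpoint/credential for each path, as described above, amounts to keying a client cache by endpoint host. A hypothetical sketch (the real `ManagedS3Service` differs; `S3Client` here is a stand-in for the SDK client):

```scala
import java.net.URI
import scala.collection.concurrent.TrieMap

// Stand-in for an SDK client; a real service would hold an S3AsyncClient
// built with that endpoint's credentials.
final case class S3Client(endpointHost: String)

object ClientPool {
  private val clients = TrieMap.empty[String, S3Client]

  // Pick (and cache) the client matching the path's endpoint host, so each
  // managed bucket is addressed with its own endpoint and credentials.
  def clientFor(pathUri: URI): S3Client =
    clients.getOrElseUpdate(pathUri.getHost, S3Client(pathUri.getHost))
}
```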
- When scanning datasources from directories, always add resolved mag paths immediately
- Introduces realpaths for attachments
- Report all realpaths that could be determined, scale down logging for those that couldn’t.

### URL of deployed dev instance (used for testing):
- https://___.webknossos.xyz

### Steps to test:
- start wk with some datasets in the binaryData dir, watch backend logging, should show realpath scan failures (provoke some by deleting mags or attachments that are referenced in some datasource-properties.jsons)
- All realpaths that could be determined should be added to the database

### TODOs:
- [x] immediate paths for datasources
- [x] simplify layerpath logic
- [x] attachment realpath column
- [x] don’t fail for datasource on single realpath failure
- [x] log failures
- [x] scan also attachments
- [x] evolution
- [x] adapt s3-delete code to also consider realpaths of attachments

### Issues:
- blocked by #8924 (paths routes changed)
- blocked by #8980 (datavault path refactoring)
- fixes #9018

------

- [x] Added changelog entry (create a `$PR_NUMBER.md` file in `unreleased_changes` or use `./tools/create-changelog-entry.py`)
- [x] Removed dev-only changes like prints and application.conf edits
- [x] Considered [common edge cases](../blob/master/.github/common_edge_cases.md)
- [x] Needs datastore update after deployment

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: MichaelBuessemeyer <[email protected]>

