Fix ADLS Gen2 plugin to support Azure Blob Soft Delete #17806
Conversation
The DFS (Data Lake) API endpoint does not support Azure Blob Soft Delete, causing 409 `EndpointUnsupportedAccountFeatures` errors when uploading segments to storage accounts with org-mandated Soft Delete policies. This change adds a `BlobContainerClient` alongside the existing `DataLakeFileSystemClient`, using the Blob API (`*.blob.core.windows.net`) for file uploads, which is fully compatible with Soft Delete. The DFS API is retained for read/metadata operations (list, exists, open, etc.) and as a write fallback when the `BlobContainerClient` is not initialized. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Adds Azure Blob API uploads to the ADLS Gen2 PinotFS implementation to support storage accounts with Blob Soft Delete enabled, while retaining DFS for reads/metadata and as a fallback.
Changes:
- Introduces `BlobContainerClient` initialization (blob endpoint) alongside the existing DFS client.
- Routes uploads through the Blob API, with DFS upload as a fallback when the Blob client isn't available.
- Adds unit tests covering the Blob upload success and failure paths.
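As a rough sketch of the dual-client setup (the helper below is hypothetical and not code from this PR), the Blob endpoint for the same storage account can be derived from the configured DFS endpoint by swapping the host suffix:

```java
// Sketch: derive the Blob API endpoint from a DFS (Data Lake) endpoint.
// The host-rewriting rule below is an assumption for illustration; the PR's
// actual client-initialization logic may differ.
public class EndpointSketch {
  static String toBlobEndpoint(String dfsEndpoint) {
    // ADLS Gen2 accounts expose the same data under two hosts:
    //   https://<account>.dfs.core.windows.net   (Data Lake / DFS API)
    //   https://<account>.blob.core.windows.net  (Blob API, Soft Delete compatible)
    return dfsEndpoint.replace(".dfs.core.windows.net", ".blob.core.windows.net");
  }

  public static void main(String[] args) {
    System.out.println(toBlobEndpoint("https://myaccount.dfs.core.windows.net"));
    // prints: https://myaccount.blob.core.windows.net
  }
}
```

Both clients then target the same account and container; only the protocol endpoint differs, which is why the DFS client can remain in place for reads and metadata.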
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java | Initializes Blob SDK client and uses it for uploads; retains DFS upload fallback. |
| pinot-plugins/pinot-file-system/pinot-adls/src/test/java/org/apache/pinot/plugin/filesystem/test/ADLSGen2PinotFSTest.java | Adds tests validating Blob upload path and exception wrapping. |
| pinot-plugins/pinot-file-system/pinot-adls/pom.xml | Adds azure-storage-blob dependency to support Blob API uploads. |
Comments suppressed due to low confidence (1)
pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java:1
- In the DEFAULT auth case, `defaultAzureCredentialBuilder.build()` is invoked twice, creating two separate credential instances. Build once into a local variable (e.g., `TokenCredential credential = defaultAzureCredentialBuilder.build()`) and pass the same instance to both builders to avoid duplicate initialization work and potential differences in internal caching behavior.
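The build-once pattern the reviewer suggests can be illustrated with a minimal stand-in (the `StubCredentialBuilder` class below is hypothetical; the real code would call Azure's `DefaultAzureCredentialBuilder` and share the resulting `TokenCredential` with both the DataLake and Blob client builders):

```java
// Hypothetical stub illustrating why build() should be called only once.
// In the real plugin this would be Azure's DefaultAzureCredentialBuilder.
class StubCredentialBuilder {
  static int buildCount = 0;

  Object build() {
    buildCount++;
    return new Object(); // each call yields a distinct credential instance
  }
}

public class BuildOnceSketch {
  public static void main(String[] args) {
    StubCredentialBuilder builder = new StubCredentialBuilder();

    // Anti-pattern: two build() calls produce two credential instances,
    // duplicating initialization work and splitting internal token caches.
    Object c1 = builder.build();
    Object c2 = builder.build();
    System.out.println(c1 == c2); // false

    // Fix: build once and hand the same instance to both client builders.
    Object shared = builder.build();
    Object forDfsClient = shared;
    Object forBlobClient = shared;
    System.out.println(forDfsClient == forBlobClient); // true
  }
}
```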
```java
BlobClient blobClient = _blobContainerClient.getBlobClient(path);
blobClient.upload(inputStream, contentLength, true);

if (contentMd5 != null) {
  blobClient.setHttpHeaders(new BlobHttpHeaders().setContentMd5(contentMd5));
}
```
This performs a second network call (`setHttpHeaders`) after the upload to set Content-MD5. Consider setting the headers as part of the initial upload request (using an upload-options overload that accepts `BlobHttpHeaders`), so the upload and the properties update happen in one request. That avoids the extra round trip and prevents ending up with an uploaded blob missing its MD5 if `setHttpHeaders` fails.
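A hedged sketch of what the reviewer suggests, assuming the `BlobParallelUploadOptions` overload from the azure-storage-blob SDK (this is illustrative, not code from this PR; verify the exact overload against the SDK version in use):

```java
// Sketch (assumption): combine upload + Content-MD5 into a single request
// via BlobClient.uploadWithResponse(...). The options-based overload
// overwrites by default, matching the original upload(..., true) behavior.
BlobHttpHeaders headers = new BlobHttpHeaders().setContentMd5(contentMd5);
BlobParallelUploadOptions options =
    new BlobParallelUploadOptions(inputStream, contentLength)
        .setHeaders(headers);
blobClient.uploadWithResponse(options, /* timeout */ null, /* context */ null);
```

Setting the headers in the upload request also makes the operation atomic from the caller's perspective: either the blob lands with its MD5, or the upload fails as a whole.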
```java
if (contentMd5 != null) {
  blobClient.setHttpHeaders(new BlobHttpHeaders().setContentMd5(contentMd5));
}
```
There’s a new branch that updates the Blob headers when `contentMd5 != null`, but the added tests only cover the `upload(...)` call. Add a unit test that enables checksum/MD5 generation (so `contentMd5` is non-null) and verifies that the Blob path calls `setHttpHeaders(...)` with the expected MD5.
❌ 1 test failed.