
Fix ADLS Gen2 plugin to support Azure Blob Soft Delete#17806

Open

xiangfu0 wants to merge 1 commit into master from fix/adls-blob-soft-delete-support

Conversation


xiangfu0 (Contributor) commented Mar 3, 2026

The DFS (Data Lake) API endpoint does not support Azure Blob Soft Delete, causing 409 EndpointUnsupportedAccountFeatures errors when uploading segments to storage accounts with org-mandated Soft Delete policies.

This change adds a BlobContainerClient alongside the existing DataLakeFileSystemClient, using the Blob API (*.blob.core.windows.net) for file uploads, which is fully compatible with Soft Delete. The DFS API is retained for read/metadata operations (list, exists, open, etc.) and as a write fallback when the BlobContainerClient is not initialized.
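On Azure, the Blob endpoint differs from the DFS endpoint only in the host suffix (`*.dfs.core.windows.net` vs `*.blob.core.windows.net`), so the Blob client can target the same account the DFS client already uses. A minimal sketch of that derivation (the helper name, and the assumption that the plugin derives the endpoint this way rather than taking it from config, are hypothetical):

```java
public class BlobEndpointDerivation {
    // Hypothetical helper: maps an ADLS Gen2 DFS endpoint to the matching
    // Blob endpoint by swapping the ".dfs." host segment for ".blob.".
    static String toBlobEndpoint(String dfsEndpoint) {
        return dfsEndpoint.replaceFirst("\\.dfs\\.", ".blob.");
    }

    public static void main(String[] args) {
        // Prints https://myaccount.blob.core.windows.net
        System.out.println(toBlobEndpoint("https://myaccount.dfs.core.windows.net"));
    }
}
```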


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@xiangfu0 xiangfu0 requested a review from Copilot March 3, 2026 18:57
Copy link
Contributor

Copilot AI left a comment


Pull request overview

Adds Azure Blob API uploads to the ADLS Gen2 PinotFS implementation to support storage accounts with Blob Soft Delete enabled, while retaining DFS for reads/metadata and as a fallback.

Changes:

  • Introduces BlobContainerClient initialization (blob endpoint) alongside the existing DFS client.
  • Routes uploads through Blob API with DFS upload as a fallback when Blob client isn’t available.
  • Adds unit tests covering Blob upload success and Blob upload failure paths.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java — initializes the Blob SDK client and uses it for uploads; retains the DFS upload fallback.
  • pinot-plugins/pinot-file-system/pinot-adls/src/test/java/org/apache/pinot/plugin/filesystem/test/ADLSGen2PinotFSTest.java — adds tests validating the Blob upload path and exception wrapping.
  • pinot-plugins/pinot-file-system/pinot-adls/pom.xml — adds the azure-storage-blob dependency to support Blob API uploads.
Comments suppressed due to low confidence (1)

pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java:1

  • In the DEFAULT auth case, defaultAzureCredentialBuilder.build() is invoked twice, creating two separate credential instances. Build once into a local variable (e.g., TokenCredential credential = defaultAzureCredentialBuilder.build()) and pass the same instance to both builders to avoid duplicate initialization work and potential differences in internal caching behavior.
/**
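The fix suggested above can be sketched as follows. The builder and client types are real Azure SDK for Java classes (azure-identity, azure-storage-file-datalake, azure-storage-blob), but the method, variable, and parameter names below are assumptions about this plugin's code, not the actual diff:

```java
import com.azure.core.credential.TokenCredential;
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobServiceClientBuilder;
import com.azure.storage.file.datalake.DataLakeFileSystemClient;
import com.azure.storage.file.datalake.DataLakeServiceClientBuilder;

class DualClientInit {
    // Sketch only: dfsEndpoint, blobEndpoint, and fileSystemName stand in for
    // whatever configuration the plugin actually reads.
    static void init(String dfsEndpoint, String blobEndpoint, String fileSystemName) {
        // Build the credential once...
        TokenCredential credential = new DefaultAzureCredentialBuilder().build();

        // ...and pass the same instance to both service builders, so both
        // clients share one credential and its internal token caching.
        DataLakeFileSystemClient dfsClient = new DataLakeServiceClientBuilder()
            .endpoint(dfsEndpoint)
            .credential(credential)
            .buildClient()
            .getFileSystemClient(fileSystemName);

        BlobContainerClient blobContainerClient = new BlobServiceClientBuilder()
            .endpoint(blobEndpoint)
            .credential(credential)
            .buildClient()
            .getBlobContainerClient(fileSystemName);
    }
}
```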

Comment on lines +698 to +703
BlobClient blobClient = _blobContainerClient.getBlobClient(path);
blobClient.upload(inputStream, contentLength, true);

if (contentMd5 != null) {
blobClient.setHttpHeaders(new BlobHttpHeaders().setContentMd5(contentMd5));
}

Copilot AI Mar 3, 2026


This performs a second network call (setHttpHeaders) after the upload to set Content-MD5. Consider setting the headers as part of the initial upload request (using an upload options overload that accepts BlobHttpHeaders) so the upload and the header update happen in one request, avoiding extra latency and preventing an uploaded blob from being left without its MD5 if setHttpHeaders fails.
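A sketch of that suggestion using the SDK's BlobParallelUploadOptions (the types and methods are real azure-storage-blob API; the variable names mirror the quoted diff, and the exact overload and overwrite semantics the plugin would want are assumptions):

```java
import java.io.InputStream;
import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.models.BlobHttpHeaders;
import com.azure.storage.blob.options.BlobParallelUploadOptions;

class SingleRequestUpload {
    static void upload(BlobClient blobClient, InputStream inputStream, byte[] contentMd5) {
        BlobParallelUploadOptions options = new BlobParallelUploadOptions(inputStream);
        if (contentMd5 != null) {
            // Attach Content-MD5 to the upload request itself instead of a
            // follow-up setHttpHeaders call: one round trip, and either the
            // blob lands with its MD5 or the upload fails as a whole.
            options.setHeaders(new BlobHttpHeaders().setContentMd5(contentMd5));
        }
        // null timeout/context fall back to SDK defaults; note this overload
        // overwrites existing blobs by default, unlike upload(stream, length),
        // which is part of why this is only a sketch.
        blobClient.uploadWithResponse(options, null, null);
    }
}
```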

Copilot uses AI. Check for mistakes.
Comment on lines +701 to +703
if (contentMd5 != null) {
blobClient.setHttpHeaders(new BlobHttpHeaders().setContentMd5(contentMd5));
}

Copilot AI Mar 3, 2026


There’s a new branch that updates Blob headers when contentMd5 != null, but the added tests only cover the upload(...) call. Add a unit test that enables checksum/MD5 generation (so contentMd5 is non-null) and verifies the Blob path calls setHttpHeaders(...) with the expected MD5.
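For such a test, the expected Content-MD5 value is just the raw MD5 digest of the uploaded bytes, which can be computed with the JDK alone. A self-contained sketch (class and method names here are illustrative, not from the PR):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ContentMd5Example {
    // Computes the raw 16-byte MD5 digest that goes into the Content-MD5
    // header (the SDK Base64-encodes it on the wire).
    static byte[] md5(byte[] content) {
        try {
            return MessageDigest.getInstance("MD5").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is a mandatory JDK algorithm", e);
        }
    }

    public static void main(String[] args) {
        byte[] digest = md5("segment-bytes".getBytes());
        // A test could verify setHttpHeaders(...) received exactly this value,
        // e.g. via an ArgumentCaptor on the mocked BlobClient.
        System.out.println(java.util.HexFormat.of().formatHex(digest));
    }
}
```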

Copilot generated this review using guidance from repository custom instructions.

codecov-commenter commented Mar 3, 2026

❌ 1 Tests Failed:

Tests completed: 11942 | Failed: 1 | Passed: 11941 | Skipped: 73
Top failed test (shortest run time):
org.apache.pinot.plugin.filesystem.test.ADLSGen2PinotFSTest::tearDown
Stack Traces | 0.01s run time

No interactions wanted here:
-> at org.apache.pinot.plugin.filesystem.test.ADLSGen2PinotFSTest.tearDown(ADLSGen2PinotFSTest.java:114)
But found this interaction on mock '_mockBlobClient':
-> at org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS.copyInputStreamToDstViaBlob(ADLSGen2PinotFS.java:702)
***
For your reference, here is the list of all invocations ([?] - means unverified).
1. -> at org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS.copyInputStreamToDstViaBlob(ADLSGen2PinotFS.java:699)
2. [?]-> at org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS.copyInputStreamToDstViaBlob(ADLSGen2PinotFS.java:702)
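The failure above is Mockito's tearDown-time no-more-interactions check catching the new, unverified setHttpHeaders call on the mocked BlobClient. A sketch of the missing verification (the Mockito API is real and the mock name _mockBlobClient comes from the trace; the rest of the test setup is an assumption):

```java
// Static imports assumed: org.mockito.Mockito.verify,
// org.mockito.ArgumentMatchers.{any, anyLong, eq}.

// Inside the Blob-upload test, after invoking the copy under test:
verify(_mockBlobClient).upload(any(java.io.InputStream.class), anyLong(), eq(true));

// The new MD5 branch must be verified as well, or tearDown's
// verifyNoMoreInteractions(_mockBlobClient) flags it as unwanted:
verify(_mockBlobClient).setHttpHeaders(any(BlobHttpHeaders.class));
```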



3 participants