Azure Blob Storage CommitBlockList API is not idempotent. #23794
Labels
Client
customer-reported
needs-team-attention
question
Service Attention
Storage
Bug Report
Azure Blob Storage (`azblob`)'s `CommitBlockList` API is not idempotent.
REST API Reference: https://learn.microsoft.com/en-us/rest/api/storageservices/put-block-list?tabs=microsoft-entra-id
The API reference above does not state that this API is idempotent,
but I think it should be.
If it is intended to be idempotent, please consider the scenario below.
If this is not a bug, please recommend the correct usage.
What happened?
Storage Account Config:
Consider the following scenario:
1. `StageBlock`
2. `CommitBlockList`
In very rare cases, we've noticed that this sequence creates two versions of the blob.
What did you expect or want to happen?
We expect only one version, because we called `CommitBlockList` only once.
Analysis
We noticed that the timestamps of the two versions were 3 seconds apart.
This matches the delay before the first retry in the retry policy we configure on the azblob client.
This means the duplicate most probably comes from the following scenario:
OR
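Azure servers send the response, but it never reaches us, so the retry policy resends the same `CommitBlockList` request and the service commits the block list a second time.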
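A minimal simulation of that failure mode, assuming a hypothetical in-memory service that creates one version per applied commit (as observed in this issue) and a simplified retry loop without back-off. It shows how a single logical call can yield two versions when the first response is lost.

```go
package main

import (
	"errors"
	"fmt"
)

// errResponseLost models a reply the service sent but the client never received.
var errResponseLost = errors.New("response lost in transit")

// toyService is a hypothetical stand-in for the service: every commit it
// applies creates a new version, like the behaviour reported here.
type toyService struct {
	versions int
	dropNext bool // if true, apply the next commit but "lose" its response
}

func (s *toyService) CommitBlockList() error {
	s.versions++ // the commit is applied on the server side...
	if s.dropNext {
		s.dropNext = false
		return errResponseLost // ...but the client only sees an error
	}
	return nil
}

// commitWithRetry resends on any error. A real retry policy would also wait
// between attempts (about 3 s before the first retry in the reporter's setup).
func commitWithRetry(s *toyService, maxTries int) error {
	var err error
	for try := 0; try < maxTries; try++ {
		if err = s.CommitBlockList(); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	svc := &toyService{dropNext: true}
	err := commitWithRetry(svc, 3)
	fmt.Println(err, svc.versions) // <nil> 2: one logical call, two versions
}
```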
Generally, to avoid such scenarios, a server can require a request ID (an idempotency key)
and do nothing if a request with that ID has already completed, so that a repeated request
is a no-op.
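The idempotency-key approach described above can be sketched as follows. This is a hypothetical service, not how Azure Storage behaves today: it remembers which request IDs have completed and answers a repeat with the original result instead of applying the commit again.

```go
package main

import "fmt"

// dedupService is a hypothetical service that deduplicates by request ID.
type dedupService struct {
	versions  int
	completed map[string]int // request ID -> version created by that request
}

func newDedupService() *dedupService {
	return &dedupService{completed: make(map[string]int)}
}

// CommitBlockList creates a new version only the first time a request ID is
// seen; a repeat with the same ID is a no-op returning the original version.
func (s *dedupService) CommitBlockList(requestID string) int {
	if v, ok := s.completed[requestID]; ok {
		return v // already done: no-op
	}
	s.versions++
	s.completed[requestID] = s.versions
	return s.versions
}

func main() {
	s := newDedupService()
	first := s.CommitBlockList("req-1")
	retry := s.CommitBlockList("req-1") // e.g. a retry after a lost response
	other := s.CommitBlockList("req-2")
	fmt.Println(first, retry, other, s.versions) // 1 1 2 2
}
```

A real service would also need to bound how long it remembers completed IDs; the map here grows without limit.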
The API reference (https://learn.microsoft.com/en-us/rest/api/storageservices/put-block-list?tabs=microsoft-entra-id)
mentions such an ID, `x-ms-client-request-id`,
but it seems to be used only for metrics.
This request ID is generated once per request (not per retry) by
`NewUniqueRequestIDPolicyFactory`, so a retry carries the same ID as the original attempt.
Yet even with the same request ID, Azure still creates a new version.
There's no way for the client to avoid this scenario, unless we somehow add a blob existence check
before the retry. (That would be very tedious, but I think we could do it by using
the provided pipeline.)
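The existence-check workaround can be sketched against a hypothetical in-memory store. It assumes the blob did not exist before the first attempt and that no other writer touches it; for overwrites, one would instead need to compare something captured before the first attempt (for example, the blob's ETag). `store`, `exists`, and `commitNewBlobOnce` are illustrative names, not SDK APIs.

```go
package main

import (
	"errors"
	"fmt"
)

var errResponseLost = errors.New("response lost in transit")

// store is a hypothetical in-memory container: blob name -> number of versions.
type store struct {
	versions map[string]int
	dropNext bool // apply the next commit but "lose" its response
}

func (s *store) commit(name string) error {
	s.versions[name]++
	if s.dropNext {
		s.dropNext = false
		return errResponseLost
	}
	return nil
}

// exists stands in for an existence/properties lookup on the blob.
func (s *store) exists(name string) bool { return s.versions[name] > 0 }

// commitNewBlobOnce commits a blob that did not exist before the first attempt.
// Before each retry it checks whether the blob now exists; if it does, the
// earlier attempt was applied, so it stops instead of committing again.
func commitNewBlobOnce(s *store, name string, maxTries int) error {
	var err error
	for try := 0; try < maxTries; try++ {
		if try > 0 && s.exists(name) {
			return nil
		}
		if err = s.commit(name); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	s := &store{versions: map[string]int{}, dropNext: true}
	err := commitNewBlobOnce(s, "report.csv", 3)
	fmt.Println(err, s.versions["report.csv"]) // <nil> 1
}
```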
How to reproduce
Since simulating a lost response will be very hard, you can simply call `CommitBlockList` twice manually with the same request ID.
Thanks!