@vulyon vulyon commented Nov 25, 2025

Calculate task ID based on blob SHA256 for registry downloads

Description

This PR implements a feature to calculate task IDs based on the blob's SHA256 digest for OCI registry blob downloads, rather than using the full download URL. This allows identical blobs from different registry domains to share the same task ID, avoiding redundant downloads and storage.

Key Changes:

  1. Configuration Addition (dragonfly-client-config/src/dfdaemon.rs):

    • Added enable_blob_based_task_id field to RegistryMirror configuration
    • Defaults to true as specified in the issue
    • Allows users to control whether this feature is enabled
  2. Core Implementation (dragonfly-client/src/proxy/mod.rs):

    • Added extract_blob_digest_from_url() function to detect and extract SHA256 digest from registry blob URLs (format: /v2/<name>/blobs/sha256:<digest>)
    • Modified make_download_task_request() to use blob digest for task ID calculation when the feature is enabled
    • Implements a priority system for task ID calculation:
      1. Explicit X-Dragonfly-Content-For-Calculating-Task-ID header (highest priority)
      2. Blob SHA256 digest from URL (if enable_blob_based_task_id is true)
      3. URL-based calculation (default fallback)
  3. Testing:

    • Added unit tests for extract_blob_digest_from_url() function
    • Tests cover valid blob URLs and invalid/edge cases
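
For reviewers who want a quick mental model, the extraction and priority logic described above can be sketched roughly as follows. This is not the code from the PR: the function bodies, the `TaskIdSource` enum, and `select_task_id_source` are hypothetical names introduced for illustration, and the strict "64 lowercase hex characters" check is an assumption that the real `extract_blob_digest_from_url()` may not make. The sketch returns what the task ID would be derived from rather than hashing it, so it needs only the standard library.

```rust
/// Hypothetical sketch: returns the `sha256:<hex>` digest if `url` looks like
/// an OCI blob URL of the form `/v2/<name>/blobs/sha256:<digest>`.
fn extract_blob_digest_from_url(url: &str) -> Option<String> {
    // Drop any query string or fragment so signed URLs still match.
    let without_suffix = url.split(|c| c == '?' || c == '#').next()?;

    // Strip the scheme and host, leaving only the path.
    let path = match without_suffix.find("://") {
        Some(i) => {
            let rest = &without_suffix[i + 3..];
            &rest[rest.find('/')?..]
        }
        None => without_suffix,
    };

    let rest = path.strip_prefix("/v2/")?;
    // Repository names may contain slashes, so split on the last "/blobs/".
    let (name, digest) = rest.rsplit_once("/blobs/")?;
    if name.is_empty() {
        return None;
    }

    // Assumption: only accept a full 64-character lowercase hex SHA256 digest.
    let hex = digest.strip_prefix("sha256:")?;
    let valid = hex.len() == 64
        && hex.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    valid.then(|| digest.to_string())
}

/// Hypothetical type: the input the task ID would be calculated from.
#[derive(Debug, PartialEq)]
enum TaskIdSource {
    /// Value of the X-Dragonfly-Content-For-Calculating-Task-ID header.
    ExplicitContent(String),
    /// Blob digest taken from the URL, e.g. `sha256:<hex>`.
    BlobDigest(String),
    /// The full download URL (fallback).
    Url(String),
}

/// Applies the three-level priority: header, then blob digest, then URL.
fn select_task_id_source(
    url: &str,
    header_content: Option<&str>,
    enable_blob_based_task_id: bool,
) -> TaskIdSource {
    if let Some(content) = header_content {
        return TaskIdSource::ExplicitContent(content.to_string());
    }
    if enable_blob_based_task_id {
        if let Some(digest) = extract_blob_digest_from_url(url) {
            return TaskIdSource::BlobDigest(digest);
        }
    }
    TaskIdSource::Url(url.to_string())
}
```

Under these assumptions, two URLs that differ only in registry host resolve to the same `TaskIdSource::BlobDigest` value, while a manifest URL or a request with the feature disabled falls through to `TaskIdSource::Url`.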

Example:

Before this change:

  • https://registry1.io/v2/nginx/blobs/sha256:abc123... → Task ID 1
  • https://registry2.io/v2/nginx/blobs/sha256:abc123... → Task ID 2 (duplicate download)

After this change:

  • Both URLs → Same Task ID (based on sha256:abc123...)
  • No duplicate downloads, shared storage

Related Issue

Closes dragonflyoss/dragonfly#4418

Motivation and Context

When pulling container images, different registry mirrors (e.g., Docker Hub, Harbor, custom registries) may serve the same image blobs with identical content but different URLs. Currently, Dragonfly computes task IDs based on the download URL, which leads to:

  • Multiple downloads of the same blob from different registries
  • Data redundancy in storage
  • Wasted bandwidth and slower image pulls

This change solves the problem by:

  • Using the blob's SHA256 digest, which identifies the content itself rather than its location, as the basis for task ID calculation
  • Ensuring identical blobs share the same task ID regardless of registry domain
  • Reducing storage usage and improving download efficiency

The feature is enabled by default but can be disabled via configuration if needed for compatibility reasons.
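
As a rough illustration of the "enabled by default, opt out via configuration" behavior, here is a minimal sketch. The struct below is a hypothetical stand-in, not the actual `RegistryMirror` definition in dragonfly-client-config: only the `enable_blob_based_task_id` field name comes from this PR, while the `addr` field and its placeholder value are assumptions added so the example has something else to carry over.

```rust
/// Hypothetical stand-in for the registry mirror configuration.
#[derive(Debug, Clone, PartialEq)]
struct RegistryMirror {
    /// Assumed field: address of the upstream registry (placeholder value).
    addr: String,
    /// When true, blob downloads derive the task ID from the blob digest.
    enable_blob_based_task_id: bool,
}

impl Default for RegistryMirror {
    fn default() -> Self {
        Self {
            addr: "https://registry.example.com".to_string(),
            // The PR description states the feature defaults to true.
            enable_blob_based_task_id: true,
        }
    }
}

/// Builds a configuration that opts out of blob-based task IDs while
/// keeping every other default.
fn registry_mirror_without_blob_task_id() -> RegistryMirror {
    RegistryMirror {
        enable_blob_based_task_id: false,
        ..RegistryMirror::default()
    }
}
```

With a shape like this, existing deployments that do not set the field pick up the new behavior automatically, and only users who need URL-based task IDs have to change anything.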

Screenshots (if appropriate)

N/A - Backend functionality only

@vulyon changed the title from "fix: task ID of image blobs" to "fix: task ID of image blobs for dragonfly's issue #4418" on Nov 25, 2025

codecov bot commented Nov 25, 2025

Codecov Report

❌ Patch coverage is 50.84746% with 58 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.09%. Comparing base (118416e) to head (a25e1dc).

Files with missing lines                          Patch %   Lines
dragonfly-client/src/grpc/dfdaemon_download.rs    0.00%     23 Missing ⚠️
dragonfly-client/src/grpc/dfdaemon_upload.rs      0.00%     23 Missing ⚠️
dragonfly-client/src/proxy/mod.rs                 0.00%     12 Missing ⚠️

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #1495      +/-   ##
==========================================
- Coverage   48.11%   48.09%   -0.02%     
==========================================
  Files          70       70              
  Lines       17541    17658     +117     
==========================================
+ Hits         8439     8492      +53     
- Misses       9102     9166      +64     

Files with missing lines                          Coverage Δ
dragonfly-client-config/src/dfdaemon.rs           92.51% <100.00%> (+0.03%) ⬆️
dragonfly-client-util/src/id_generator/mod.rs     98.56% <100.00%> (+0.27%) ⬆️
dragonfly-client/src/proxy/mod.rs                 0.00% <0.00%> (ø)
dragonfly-client/src/grpc/dfdaemon_download.rs    3.28% <0.00%> (-0.07%) ⬇️
dragonfly-client/src/grpc/dfdaemon_upload.rs      0.00% <0.00%> (ø)

... and 1 file with indirect coverage changes


@gaius-qi (Member)

@vulyon Please fix lint.

@vulyon force-pushed the fix-issue4418 branch 2 times, most recently from 0f870bc to 0a0233c, on November 25, 2025 14:46
vulyon (Author) commented Nov 25, 2025

> @vulyon Please fix lint.

I've fixed it.

@gaius-qi added the enhancement label (New feature or request) on Nov 26, 2025
@gaius-qi added this to the v2.4.0 milestone on Nov 26, 2025

Development

Successfully merging this pull request may close these issues.

For the download URL of image blobs, directly calculate the task ID based on the blob's sha256
