Use count-based progress for Docker image pulls#6371
Closed
Conversation
c70d3b8 to
ffc3783
Compare
3b608f3 to
8350a24
Compare
Refactor Docker image pull progress to use a simpler count-based approach where each layer contributes equally (100% / total_layers) regardless of size. This replaces the previous size-weighted calculation that was susceptible to progress regression. The core issue was that Docker rate-limits concurrent downloads (~3 at a time) and reports layer sizes only when downloading starts. With size- weighted progress, large layers appearing late would cause progress to drop dramatically (e.g., 59% -> 29%) as the total size increased. The new approach: - Each layer contributes equally to overall progress - Per-layer progress: 70% download weight, 30% extraction weight - Progress only starts after first "Downloading" event (when layer count is known) - Always caps at 99% - job completion handles final 100% This simplifies the code by moving progress tracking to a dedicated module (pull_progress.py) and removing complex size-based scaling logic that tried to account for unknown layer sizes. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Layers that already exist locally should not count towards download progress since there's nothing to download for them. Only layers that need pulling are included in the progress calculation. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
bb9145c to
87e1e7a
Compare
14 tasks
Member
Author
|
With #6379 landed this is obsolete. Closing. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Proposed change
Refactor Docker image pull progress to use a simpler count-based approach where each layer contributes equally (100% / total_layers) regardless of size.
The core issue was that Docker rate-limits concurrent downloads (by default 3 at a time, see Docker DefaultMaxConcurrentDownloads constant) and reports layer sizes only when downloading starts. With the previous size-weighted progress calculation, large layers appearing late would cause progress to drop dramatically (e.g., 59% -> 29%) as the total size increased. We prevented the progress from going backwards, but in practice that meant the progress would stale for an extended amount of time.
The new approach:
This moves progress tracking to a dedicated module (
pull_progress.py) and removes the complex size-based scaling logic that tried to account for unknown layer sizes. With this, there is always some progress. Unfortunately, it also means that progress slows down towards the completion, since the larger layers are then still downloaded/extracted. But this is the best we can do currently.Type of change
Additional information
Checklist
ruff format supervisor tests)If API endpoints or add-on configuration are added/changed: