Skip to content

Improved progress reporting for Xet uploads #3096

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

hoytak
Copy link

@hoytak hoytak commented May 20, 2025

This PR adds detailed progress reporting for upload_files when hf_xet is used, showing both per-file progress and accurate total progress. Total progress speed, which includes both deduplication and data transfer, is also separated out into separate bars.

Requries xet-core / hf_xet at commit 4faec0b or later, which can be obtained using pip install --pre "hf-xet==1.1.3dev0"

hoytak added 2 commits May 20, 2025 11:37
This PR adds detailed progress reporting for upload_files when hf_xet
is used, showing both per-file progress and accurate total progress.
Total progress speed, which includes both deduplication and data
transfer, is also separated out into separate bars.

Requries xet-core / hf_xet at commit 4faec0b or later.
@hoytak
Copy link
Author

hoytak commented May 20, 2025

Here is a video of the current version of it in action:

Screen.Recording.2025-05-20.at.12.50.49.PM.mov

@rajatarya rajatarya requested review from hanouticelina and removed request for hanouticelina May 20, 2025 22:11
@rajatarya
Copy link
Contributor

@hanouticelina @Wauplin : @hoytak has worked on improving the UX with xet uploads, and this draft PR shows how more accurate reporting can be achieved on the upload_folder path.

In this he replaced tqdm with rich since it gave him more flexibility.

The main motivation for changing the UX with xet is due to the additional step of processing / chunking files. As you have seen, folks have been confused by seeing only file-level progress, and the current UX has 'choppy' upload behavior (the progress bar looks stalled and then jumps). By having that tracked explicitly with this new implementation it should be a lot clearer to users when their overall transfer is complete (TOTAL line), and the processing step ().

Before embarking on refactoring the download path with similar UX changes Hoyt wanted to get your feedback / approval with this direction of changes. He's also open to UX changes - naming - colors - whatever with this approach.

cc: @bpronan

Also, see Slack thread here: https://huggingface.slack.com/archives/C02V5EA0A95/p1747771966873299

setup.py Outdated
@@ -19,6 +19,7 @@ def get_version() -> str:
"requests",
"tqdm>=4.42.1",
"typing-extensions>=3.7.4.3", # to be able to import TypeAlias
"rich>=14.0.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in slack (private link) I'm quite reluctant to add rich as a required dependency 😕

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants