-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accelerate large files download using concurent chunks for lakectl download recursive #8613
Conversation
…kectl fs download recursive The curent `lakectl fs download` supports downloading large files by splitting the files and downloading the content in parallel. This change add the same support to the recursive option with the support to run multiple downloads (of parallel chunks) in parallel. TODO New downloader do not uses sync infrastructure, currently support the same flags: use-presign and parallel downloads. Need to look into progress bar with chunks support. Support no-progress flag as sync does.
if !recursive { | ||
src := uri.URI{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description says that we are modifying download with the recursive option. However there appear to be significant changes to the !recursive code branch. What is the nature of these changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src
was a copy of the remote
- I've removed it
esti/lakectl_test.go
Outdated
@@ -644,10 +642,6 @@ func TestLakectlFsDownload(t *testing.T) { | |||
require.Contains(t, sanitizedResult, "download ro/ro_1k.2") | |||
require.Contains(t, sanitizedResult, "download ro/ro_1k.3") | |||
require.Contains(t, sanitizedResult, "download ro/ro_1k.4") | |||
require.Contains(t, sanitizedResult, "Download Summary:") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the user still want to see these summaries? Shouldn't they be supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The history - the fs download recursive was replaced with Sync functionality used by lakectl local. This was part of the output done by the common code used by sync. It was also reporting Uploaded and Removed - which is not relevant to download command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the local sync summary to the recursive download
@@ -690,7 +680,6 @@ func TestLakectlFsUpload(t *testing.T) { | |||
t.Run("single_file_with_recursive", func(t *testing.T) { | |||
vars["FILE_PATH"] = "data/ro/ro_1k.0" | |||
RunCmdAndVerifySuccessWithFile(t, Lakectl()+" fs upload --recursive -s files/ro_1k lakefs://"+repoName+"/"+mainBranch+"/"+vars["FILE_PATH"]+" -s files/ro_1k", false, "lakectl_fs_upload", vars) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ultimately a performance improvement issue. Therefore some benchmarks should be taken to verify not just correctness but comparative performance between new/old implementation in various cases i.e. few small files, lots of small files, few large files, lots of large files...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The download functionality in chunks already implemented as part of single file download
- The performance improvement is done for large files only, we don't trigger chunks for smaller files. The existing download object is used for non-large files.
- The code parallel the downloads as sync used to parallel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool
The current
lakectl fs download
supports downloading large files by splitting the files and downloading the content in parallel.This change add the same support to the recursive option with the support to run multiple downloads (of parallel chunks) in parallel.
Close #8383