Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore: rework progress aggregation #4225

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

Michal-Leszczynski
Copy link
Collaborator

This PR does the following:

Note, the RestoredBytes and RestoreDuration need to be added to the sctool progress display as a part of a separate PR.

This column might be interesting for backup which performs
deduplication, but it does not make any sense for restore.
It wasn't even a part of restore progress API, so it's safe
to remove it.
This column wasn't needed with the sync load&stream Scylla API
(restored is equal to either 0 or downloaded), but it is cleaner
to have it and it can be useful for native restore Scylla API.
Followup to the e112747.
In the past we treated RestoredStartedAt/RestoreCompletedAt
as the time frame for l&s, but we should look at it more
inclusively, so as the time frame of restoring bytes,
which for Rclone API + l&s approach starts with the download.
Previously, the code aggregating restore progress and the code
querying the run progress for aggregation were tightly tangled.
This resulted in difficulties in writing unit tests and a more
confusing code overall.
This commit changes progress aggregation to use iterator over
run progresses, which can be easily mocked in unit tests.
It is important to specify now when testing.
Thanks to the previous commit, it is now possible
to add unit tests for progress aggregation!
Copy link
Collaborator

@VAveryanov8 VAveryanov8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice usage of iterators and tests!

@Michal-Leszczynski
Copy link
Collaborator Author

Thanks @VAveryanov8!
I'm just wondering, which approach to iterator that can encounter an error do you prefer?

  1. The one from this PR - iterator stores the error internally and it needs to be checked manually after iteration (similar to gocql iterator). It allows to hide the error handling from other parts of the code (might be good or bad).
  2. Instead of iter.Seq[K], use iter.Seq2[K, error] which allows for handling errors by the underlying code.

I think that the second is probably better in general, but the first approach seamed more tailored to this PR use case.

@VAveryanov8
Copy link
Collaborator

The one from this PR - iterator stores the error internally and it needs to be checked manually after iteration (similar to gocql iterator). It allows to hide the error handling from other parts of the code (might be good or bad).
Instead of iter.Seq[K], use iter.Seq2[K, error] which allows for handling errors by the underlying code.
I think that the second is probably better in general, but the first approach seamed more tailored to this PR use case.

Good question! I don't have a strong opinion on this - iterators are quite new to the language and there are not that many projects that already using them.
In general second approach is more flexible (err can be skipped and etc), but in this particular case I think what you did looks better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants