snapshot: Transfer files concurrently to speed up snapshot transfer #1
+136
−47
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Currently snapshot transfer happens by sending GetFileRequest for every file known to be in the remote snapshot. This happens sequentially for each file.
The only real configurations which allow tuning the throughput of this transfer are the throttle which can be set when initializing the braft::Node, or the runtime configuration raft_max_byte_count_per_rpc which determines how many chunks a large file will be broken into during the transfer. The default is 128KiB, so a 1MiB file will be transfered in about 8 GetFileRequests.
This works great for snapshots which have a handful of large files. But if a snapshot has hundreds or thousands of small files then transfer of these snapshots can be pretty slow.
I locally create a snapshot with 100k files on my development machine, for example, it can take up to 30 minutes to transfer all of those files in that snapshot. Even though the latency per transfer is low, there is a full round trip plus a flush of __raft_meta on the receiving end for each file.
This patch adds concurrency to these transfers. When a remote snapshot is transferred locally, up to raft_max_get_file_request_concurrency GetFileRequests will be sent concurrently. This defaults to 64.
With this patch, the 100k file snapshot consistently transfers in under 10 seconds on my development machine.
This should resolve baidu#362.