What is the concurrent/remote story? #185
Comments
Quick relevant link: https://github.com/dpc/rdedup/wiki/Rust's-fearless-concurrency-in-rdedup

IIRC, the whole backend storage is protected by a sort of read-write lock, and most operations (in particular adding new data) take a shared lock (Line 822 in 8f9c767). Notably, removing data (mostly garbage collection) takes an exclusive lock.

A backend is anything that can implement these basic interfaces (Line 32 in 8f9c767).
@dpc Thanks for pointing that out; I already wondered about that. So starting a GC blocks everything else until it is done? Or is there finer granularity? Is the case where one backup writes a chunk while another one GCs it covered? (Source pointer appreciated ;-) What if the repo is remote (IIRC, on S3-like remotes writes are not instant, so locking may not even be possible)? Do you know duplicacy's two-step process?
I think GC right now will block everything. The backend is irrelevant: from the main logic's perspective, backends only write and load requested files (kind of). However, GC can be stopped at any time without losing progress and then resumed, so if a long GC were a problem, I could imagine it could be put behind a …

I skimmed https://github.com/gilbertchen/duplicacy/wiki/Lock-Free-Deduplication#two-step-fossil-collection.

The GC works by creating another "generation" folder, then, stored-name by stored-name, rewriting (moving) all the chunks to the new generation. After all names have been moved from the past generation to the new generation, the leftover data chunks in the previous generations are clearly not referenced by anything and are deleted (after a reasonably long time has passed, to make sure any concurrent writers had time for dropbox/syncthing to sync). This should be fine as long as the renames are not very expensive, which is not always the case: e.g., Backblaze B2 had no support for a rename operation at the time.
Hi,
First, thanks for rdedup! This looks like a fantastic project that meets some real needs.
I would be happy to write some documentation on this if you can point me in the right direction.
I have two related questions.
First, is concurrent access to the repository allowed, and if so, in what ways? Can two processes write to it at once? Is any locking done? This is relevant for consolidating backups from multiple hosts to a single backup host.
Secondly, I see that cloud storage is WIP, which is fine. I'm wondering what exactly rdedup needs from its underlying filesystem, with the aim of evaluating whether it can run atop the various FUSE remotes: anything from sshfs to the S3-based ones, etc.
Thanks!