-
-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Basepath-hash inode algorithm #1323
base: latest-release
Are you sure you want to change the base?
Basepath-hash inode algorithm #1323
Conversation
src/fs_inode.hpp
Outdated
@@ -37,18 +37,23 @@ namespace fs | |||
const uint64_t fusepath_len, | |||
const mode_t mode, | |||
const dev_t dev, | |||
const ino_t ino); | |||
const ino_t ino, | |||
const std::string &basepath = ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer the basepath be the first argument.
src/fs_inode.hpp
Outdated
@@ -37,18 +37,23 @@ namespace fs | |||
const uint64_t fusepath_len, | |||
const mode_t mode, | |||
const dev_t dev, | |||
const ino_t ino); | |||
const ino_t ino, | |||
const std::string &basepath = ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the default value really necessary? This would create a temp object if it was called with the default which isn't great.
623bc71
to
7ceac03
Compare
BTW... once we are happy with this please squash all these commits |
Sorry about so many typos in such a small amount of text. I even checked it with a spell-checker 1st. As far as getting rid of the default, I started on it, but I ran into a couple of snags. To get the base branch-dir everywhere that Edit: Are the instructions on running the tests somewhere? I tried |
It isn't right :) Need to ensure that the data required for the computation is available anywhere it is used. Otherwise it will provide invalid / inconsistent results. As for tests: they wouldn't test this. But you're right. I changed something and didn't fix the tests. Will add it to the todo list. |
To fix fgetattr I think I need to get the basepath somewhere, but all that we have is the file-descriptopr and the fusepath. Do you think one of those options is better than another? |
Storing the basepath in FileInfo is likely the best general solution. |
After looking at the code, deferring the identification of the basepath to calc() time doesn't seem feasible, as the fusepath is not sufficient to determine the basepath. |
I just figured the basepath is a reasonable value to include in the fileinfo and might be useful elsewhere. |
It turns out your choice is far superior. I didn't realize aI could pass it as a pointer (since the branch string is a const in Config) which means it takes up the same memory as the inode solution, but is far simpler and without the performance penalty. I have made the change. I wil squash the commits once we're happy with the change. I just find it easier to keep the history during development. I also tested on my system to confirm and the original pathc (with the default="") gives the same inode for stat() and fstat(). That was surprising to me, since I assumed fgetattr would be used to generate the fstat() results. I added some debug, and it seems fgetattr is not (always?) called for an fstat(). It seems fuse is doing some caching, and calls 'getattr' during open() and caches that result rather than calling fgetattr when fstat() is called. I'm not usre if there is a better way to test ths code. |
Actually, that's a bad idea. The config, currently, can be changed at almost any time. That includes the branch list. So the branch could be removed while the file is open and then the memory would be freed and a use after free would then be possible. The performance and memory usage of a single string is minimal. In theory it could be interned but we've not gotten to a point where that should be needed. So just use a std::string.
Thing is... fstat doesn't translate into fgetattr on the mergerfs side. There is some caching depending on the settings but they added fgetattr a while back but didn't really plumb the kernel side fully to use it in all the places it might be used. Honestly I'm not sure if it currently is used at all but I've kept it for that time when they do. So it really isn't a good test. Because a default of empty would naturally have to change the hash output. Otherwise it'd be a garbage hash :) |
I made the change to dereference the basepath as you suggested. But regardless, at least I can trigger the code and verify that fgetattr and getattr now produce the same result (though I had to do it via injecting log messages rather than being able to write a test app) Edit: Also, I appreciate you taking your time to help me get this right. Thanks! I've been using mergerfs for many years, and really appreciate your dedication to it. |
README.md
Outdated
* basepath-hash: Hashes the branch base path along with | ||
the inode of the underlying entry. This has a similar purpose to | ||
devino-hash, but by using the path instead of the device-id, the inodes | ||
will be gauranteed to be stable across reboots. Useful for backup or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will be gauranteed to be stable across reboots. Useful for backup or | |
will be guaranteed to be stable across reboots. Useful for backup or |
A filesystem's device id (st_dev) may change on reboot (eg with zfs). Instead, we can use the files base path (+underlying inode) to generate an inode, which will remain constant across reboots. However, this may have unexpected effects if multiple unique devices appear under a base path. Like hybrid_hash, basehybrid_hash/32 hashes relative path for dirs and basepath_hash for files Original patch by thrnz@github
b33355d
to
d0d265a
Compare
This is an updated version of the patch from #1002
I ran into a similar issue recently where after replacing the SATA controller, the device-ids all changed, and my backup tool which relies on inodes being consistent got very confused.
This is just the patch from @thrnz cleaned up for 2.40.2, with the addition of README updates for the changes.