Odd file behavior in virtual branches, perhaps after installing Windows Updates #70
Comments
I don't entirely know how to explain that behavior, especially that a reboot fixed things. Because of the way the .NET System.IO file classes behave, that C# program unfortunately isn't telling you what you may think it is. Your provider must have gotten directory enumeration callbacks, otherwise the C# app wouldn't have seen anything in the enumeration (since what it saw was a virtual file). This also explains why
So did your C# app use a bogus path for the file? That's what I understood from the comment "bad path" on the first line. If that's the case, then it seems like there could be a bug either in how we're sending directory enumeration callbacks, or in how your provider services the callbacks. Somewhere along the line, a bad file path turned into something that your provider was able to enumerate (hence the virtual files returned for enumeration), but that it could not produce placeholder information for (hence the failure to actually open the file when you called
Sorry, "bad path" was meant to mean the full path to the misbehaving file. I have noticed that the .NET file attributes returned are not always accurate or complete for virtual or placeholder files, which is why I was surprised that I got the same attributes back when I enumerated with FindFirstFileEx (without the ON_DISK flag). FindFirstFileEx does list REPARSE_POINT as an attribute in normal situations, at least when enumerating with the ON_DISK flag; I can't say I've tested it without the ON_DISK flag, as I generally use it to avoid invoking callbacks. If/when I see this again I will verify whether or not I'm getting callbacks related to these misbehaving files.
OK, we just had another case and I was able to remote into the machine and debug my host app while a directory was in a bad state. In this case there was a single directory that was behaving oddly, and it was slightly different from above, so it may not be exactly the same issue, but it has a lot of similarities. Here is what I observed in this case:
I did dump the reparse data again to compare now that it was working, and the thing that changed was the 16 bytes before the ProviderId, but that appeared to have changed to a new, consistent value for all directory placeholders created with this instance of my host app. It seems like there must be some state that is going wrong in how directories are mapped to callback host applications (especially given the 'virtualization provider not available' errors for this directory while all the surrounding directories were under the same one virtual root mapped to my one provider app instance)? I would think that it was in the directory reparse data, but the fact that the other similar issues were resolved only by a reboot makes it seem like there must be some in-memory mapping that is getting mucked up somehow (and nuking the directory placeholder, as we did in this case, also resets that state, similar to a reboot)? Thanks for your help!
If you still have the reparseData dumps, would you mind sharing those (please obscure any data that shouldn't be visible)? I am hoping you did see a ProjFs reparseTag for the misbehaving directory \bin\online_Release\deployment as well.
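For anyone checking a saved dump for the ProjFS tag, here is a minimal Python sketch. It assumes the dump is the raw buffer returned by FSCTL_GET_REPARSE_POINT, which for Microsoft tags starts with the 8-byte REPARSE_DATA_BUFFER header (ReparseTag, ReparseDataLength, Reserved). The function name `parse_reparse_header` and the returned dictionary are illustrative only, and the ProjFS-specific payload is left unparsed because its layout is not documented.

```python
import struct

# IO_REPARSE_TAG_PROJFS as defined in the Windows SDK headers.
IO_REPARSE_TAG_PROJFS = 0x9000001C

# REPARSE_DATA_BUFFER header: ULONG ReparseTag, USHORT ReparseDataLength,
# USHORT Reserved (little-endian, 8 bytes total).
_HEADER = struct.Struct("<IHH")


def parse_reparse_header(buf):
    """Split a raw reparse buffer into its header fields and opaque payload.

    Hypothetical helper for illustration; it does not interpret the payload.
    """
    if len(buf) < _HEADER.size:
        raise ValueError("buffer is shorter than the 8-byte reparse header")
    tag, data_length, _reserved = _HEADER.unpack_from(buf, 0)
    payload = buf[_HEADER.size:_HEADER.size + data_length]
    return {
        "tag": tag,
        "is_projfs": tag == IO_REPARSE_TAG_PROJFS,
        "data_length": data_length,
        "payload": payload,
    }
```

A dump whose `is_projfs` is false, or whose `data_length` disagrees with the number of payload bytes actually present, would be worth calling out when sharing it.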
Unfortunately I don't have the raw reparse data dumps, but as part of my tests I did look at them and compare the misbehaving directory's reparse data to its working parent directory's reparse data, and they were identical except for the path strings (and reparse data size, because of the string length difference). I will attach the raw dump if/when we see another case.
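If raw dumps do get captured next time, a byte-level comparison like the one described here can be scripted. The following is only a sketch in Python: `diff_ranges` is a hypothetical helper, and it compares offset by offset, so it is most useful for two dumps of the same directory (same path length, for example before and after recreating the placeholder). For dumps whose path strings differ in length, everything after the first string will show up as different.

```python
def diff_ranges(a, b):
    """Return half-open (start, end) byte ranges where two dumps differ.

    Bytes past the end of the shorter dump are reported as one final range.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        longest = max(len(a), len(b))
        if ranges and ranges[-1][1] == common:
            # Merge the length difference into a run that reached the end.
            ranges[-1] = (ranges[-1][0], longest)
        else:
            ranges.append((common, longest))
    return ranges
```

For two same-sized dumps where only a 16-byte field changed, this would return a single range of length 16, which makes it easy to see which offset the changed field sits at.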
Hi all, I'd like to report a similar case we have been seeing with our project (https://github.com/facebookexperimental/eden). We have received several user reports where files listed in ProjectedFS are not accessible even though the file itself is available. The file appears in File Explorer as well as in directory listings in PowerShell. I confirmed the file in question was included in our reply to However, any attempt to directly access the file results in a file-not-found error from Windows. For example, a Python tool we have to check ProjectedFS state via ctypes will report We also do not observe any The machine seeing this issue is also on the latest Windows version (21H1). The only pending patch not installed is kb4023057. Similar to OP, we tried both restarting the virtualization provider and clearing the negative path cache, but to no avail. The only workaround we have found so far is to reboot the machine.
@fanzeyi thanks for reporting the issue. If you still have the repro, would you mind capturing and sharing ProjFs logs? If yes,
@FeralSquid you could help us with the traces as well.
Adding some context. Looking back at the internal EdenFS user support, it looks like the first time this issue was reported was in April this year, but some users may have hit the issue sooner than that.
One of our users just hit what appears to be this issue. At first, listing files in a directory under the virtualization root showed all but one file. Thinking that this might be an EdenFS bug, they then restarted EdenFS, but trying to open the file afterwards gave them the error "The provider that supports file system virtualization is temporarily unavailable". Creating a file in that directory also failed with the same error, which appears oddly similar to @FeralSquid's comment above. The reparse point on the directory appears to be in order:
Querying the file in question fails:
I've pointed the user to this issue and they collected the trace requested above. What's the best way to send you this trace file?
How big is the trace file?
The zipped version is ~20KB, but I'm more concerned about potential private information contained in it to post it on Github :)
Oh, I don't want you to post it to GitHub in any case. :-) I was thinking along the lines of email vs. some file share service. Please attach it to an email and send it to msfltdev (at) microsoft (dot) com.
Hello all, I was just able to grab the logs from a new live repro case. I've emailed them to the address above. Any luck finding something in xavierd's logs?
Hello,
My organization has been using a ProjFS-based file syncing setup since late last year. Since around March of this year, the number of machines using this system has grown from ~400 to ~700 now. This is the primary way people get and interact with the files they work on constantly every day.
I give this background because, as of yesterday, I've now seen what appears to be a new and very odd issue for a 2nd time (on a different machine) within the last ~week.
Nothing about the code managing ProjFS or the backing store information has changed in any substantial way for a few months now (some minor inspection improvements are about it, nothing that impacts interactions with ProjFS).
The only potentially interesting data point I have is that a few days before the first case our IT group rolled out a series of Windows Updates. That said, in both cases I had confirmed that the users did NOT have any pending Windows updates at the time (they had installed and rebooted since), as that can cause all kinds of odd behavior.
The updates installed were KB5004245, KB5003539, KB5004748, KB5004772. The previous batch of Windows updates was installed a month prior (before this issue was first reported).
In both cases, the odd behavior mysteriously went away after a system restart.
Because that appears to fix it, and because our users sometimes like to attempt to fix things themselves and not let us know, it is possible that there have been other cases of this that I'm just not aware of. That said, the timing of 2 cases within a week, and just after the updates, combined with the reboot-as-a-fix, seems a little suspicious.
Anyway, on to the odd behavior...
The user notices the issue when a ProjFS file fails to be read, and so causes some user application to fail to start (if it is an exe/dll/etc) or fail to load some content file.
This behavior, though, only impacts a subset of the ProjFS files under a given virtual root, while other virtual files work perfectly fine (I am able to trigger placeholder creation, hydration, PrjDeleteFile back to virtual and repeat with other files). In the first case, all ProjFS files in a given directory were impacted (there were some full files mixed in the directory that worked normally).
I'm not sure if it was all files in the same directory for the 2nd case as the user rebooted before I could test that case thoroughly.
In the first case, though, I did have time to try a handful of things on these badly behaving files, with odd results:
...then, in C#, I tried:
var path = // bad path
var fi = new FileInfo(path);
// fi.Exists => false
// fi.Attributes => 0xFFFFFFFF
var di = new DirectoryInfo(Path.GetDirectoryName(path));
var file = di.EnumerateFiles().First(f => string.Compare(f.FullName, path) == 0);
// file is NOT NULL ... it was enumerated
// file.Exists => TRUE !!
// file.Attributes => 0x00440001 (RECALL_ON_DATA_ACCESS | RECALL_ON_OPEN | READONLY) -- looks like a placeholder, but no REPARSE_POINT attribute??
using (var fs = file.OpenRead())
{
    fs.ReadByte();
}
// throws FileNotFoundException
FindFirstFileEx w/ FIND_FIRST_EX_ON_DISK_ENTRIES_ONLY lists no items in the directory !? (full files had been deleted at this point, leaving only the badly behaving files in the directory)
FindFirstFileEx without that flag lists the bad file with the above attributes
Calling CreateFile() directly to try to get any reparse point data also failed with an invalid-path error.
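To make the attribute values above easier to read, here is a small Python sketch that decodes a Win32 attribute bitmask into flag names. The flag values are the standard FILE_ATTRIBUTE_* constants from winnt.h (only a subset is listed); the helper name `decode_attributes` is made up for this example.

```python
# Subset of the Win32 FILE_ATTRIBUTE_* flags (values from winnt.h).
FILE_ATTRIBUTES = {
    0x00000001: "READONLY",
    0x00000002: "HIDDEN",
    0x00000004: "SYSTEM",
    0x00000010: "DIRECTORY",
    0x00000020: "ARCHIVE",
    0x00000080: "NORMAL",
    0x00000200: "SPARSE_FILE",
    0x00000400: "REPARSE_POINT",
    0x00001000: "OFFLINE",
    0x00040000: "RECALL_ON_OPEN",
    0x00400000: "RECALL_ON_DATA_ACCESS",
}

# Value returned by GetFileAttributes when the query itself fails.
INVALID_FILE_ATTRIBUTES = 0xFFFFFFFF


def decode_attributes(value):
    """Return the names of the known flags set in value, lowest bit first."""
    if value == INVALID_FILE_ATTRIBUTES:
        return ["INVALID_FILE_ATTRIBUTES"]
    names = []
    remaining = value
    for bit in sorted(FILE_ATTRIBUTES):
        if value & bit:
            names.append(FILE_ATTRIBUTES[bit])
            remaining &= ~bit
    if remaining:
        # Report any bits this table does not know about.
        names.append("UNKNOWN(0x%08X)" % remaining)
    return names
```

Decoding 0x00440001 this way gives READONLY, RECALL_ON_OPEN and RECALL_ON_DATA_ACCESS, with bit 0x400 (REPARSE_POINT) not set, which matches the observation above. 0xFFFFFFFF is the invalid-attributes sentinel rather than a real combination of flags.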
I also tried:
... and re-running all of the above tests, all with the same result.
Throughout all of this, my ProjFS application logged no errors, and I suspect it got no callbacks (though I can't fully prove that).
The file in question had been newly created on this machine the day before, then submitted to version control and the user had later sync'd. So, the file should have transitioned from Full-Writable -> Full-ReadOnly -> PrjDeleteFile to make it virtual -> Placeholder (app-triggered) and broken. All of this occurred a day or two after taking the windows updates.
I didn't notice any obvious critical or error system events in that time period.
And then, as a bit of a last ditch, I asked the user to reboot ... and after they did, the badly behaving placeholder files suddenly worked perfectly!
Rebooting also "magically" fixed the 2nd case.
Any ideas?
Any other things you'd want me to try if/when I see the next case?
Thanks!