Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dropped watches #47482

Closed
wants to merge 2 commits into from
Closed

Conversation

MarcCelani-at
Copy link

Fixes #47466

On unix systems, fs.watch places a watch on an inode, not a file path. If a given file is assigned a new inode, the watch is no longer valid and any changes to that file will not generate a watch for TypeScript to consume. This is quite common when using standard tools like vim and git.

This changes the watch handling code to always recreate the watch any time a rename event occurs and the file still exists. In this case, it is very likely that the inode changed (at least in my experience), so its a simple solution to the problem and I would guess it has low overhead. This patch eliminates the most common cause of this bug. There are still race conditions after this PR if someone is editing a file in the background at the exact same time that tsc is setting the watch for the first time. To fix that, setting a watch should be done as follows:

  1. stat the file
  2. create the watch with fs.watch
  3. stat the file and make sure that the inode has not changed. If it hasn't, the watch is active.

IMO the TypeScript team should seriously consider just using chokidar. In addition to avoiding bugs like these (both the one fixed here and the one not fixed and described above), it will also provide much better performance on MacOS because it uses the native fsevents system calls.

@typescript-bot typescript-bot added the For Uncommitted Bug PR for untriaged, rejected, closed or missing bug label Jan 17, 2022
@jakebailey
Copy link
Member

Just to comment on the quote below (and not the PR, which itself seems fine; someone else on the team may be better to review it than me):

IMO the TypeScript team should seriously consider just using chokidar

FWIW it's unfortunately more complicated than that; tsc/tsserver make use of a number of different watching methods for different types of files and directories (which makes it difficult to consider switching to something else with different tradeoffs), and adding a third party watcher would introduce a dependency to the typescript package (currently, it has none).

Also of interest may be #43790 (and its linked issues), and that VS Code is actually removing all of its watchers in favor of @parcel/watcher.

@andrewbranch
Copy link
Member

I would guess it has low overhead

Have you compared the time between the git checkout and the end of updateGraph in TS Server logs before and after?

@MarcCelani-at
Copy link
Author

I should have been more clear...Re-adding the watch and processing files we are supposed to process comes with cost. We should not expect fixing this bug comes at zero cost.

I think what you are more interested in is what is the overhead of resetting the watch. Setting a watch on every file in our mono repo (over 10k files) takes less than a second, so I don't think we are talking about anything that will have serious performance implications here.

That said, this solution can potentially be optimized by stating the file and checking the inode rather than blindly resetting the watch on every rename event. That's a bit more of a complicated change and I doubt it's worth it given the low cost of setting a watch and that stating the file will introduce new cost as well.

Copy link
Member

@sheetalkamat sheetalkamat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect change as this is switching the watcher to "Polling mode" watch instead of relying on file system events which is what it is supposed to do. Polling watch is always going to be accurate but it is CPU intensive and hence less desirable for directories and is only used when there is error..
Note that rename event happens whenever a file is added or removed to the folder

If you share your logs, i will be able to investigate the issue and propose the next steps

Eg when watching temp folder, adding file temp/something.js will invoke rename event

@MarcCelani-at
Copy link
Author

@sheetalkamat I'm confused by your comment. When I think of poll, I imagine periodically rechecking the state of the file system rather than waiting for a callback from the OS. Here, we are waiting on a callback from the OS, but we are resetting the watch when the callback fires because it is very likely to be invalid on Unix OS.

When I use polling on our large monorepo, there is a noticeable overhead to it. I see no measurable overhead to resetting the watch when a file changes.

@MarcCelani-at
Copy link
Author

MarcCelani-at commented Mar 25, 2022

I'm not sure what else I can provide here to help you debug. I detailed two very reliable repros on MacOS in the issue description and comments. The one in the comments involves only using VSCode and vim in a separate terminal. But happy to go over some logs over VC if that will move this forward. We are having to maintain this patch across TS upgrades which is a pain.

@MarcCelani-at
Copy link
Author

Also, one final clarification - the rename events occur (at least on macOS) for all sorts of reasons, including when the file changes. It does not only fire for adding/removing a file from the directory.

@jakebailey
Copy link
Member

jakebailey commented Mar 28, 2022

I'm relatively new to the watch code, but I wanted to write out my understanding of the original code and this PR in case it clears things up (so do correct me if I'm wrong).

It seems like watchPresentFileSystemEntry will call fs.watch and only fall back to watchPresentFileSystemEntryWithFsWatchFile when needed. That function is documented in some comments as being based on polling (the function name is super long so I'm just going to call it "the polling watcher" from now on).

The original big long condition in callbackChangingToMissingFileSystemEntry is true when the there's a rename event and the file no longer exists, which then leads to the watcher being turned into a polling watcher that waits until an event fires, then recreates the watcher with watchPresentFileSystemEntry again.

The PR changes the condition to, on a rename, if it doesn't exist, do the same thing as above, but if it does exist, the watcher is recreated assuming that a rename event is likely to mean that the folder points somewhere else on disk via a different inode (as documented in the PR description), but is done so by calling that polling watcher creation function.

That latter point I think is the contentious one; I'd think that if we know (or very likely know) that the watcher is now bad, that it would need to be recreated via watchPresentFileSystemEntry in order to have it potentially pick the non-polling watcher again (as it would have originally).

But, it also feels like it should be expensive to be recreating directory watches (which are potentially recursive) over and over again. Maybe that's why the PR uses polling in the first place, because those are effectively "free" given they defer work until some later poll as opposed to asking the system to do some bookkeeping on FS events?

Maybe ideally, it'd be nice if there was a cheap way to just get the inode number of a directory and compare it without doing the work to recreate a watcher. But, this feels like something that node's watcher should really be handling, as the API is based on paths. It sure would be a surprise for watchers to just totally stop working depending on the details of what some program does to rename or replace a directory/file with other contents or something.

@sheetalkamat
Copy link
Member

The PR changes the condition to, on a rename, if it doesn't exist, do the same thing as above, but if it does exist, the watcher is recreated assuming that a rename event is likely to mean that the folder points somewhere else on disk via a different inode (as documented in the PR description), but is done so by calling that polling watcher creation function.

I think issue here is the number of times the watch will be recreated (if not using polling). Using polling watch on any change in directory is not ideal either. We already have option to allow watchDirectory to use Polling https://www.typescriptlang.org/tsconfig#watch-watchDirectory so i don't think this makes optimal change for general use.

I will need to debug and experiment a little to see what better approach will be here.

@MarcCelani-at
Copy link
Author

ahh ok I think I understand the problem. It's this line:

watcher = watchPresentFileSystemEntryWithFsWatchFile();

Should that just be

watcher = watchPresentFileSystemEntry();

?

I can test this to make sure that the repro steps I outlined in the attached Issue are fixed and update this PR if we think that would resolve the concern here.

@jakebailey
Copy link
Member

I think that addresses my commentary on the original "it's now going to poll which is wrong" comment, but as mentioned above, recreating a non-polling watcher may get expensive. It'd be worth testing (as it's not a huge change), but I can't claim that's "the thing" that would make the PR correct, compared to some other mechanism to address this problem.

@MarcCelani-at
Copy link
Author

Well, given that inode changes require a new watch, I see three possible paths:

  1. Depend fully on recursive watches. This is theoretically most scalable and efficient if it is trustworthy, but we don't seem to trust it on all systems or haven't written the code to do it by default.
  2. Stat the file on rename and check for inode changes. If renames typically correspond with inode changes, this stat is only adding more cost.
  3. Change the watch on any rename (the intention of this PR). Some watch recreations may be wasteful but we avoid unnecessary file stats.

I'm happy to do either 2 or 3 here.

@sheetalkamat
Copy link
Member

Approach 3 is definitely not a generic solution we are looking for.. Just like how we have heuristics for checking directory is still present or not, we would use some kind of heuristics but that needs some experimenting and looking at the events to come up with a plan.. I am going to spend some time on that early next week and will get back to you.

@MarcCelani-at
Copy link
Author

I have completed testing the change to use watcher = watchPresentFileSystemEntry();. Interestingly, I have found this to be more reliable than the previous commit (contrary to the belief that "Polling watch is always going to be accurate"). I have a very reliable repro in my monorepo that causes incorrect diagnostics with upstream VSCode as well as with watcher = watchPresentFileSystemEntryWithFsWatchFile();, but does not repro with watchPresentFileSystemEntry. I wonder if the polling watch code is prone to race conditions when files are changing rapidly (as is the case in a git checkout). At any rate, I know @sheetalkamat is still deciding on a long term solution here, but I will update the PR to use watcher = watchPresentFileSystemEntry() in case anyone wants to apply this patch to typescript to resolve race conditions.

@sheetalkamat
Copy link
Member

Closing in favor of #48997

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
For Backlog Bug PRs that fix a backlog bug
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Persistent incorrect semantic diagnostic after large rebase or checkout: Module has no exported member
5 participants