Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Micro saves files non-atomically #3148

Open
dmaluka opened this issue Feb 20, 2024 · 16 comments · May be fixed by #3273
Open

Micro saves files non-atomically #3148

dmaluka opened this issue Feb 20, 2024 · 16 comments · May be fixed by #3273
Labels
Milestone

Comments

@dmaluka
Copy link
Collaborator

dmaluka commented Feb 20, 2024

When saving a file, micro simply overwrites it with the new contents. So if writing is interrupted in the middle due to some error (e.g. running out of disk space, unplugging USB stick and so on), the file is left in a damaged state: partially written or just empty (i.e. the user's data is lost).

To prevent this, micro should save files atomically: first write to a temporary file, and then, if write completes successfully, replace the user's file with this temporary file. AFAIK this is how it is done by other editors.

This issue concerns not just the edited files themselves, but also other files that micro may modify: settings.json, bindings.json, backups and so on.

Commit hash: 59dda01
OS: any
Terminal: any

@dmaluka
Copy link
Collaborator Author

dmaluka commented Feb 20, 2024

This issue concerns not just the edited files themselves, but also other files that micro may modify: settings.json, bindings.json, backups and so on.

This actually happened to me today in real life: I started fetching a huge repo in the background, and went on using micro. The fetch eventually ate up all the disk space, and when I started micro yet another time, it showed me Error reading settings.json: unexpected end of JSON input error, since my settings.json was empty; and the command history was empty too. Luckily the actual files I was working with using micro weren't damaged, since I wasn't really modifying them.

@dmaluka
Copy link
Collaborator Author

dmaluka commented Feb 21, 2024

I see other people have also already encountered this problem: #1916

@JoeKar
Copy link
Collaborator

JoeKar commented Feb 21, 2024

So like in the most of the editors overwriteFile() should write to name + e.g. ".tmp" and then use os.Rename() to move?

@dmaluka
Copy link
Collaborator Author

dmaluka commented Feb 22, 2024

More or less. We might need to figure out how to do it fully reliably (or as reliably as possible), e.g. what if the file is a symlink or a bind mount, and so on. And what about Windows (I heard it's trickier to ensure atomicity there).

@JoeKar
Copy link
Collaborator

JoeKar commented Feb 22, 2024

Had already an look into the os interfaces, but unfortunately it doesn't provide that capability (like copy-on-write and move afterwards) with that abstraction already. And yes, Windows could become tricky as well as the integration into the present su-approach.
So from current perspective it looks like reinventing the wheel again. :(

@dmaluka
Copy link
Collaborator Author

dmaluka commented Feb 22, 2024

There are some 3rd-party packages e.g. renameio but I haven't looked into that deeply.

@JoeKar
Copy link
Collaborator

JoeKar commented Feb 23, 2024

There are some 3rd-party packages e.g. renameio but I haven't looked into that deeply.

A naive approach to the problem is to create a temporary file followed by a call to os.Rename().

...and so I was blamed again. 😄
But it sounds worth to it add it as a new dependency, because we then don't need to take care of these corner cases, because someone already did and implemented it for general purpose.

As it is not possible to provide a correct implementation, this package does not export any functions on Windows.

On Windows we can still run into trouble. Due to writefile.go#L15-L16 it looks like we can use this interface under POSIX only.

@X-Ryl669
Copy link

Seems like any solution (even the naïve solution with os.rename) would be better than the current situation where a corruption is guaranteed to occur.

@dmaluka
Copy link
Collaborator Author

dmaluka commented Mar 14, 2024

Yes and no. If for example the user's file is a symlink, and we just rename() it with our temporary file, the user's file silently becomes a regular file instead of a symlink. Which is arguably even worse than the current situation, since it will happen always, not just when an I/O failure occurs.

@X-Ryl669
Copy link

X-Ryl669 commented Mar 14, 2024

Honestly, I don't know if it's a Go limitation or not. On Posix, you're safe to use renameat that will do exactly what you expect since you provide opened file descriptor for the rename operation, thus you've already followed the link to get the file descriptor.

If it's not available everywhere, I guess a readlink is to ensure the target of the link is selected for the new parameter of rename (but it suffers from TOCTOU issues as usual)

@X-Ryl669
Copy link

Also, I don't agree with the idea that leaving a temporary file upon failure is bad. I think, on the opposite that it's a welcome feature. If I edit a file other a SSH session and the network suddenly goes down or a power outage, I'd be a lot more happy to find a .tmp file with the current edited buffer than nothing (like if you used O_TMPFILE).

@dmaluka
Copy link
Collaborator Author

dmaluka commented Mar 14, 2024

Also, I don't agree with the idea that leaving a temporary file upon failure is bad.

Sure, we should keep the temporary file upon a failure, since it contains user data which would be lost otherwise. But I was referring to a different case. I was referring to a case when there is no failure, and as a result, the user's symlink is silently overwritten with a regular file (although the contents of this file are correct). So the user now suddenly has two version of the file (with the old contents and with the new contents) instead of two references to the same file (with the new contents), and has no idea about it.

On Posix, you're safe to use renameat that will do exactly what you expect since you provide opened file descriptor for the rename operation, thus you've already followed the link to get the file descriptor.

I don't think this is true. The descriptor renameat() is provided with is the descriptor of the directory where the file is, not the descriptor of the file itself.

For what it's worth:

~ % echo aaa >test1
~ % ln -s test1 test2
~ % ls -l test1 test2
-rw-r--r-- 1 mitya mitya 4 Mar 14 12:22 test1
lrwxrwxrwx 1 mitya mitya 5 Mar 14 12:22 test2 -> test1
~ % cat test2
aaa
~ % echo bbb >test3
~ % mv test3 test2
~ % ls -l test1 test2
-rw-r--r-- 1 mitya mitya 4 Mar 14 12:22 test1
-rw-r--r-- 1 mitya mitya 4 Mar 14 12:23 test2
~ % cat test2
bbb
~ % cat test1
aaa

and the strace of mv (the gist of it):

renameat2(AT_FDCWD, "test3", AT_FDCWD, "test2", RENAME_NOREPLACE) = -1 EEXIST (File exists)
stat("test2", {st_mode=S_IFREG|0644, st_size=4, ...}) = 0
newfstatat(AT_FDCWD, "test3", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "test2", {st_mode=S_IFLNK|0777, st_size=5, ...}, AT_SYMLINK_NOFOLLOW) = 0
rename("test3", "test2")                = 0

@dmaluka
Copy link
Collaborator Author

dmaluka commented Mar 14, 2024

BTW it should be worth looking how Vim handles the issue when saving files (and analyzing whether what Vim is doing is reasonable). At first glance it looks like it reuses the backup file for that (or rather, what vim calls the "backup file" is only used when saving the file): vim first tries to overwrite the target file with the contents of the backup file, and if that fails, it resorts to rename()ing the backup file to the target file.

@X-Ryl669
Copy link

You're right. I've confused linkat with renameat. You have a AT_FOLLOW_LINKS or something like this but it's following the old symlink not the new one. There isn't any function to do that atomically on linux.

So you're back to the basic option of doing a readlink on the destination (if it's a link) and update the path accordingly.

That's what vim is doing

@dmaluka dmaluka added the major label Apr 15, 2024
@dmaluka dmaluka added this to the v2.0.14 milestone Apr 16, 2024
@JoeKar JoeKar linked a pull request Apr 28, 2024 that will close this issue
7 tasks
@JoeKar JoeKar modified the milestones: v2.0.14, v2.0.15 Jul 19, 2024
@jmcorey
Copy link
Contributor

jmcorey commented Aug 7, 2024

As this issue is listed as open, I'd like to offer a possibly relevant input--re inode numbers.

On POSIX anyway, sometimes it's worthwhile to preserve the inode number, and thus any outstanding hard links, which does not happen with the (write_new && rename) method. Also affects inotify for instance.

This is one of the reasons why I was a happy vim user for so many years--knowing that vim wouldn't silently break things that depend on this, which I had been burnt by enough times to form a personal opinion on the subject.

Currently seems like micro preserves the inode number when saving, which I like. I would even go so far as to suggest that there should be a unit test such that this invariant is preserved where possible with the default config.

@dmaluka
Copy link
Collaborator Author

dmaluka commented Aug 7, 2024

Yep, in the PR #3273 (not finished) we have already abandoned the idea to use rename (see this discussion: #3273 (comment)), and instead decided to always use a non-atomic method: first write the buffer to a temporary (backup) file, and then write the same buffer to the original file, - so that we cannot guarantee that we won't damage the original file, but we can guarantee that if we damage it, we have a backup to recover the original file. That is less convenient to the user (in case a damage actually happens), but it always preserves the original inode, with all the advantages that come with that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants