
save: Perform write process safe #3273

Open · JoeKar wants to merge 29 commits into master from fix/save-atomically

Conversation

JoeKar (Collaborator) commented Apr 28, 2024

The target situation shall be: micro checks whether the file to be saved already exists; if so, it works with a temporary file first, before the target file is overwritten or, respectively, the temporary file is renamed to the target file.
Any symlinks pointing to the target file are resolved before the save takes place. This guarantees that such a symlink isn't renamed away by the new approach.

TODOs:

Fixes #1916
Fixes #3148
Fixes #3196

JoeKar requested a review from dmaluka, April 28, 2024 15:10
JoeKar force-pushed the fix/save-atomically branch 2 times, most recently from 7d663d8 to afdc7a1, April 29, 2024 21:46
screen.TempStart(screenb)
if err != nil {
return err
}
} else {
if err == nil {
err = os.Rename(tmpName, name)
dmaluka (Collaborator) commented Apr 30, 2024

There are legitimate cases when rename fails (or some previous steps fail, e.g. creating the .tmp file) but it is actually perfectly possible to save the file, so we should still try our best to save it, rather than just return an error to the user. For example:

  • Micro has no write permissions to the directory where the file is, so it cannot create other files in this directory and cannot change the name of the file. But it has write permissions to the file itself, so it can overwrite the file contents.
  • The file itself is a mountpoint (e.g. a bind mountpoint), so technically it is on a different filesystem than other files in the same directory, so rename fails.

So, what to do? If the buffer has been successfully written to the temporary file and then os.Rename() failed, it means we already have a backup copy, so we can now overwrite the user's file as well. If overwrite successfully completes (including fsync!), we can remove the temporary file. If overwrite fails, we should keep the temporary file. And we should probably rename the temporary file to the backup file in ~/.config/micro/backups then? (And if this rename, in turn, fails (e.g. if ~/.config/micro/backups is on a different filesystem), we should try to copy & remove then? hmmm...)

But if it was not os.Rename() but some previous step that failed, e.g. if we failed to create .tmp file due to lack of permission, what then? Then we should probably be able to create a temporary file in ~/.config/micro/backups instead. Actually, such a temporary file should be already there, - it is the backup file.

All this makes me think: as you noted yourself in your TODO, we should try to reuse the backup file if possible; and given the above, it seems we'd better just use the backup, no other temporary file. So the algorithm can be simple and clean:

  1. If the backup option is on, synchronize the backup file with the buffer.
  2. If the backup option is off (no backup file yet), create the backup file.
  3. Rename the backup file to the user's file.
  4. If rename failed, overwrite the user's file with the buffer.
  5. If overwrite failed, don't remove the backup.

The .tmp approach might have some advantages over the above simplified approach (the reduced likelihood of rename failure, e.g. if the user works a lot with files on a different filesystem than ~/.config/micro), but they seem to be outweighed by the increased likelihood of rename failure in other cases (e.g. lack of permission to the directory), the increased complexity and mess, and the fact that we may leave nasty .tmp files around in the user's directories.
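
A rough sketch in Go of the simplified algorithm above (only a sketch of the idea, not the PR's actual code; assumes import "os" and that the caller provides the backup path):

func safeSave(name, backupName string, data []byte) error {
	// 1./2. Bring the backup file in sync with the buffer contents.
	if err := os.WriteFile(backupName, data, 0600); err != nil {
		return err
	}
	// 3. Try to atomically rename the backup over the user's file.
	if err := os.Rename(backupName, name); err == nil {
		return nil
	}
	// 4. Rename failed (no write permission on the directory, bind mount, ...):
	// fall back to overwriting the user's file in place.
	f, err := os.OpenFile(name, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0666)
	if err != nil {
		return err // 5. overwrite failed: keep the backup so the user can recover
	}
	defer f.Close()
	if _, err := f.Write(data); err != nil {
		return err // 5. keep the backup
	}
	if err := f.Sync(); err != nil {
		return err // 5. keep the backup
	}
	os.Remove(backupName) // overwrite succeeded: the backup is no longer needed
	return nil
}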

dmaluka (Collaborator):

BTW looks like vim does more or less the same: https://github.com/vim/vim/blob/master/src/bufwrite.c#L1179

Moreover, if the file is a symlink, it doesn't seem to try to resolve it, it immediately falls back to overwriting the file from the backup instead of renaming. I guess we can do the same, for simplicity (after all, a symlink is just a corner case, and we need to support the overwrite fallback anyway).

OTOH, I guess it doesn't hurt if we do resolve the symlink. E.g. we can move that to a separate resolveSymlink() function, so the code will not become much more complicated.

P.S. And nano, for that matter, doesn't seem to use rename at all: https://git.savannah.gnu.org/cgit/nano.git/tree/src/files.c#n1748
It just always overwrites the file from the temporary file, even though it is slower and less convenient.
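
For reference, a resolveSymlink() helper as suggested above could look roughly like this (just a sketch, not the PR's actual implementation; assumes imports "os" and "path/filepath"):

func resolveSymlink(name string) (string, error) {
	fi, err := os.Lstat(name)
	if err != nil {
		// e.g. the file does not exist yet: nothing to resolve
		return name, nil
	}
	if fi.Mode()&os.ModeSymlink != 0 {
		// Follow the link (and any intermediate links) to the real target,
		// so that a later rename replaces the target, not the symlink.
		return filepath.EvalSymlinks(name)
	}
	return name, nil
}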

dmaluka (Collaborator):

P.S. I hope we don't need to do all those scrupulous proactive checks like st.st_dev != st_old.st_dev that vim does, we can just try os.Rename() and fall back to overwrite if it fails. With the exception of symlinks, which we need to check or resolve beforehand, to prevent replacing them with regular files.

BTW... what about FIFOs, sockets, device files, ...?

JoeKar (Collaborator, Author) commented Apr 30, 2024

BTW... what about FIFOs, sockets, device files, ...?

micro seems to have serious trouble with such files anyway: it crashed with a fifo, it hung with a fd...
vim "opens" them and reports that these are not regular files.

Edit:
I suggest handling files only in case they're supported by micro. That means we should prevent loading (and saving) files with modes like ModeDevice + ModeCharDevice, ModeIrregular, etc., because a crash or hang is unacceptable.
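
A check along those lines could be as simple as the following sketch (the helper name and the exact set of rejected modes are assumptions; assumes imports "fmt" and "os"):

func checkSupportedFile(name string) error {
	fi, err := os.Stat(name)
	if err != nil {
		return err // the caller can treat fs.ErrNotExist as a new, empty buffer
	}
	if !fi.Mode().IsRegular() {
		// Rejects ModeDevice, ModeCharDevice, ModeNamedPipe, ModeSocket,
		// ModeIrregular, ... - anything that could hang or crash the editor.
		return fmt.Errorf("%s is not a regular file (%v)", name, fi.Mode())
	}
	return nil
}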

JoeKar (Collaborator, Author):

If the backup option is off (no backup file yet), create the backup file.

Since the backupThread is async (with a possible delay), we can create the backup anyway and don't need to consider the option. The option is then only important for the removal of the backup, right?

dmaluka (Collaborator):

Our goal is only to ensure that the original file is never left in a damaged state (or in the worst case, it is left in a damaged state, but we have a backup so we can restore it), right? If /home is full so we cannot do anything, it is not our problem.

dmaluka (Collaborator):

...A trickier problem: when renaming, we should preserve the file permissions (chmod) of the original file, and probably also its owner & group (chown). And even worse, probably also its extended attributes (setxattr).

Vim seems to do all that: here, here and here.

And if anything of that fails, we should probably fall back to overwrite, before even trying to rename. (We don't want to silently change user's file attributes, right?)
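
On Unix, preserving mode and ownership before renaming over the original could look roughly like this (a sketch only; syscall.Stat_t is platform-specific, extended attributes would still need separate handling, and the helper name is an assumption; assumes imports "os" and "syscall"):

func preserveAttrs(origName, newName string) error {
	fi, err := os.Stat(origName)
	if err != nil {
		return err
	}
	if err := os.Chmod(newName, fi.Mode().Perm()); err != nil {
		return err
	}
	if st, ok := fi.Sys().(*syscall.Stat_t); ok {
		// Chown usually requires root (or CAP_CHOWN); a failure here would be
		// the signal to fall back to overwriting the original in place.
		return os.Chown(newName, int(st.Uid), int(st.Gid))
	}
	return nil
}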

JoeKar (Collaborator, Author) commented May 1, 2024

Ok, that's a fair position. But even then we can only proceed once the backup was successful, otherwise we would use a broken backup as the source to overwrite a healthy target.

Our goal is only to ensure that the original file is never left in a damaged state (or in the worst case, it is left in a damaged state, but we have a backup so we can restore it), right?

Do we really have one then? The moment we ignore /home running out of space and accept damaging the backup, we have no "plan b" once something goes wrong with the target.

Edit:

And if anything of that fails, we should probably fall back to overwrite, before even trying to rename. (We don't want to silently change user's file attributes, right?)

Yes, then we're safer with a direct overwrite from a different source instead of moving the source.

dmaluka (Collaborator):

We should make sure to avoid this situation. If the backup was unsuccessful, we should not overwrite the target file with it, we should just tell the user that we cannot save the file. Also I think we should remove the broken backup.

JoeKar (Collaborator, Author):

Ok, I can live with that. It's the simpler approach and we don't need to use temporary files.

JoeKar marked this pull request as draft, April 30, 2024 18:46
JoeKar force-pushed the fix/save-atomically branch 3 times, most recently from 1ea6aaf to 3510fcc, May 12, 2024 19:47
JoeKar marked this pull request as ready for review, May 12, 2024 19:49
JoeKar (Collaborator, Author) commented May 16, 2024

Still a lot of rework left, but thank you for your time reviewing this! 👍

JoeKar marked this pull request as draft, May 23, 2024 18:39
JoeKar (Collaborator, Author) commented May 26, 2024

I did a lot of (commit) refactoring, but I'm still not fully happy with it.
The backup approach is still messy (duplicated code, no single source of truth, etc.), and currently I'm running out of ideas.

dmaluka (Collaborator) commented May 27, 2024

I did a lot of (commit) refactoring, but still I'm not fully happy with it. The backup approach is still messy (duplicated code, no single point of truth, etc.), but currently I'm running out of ideas.

IIUC, basically, all these troubles are due to the need to avoid circular dependencies between packages?

If so, it seems we could avoid these troubles by implementing overwrite & backup stuff as a part of the config package. OTOH, that sounds like an abuse of the config package: it is supposed to implement stuff related specifically to configuration, not stuff like buffer backups.

So maybe instead, simply provide separate implementations of overwrite & backup stuff for different kinds of files, in both buffer and config packages (especially since, as I noted in another comment, it seems better to have separate implementations of Overwrite() anyway). Maybe make them implement some common interface (if there is a need for that).

A very rough idea:

type FileWriter interface {
	Overwrite(name string) error
	BackupName(name string) string
	KeepBackup() bool
}

and a common Write() implementation using this interface:

func Write(name string, fwriter FileWriter) error

Or maybe even Write() implementations need to be different for different files.

P.S. Side note: maybe let's name it SafeWrite()?
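
To illustrate, such a common Write() might look roughly like this (only a sketch of the idea; the backup handling follows the algorithm discussed earlier in this thread and is not the PR's actual code; assumes import "os"):

func Write(name string, fwriter FileWriter) error {
	backup := fwriter.BackupName(name)

	// Write the contents to the backup file first.
	if err := fwriter.Overwrite(backup); err != nil {
		os.Remove(backup) // don't keep a broken backup around
		return err
	}
	// Try the atomic path: rename the backup over the target file.
	if err := os.Rename(backup, name); err == nil {
		return nil
	}
	// Rename failed (different filesystem, directory permissions, ...):
	// fall back to overwriting the target in place.
	if err := fwriter.Overwrite(name); err != nil {
		return err // keep the backup so the caller/user can recover from it
	}
	if !fwriter.KeepBackup() {
		os.Remove(backup)
	}
	return nil
}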

JoeKar (Collaborator, Author) commented Jun 1, 2024

Yes, I know dc701a1 doesn't really belong to the PR in the first place, but I touched some WriteFile calls anyway.
It was not a big deal to do this cleanup.

JoeKar marked this pull request as ready for review, June 2, 2024 19:36
JoeKar changed the title from "save: Save files atomically" to "save: Perform write process safe", Oct 31, 2024
If the new URL-encoded path is found, it takes precedence over the '%'-escaped path. If neither is found, the new URL approach is used.
dmaluka (Collaborator) commented Nov 3, 2024

I'm thinking we should also clearly inform the user about the overwrite failure and possible corruption immediately at the moment when it occurs, not only when the user opens this file next time. E.g. something like:

diff --git a/internal/buffer/save.go b/internal/buffer/save.go
index b24f075d..3f35553f 100644
--- a/internal/buffer/save.go
+++ b/internal/buffer/save.go
@@ -4,6 +4,7 @@ import (
 	"bufio"
 	"bytes"
 	"errors"
+	"fmt"
 	"io"
 	"io/fs"
 	"os"
@@ -22,6 +23,18 @@ import (
 	"golang.org/x/text/transform"
 )
 
+const OverwriteFailMsg = `An error occurred while writing to the file:
+
+%s
+
+The file may be corrupted now. The good news is that it has been
+successfully backed up. Next time you open this file with Micro,
+Micro will ask if you want to recover it from the backup.
+
+The backup path is:
+
+%s`
+
 // LargeFileThreshold is the number of bytes when fastdirty is forced
 // because hashing is too slow
 const LargeFileThreshold = 50000
@@ -312,6 +325,7 @@ func (b *Buffer) safeWrite(path string, withSudo bool) error {
 	b.forceKeepBackup = true
 
 	if err := b.overwriteFile(path, false, withSudo); err != nil {
+		screen.TermMessage(fmt.Sprintf(OverwriteFailMsg, err, backupName))
 		return err
 	}
 	b.forceKeepBackup = false

Maybe not exactly as above.

We probably also want something similar for writing to settings.json and bindings.json, and for that it's trickier since the util package cannot call the screen package. So, I don't know, maybe add one more return value to SafeWrite() to inform the callers that it was not just some error but specifically an error overwriting the target file. Or perhaps, instead of another return value, do it the "fancy" idiomatic way, by making the returned error more "structured", by wrapping it into another error, so that the caller can distinguish whether it is an overwrite error or not, as documented in https://pkg.go.dev/errors (I'm not familiar with that myself yet).

JoeKar (Collaborator, Author) commented Nov 5, 2024

Or perhaps, instead of another return value, do it the "fancy" idiomatic way, by making the returned error more "structured", by wrapping it into another error, so that the caller can distinguish whether it is an overwrite error or not, as documented in https://pkg.go.dev/errors (I'm not familiar with that myself yet).

I started with it and was already able to differentiate between the error types via errors.Is(). We have to keep in mind that fs.ErrPermission is not an overwrite error/failure, but just a hint to bubble up and perform the usual sucmd for a second try.
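
A minimal sketch of that differentiation (ErrOverwrite mirrors the util.ErrOverwrite name used later in this thread; the details are assumptions, not the PR's actual code):

var ErrOverwrite = errors.New("error overwriting the target file")

// Inside the write path, wrap only the errors that may have corrupted the target:
//	if _, err := f.Write(data); err != nil {
//		return fmt.Errorf("%w: %v", ErrOverwrite, err)
//	}

// On the caller side:
//	if errors.Is(err, ErrOverwrite) {
//		// the target may be corrupted: keep the backup and warn the user
//	} else if errors.Is(err, fs.ErrPermission) {
//		// not an overwrite failure: bubble up and retry via sucmd
//	}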


This likely means that micro crashed while editing this file,
or another instance of micro is currently editing this file,
or the necessary permission to overwrite the file was not given,
dmaluka (Collaborator):

Hmm... Indeed, overwriteFile() and os.WriteFile() return an error not only if they fail to write to the file (in the middle of writing) but also if they fail to even open the file. But this means that our implementation is broken and should be reworked: we keep the backup in a case where we have no good reason to keep it, since we know that the file is not corrupted at all (we even failed to open it for writing, so we had no chance to corrupt it).

(Just in case, I don't mean that we should just add a check whether the error is fs.ErrPermission or not. Open may fail for different reasons, not only because of lack of permissions.)

JoeKar (Collaborator, Author):

(Just in case, I don't mean that we should just add a check whether the error is fs.ErrPermission or not. Open may fail for different reasons, not only because of lack of permissions.)

I know, because we already added new checks like...
https://github.com/zyedidia/micro/pull/3273/files#diff-e33a9d01c51bf09d6ef92d1a430e36f49aefed071acf4668ec68a9b68200e996R244-R249
...but os.Stat() and its FileInfo aren't sufficient here.

So it looks like we need a "dry" try here. With os.O_CREATE it isn't really "dry", but at least we can check first whether we can create a new file in this directory, since we're on the way to create and write it anyway.

dmaluka (Collaborator):

So it looks like we need a "dry" try here.

Why???

In overwriteFile() we already call os.OpenFile() before writing, so we can already easily distinguish whether it was a write error or not.

In util.SafeWrite() both open and write are done internally by os.WriteFile(), but that simply means that we shouldn't use os.WriteFile(), we should use os.OpenFile() + f.Write() + f.Close() directly.

JoeKar (Collaborator, Author) commented Nov 7, 2024

Why???

Hm, valid question.
I confirm that util.SafeWrite() is easy to fix by splitting os.WriteFile() up into its pieces.

For overwriteFile() I agree that we already have the split parts available, but separated in its own logic. We could move util.OverwriteError into overwriteFile(), check for util.ErrOverwrite in safeWrite(), and assume that everything different from util.ErrOverwrite happened on open (and remove the backup), but most probably this isn't the case for every scenario/propagated error.
We could also add a specific util.ErrOpen to have a clear trigger for when it happened, but this doesn't change the fact that once we identify the trigger inside overwriteFile(), we have already written a backup and thus unintentionally touched the FS, wasted an inode, etc.
So I have to assume this wasn't what you had in mind, but rather splitting up overwriteFile(), i.e. extracting the os.OpenFile() out of it (maybe passing the file into it) and executing it before we create the backup, as we now do in util.SafeWrite().

JoeKar (Collaborator, Author) commented Nov 7, 2024

One further thought regarding:

Why???

What about the situation in which we open the file before we create the backup (since the open could fail and we don't want the backup then)... with os.O_TRUNC, but then writing the backup fails? Usually we would close the target file now, but we opened it with os.O_TRUNC and thus broke it without a backup.
Don't we run in circles then?

Edit:
We could ignore the failing backup then and try to write the target file, which was, if I remember right, one of your ideas a long time ago in this PR. At least there is a small chance that we don't end up with two broken files. But still, somewhat far from perfect.

dmaluka (Collaborator) commented Nov 8, 2024

O_TRUNC is not essential either, we could open without O_TRUNC, and then f.Truncate(), right?

But I'm not sure why you are talking about opening the file before creating the backup (we always try to create the backup before we try to do anything with the target file, and we are not going to change that, right?)... You are talking about a lot of different things here, mostly unclear, so it's hard to follow what you are talking about at all.

JoeKar (Collaborator, Author):

O_TRUNC is not essential either, we could open without O_TRUNC, and then f.Truncate(), right?

Check.

But I'm not sure why are you talking about opening the file before creating the backup (we always try to create the backup before we try to do anything with the target file, and we are not going to change that, right?)...

I'm puzzled right now. Wasn't exactly this the intention behind "[...] this means that our implementation is broken and should be reworked"?
As figured out...

[...] since we know that we even failed to open it for writing, so we had no chance to corrupt it [...]

...we had no chance to break anything, since we were not even able to open the file, but we had already created the backup (see above), thus causing the error message to appear, etc.
I don't like creating the backup before opening the target, only to realize that the open fails and then removing the backup again, to prevent implausible messages.
Why even create the backup when we're able to figure out that it doesn't make any sense?

So from my perspective the logical step is to open (O_WRONLY|O_CREATE) the file before we create the backup. If the open fails, we don't create the backup. And all this by splitting up the relevant parts into:

  1. open()
    1.1. return on error
  2. defer close()
  3. backup()
    3.1. return on error
  4. write()
    4.1. return on error
  5. sync()

overwriteFile() is a bit more tricky...

Please correct me in case I'm totally wrong or misunderstood your comments.
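
A rough sketch of that open → backup → write → sync sequence for util.SafeWrite() (names and details are assumptions, not the PR's actual code; assumes import "os"):

func SafeWrite(name, backupName string, data []byte) error {
	// 1. Open first (without O_TRUNC), so a failed open creates no backup.
	f, err := os.OpenFile(name, os.O_WRONLY|os.O_CREATE, 0666)
	if err != nil {
		return err
	}
	// 2. Close the target when done.
	defer f.Close()
	// 3. Create the backup; bail out without touching the target if it fails.
	if err := os.WriteFile(backupName, data, 0600); err != nil {
		return err
	}
	// 4. Only now truncate and write the target.
	if err := f.Truncate(0); err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		return err
	}
	// 5. Sync, so the data actually reaches the disk before we report success.
	return f.Sync()
}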

dmaluka (Collaborator):

Ah indeed, good point. Well, that wasn't really the intention I had in my mind, since "creating the backup before opening the target, realizing the open fails and thus removing the backup again" is still not as bad as keeping the false-alarm backup, since at least it has no annoying user-visible effects [*].

In fact I'm not even sure it is so bad on its own. If it means less code complication (by keeping the simple scheme "first deal with the backup, then deal with the target file" and just extending the existing logic deciding whether to remove the backup or not), then maybe why not.

[*] Unless we fail to remove this unneeded backup, for example because we crashed before removing it. But that is a rather bad situation anyway, which could as well result in an actual corruption of the target file.

JoeKar (Collaborator, Author):

is still not as bad as keeping the false-alarm backup, since at least it has no annoying user-visible effects

Fully ACK

In fact I'm not even sure it is so bad on its own. If it means less code complication [...]

It hits overwriteFile() the most, where it would be easier to use an equivalent of util.ErrOverwrite (e.g. util.ErrOpen) to identify that situation outside of overwriteFile().
But maybe it's worth doing this further refactoring to prevent the unnecessary backup in case of a failed open(). Since we're not aware of which FS is used below the OS abstraction, whether it uses caching, which block device sits below that, what special kind of HW, etc., the best practice is to reduce the file operations as much as we can.
Do you agree?

If it becomes too complex or incomprehensible, we're still able to revert to the simpler solution and accept the unneeded write/delete of the backup.

dmaluka (Collaborator):

I don't disagree. Let's see the actual code.

}

backupName := util.DetermineEscapePath(backupDir, path)
_, err = b.overwrite(backupName, withSudo)
dmaluka (Collaborator):

Do we really want to ever write the backup with sudo?
