Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix docker cp #20

Open
wants to merge 19 commits into
base: master
Choose a base branch
from
Open

Fix docker cp #20

wants to merge 19 commits into from

Conversation

andergnet
Copy link

This pull request fixes issue #18

Copy link
Owner

@kolayne kolayne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good evening! Sorry, I didn't notice the PR when you opened it and then didn't check my email for almost a week... 😭

But here I am :)

Firstly, thank you for the PR! There are some changes that I request below. Please, see my comments and let me know if I misunderstood something that you did intentionally.

Second, a general note (too late for this PR but may be useful for your future contributions): when your PR, while addressing a single problem, can be broken down into logical sub-steps, which you take to implement the patch, many maintainers prefer that you make a commit per step, so that your changes can be reviewed step by step as well. For example, I see two logical sub-steps of this PR: first, separate the existing code between driver.go and activemount.go, next, patch the multiple-mount logic.

Btw, providing detailed commit messages, answering the what&why questions ("what did you do? why did you do that?") is also beneficial and gives the reviewer an insight into your changes before they even starts reading your code!


UPD: on this page my comments are displayed in the order I created them, but it's better if you read them along with the code, in the order they appear in the Files changed section

activemount.go Outdated
Comment on lines 38 to 40
payload, _ := io.ReadAll(file)
json.Unmarshal(payload, &activeMount)
file.Close()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three of these functions may return an error. In go, you usually want to check the error value when one is returned and handle it somehow. In some cases it makes sense to behave in one way or another when there's an error; in other cases the best you can do is just log the error and/or return it from your function (so the caller has to deal with it). Finally, in some (pretty rare) cases you indeed want to ignore an error value returned by a function. In such cases it's better to do this explicitly, e.g., _ = file.Close() rather than file.Close() and, perhaps, leave a comment, explaining, why it's ok to ignore an error here.

In your case:

  1. The call to json.Unmarshal is supposed to set the value of activeMount, used further in the code. If an error occurs, what is the value of activeMount? It's probably either unaffected (that is, remains the default value, as of the declaration in the beginning of the function), or the value is unspecified (can be invalidated) - I'm not sure, we should check out the docs. So, if there is an error, it should definitely be handled, and, likely, lead to the failure to mount this volume (because its internal filesystem state became invalid).
  2. The call to file.Close is much less important, and there isn't much we can do about it if an error occurs. But, firstly, problems with closing files may result in a file descriptor leak (although it's unlikely, as kernels try to release a descriptor early in the closing process), second, it may indicate an I/O problem, which a user would probably want to know about. So here I would prefer logging the error if it occurs, but continuing trying to activate the volume, unlike the previous case.
  3. Finally, io.ReadAll in this particular context may be one of those cases when you can legitimately ignore an error! On the one hand, logs, presumably, would be more informative if you checked it, on the other hand, if there was an error in your io.ReadAll, then an error is bound to occur during json.Unmarshal as well because if io.ReadAll either couldn't read anything or only read the file partially, the result won't be a valid JSON dictionary (and your object corresponds to a dictionary), so unmarshalling will fail. Yet, this is not too obvious, so, if you choose to ignore the error here, you should leave a comment to explain why.

Btw, you can see some examples of documented functions in volumeTreeManagement.go. Ironically, all of them are unexported

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: f6634d9

activemount.go Outdated

func (d *DockerOnTop) Activate(request *volume.MountRequest, activemountsdir lockedFile) (bool, error) {
var activeMount ActiveMount
var result bool
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this name more informative?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this was an issue also in driver.go (see comment), I have named it the same way everywhere: 1985b52

activemount.go Outdated
}

func (d *DockerOnTop) Activate(request *volume.MountRequest, activemountsdir lockedFile) (bool, error) {
var activeMount ActiveMount
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be declared closer to its usage, making it easier to read the code

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 7043a09

activemount.go Outdated
Comment on lines 44 to 47
} else {
log.Errorf("Active mount file %s exists but cannot be read.", activemountFilePath)
return false, fmt.Errorf("active mount file %s exists but cannot be read", activemountFilePath)
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. That's too little information: we have an error err and could've told what actually happened. See the doc for fmt.Errorf, in particular, its %w verb.
  2. I can see you both log an error and return it. The messages are very similar but slightly different (in case and punctuation). Is that intended? As there are several places like this, would it make sense to only return an error here, and then log it from the caller?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have applied the suggestions as they do make good sense. As if it was intended, the case and punctuation changes they were, initially they where the same with human readable format (casing and sentence ending dots) but the linter complained about that format on the error message. That's how it ended up like this.

c57da77
f9aadb4

activemount.go Outdated

activeMount.UsageCount++

payload, _ := json.Marshal(activeMount)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is ignored explicitly but it's not clear to me why. json.Marshal's behavior should be much more predictable than that of json.Unmarshal, so it's probably safe to ignore the error here, but, please, write a comment explaining why, relying on the documentation of json.Marshal

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not clear to me why. json.Marshal's behavior should be much more predictable than that of json.Unmarshal
We have control over what the Marshal-ed structure (activeMount) so we will not get suprises there. Unmarshal, on the other hand, deals with a file in the filesystem and we could have a surprise if someone changes it.

I have added a comment to indicate why it is safe to ignore the error, which basically boils down to the fact that there is no room for error when converting an structure containing a single integer to JSON.
03f2054

driver.go Outdated
if errors.Is(readDirErr, io.EOF) {
// No files => no other containers are using the volume. Need to mount the overlay

mount, err := d.Activate(request, activemountsdir)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me what kind of value the boolean variable mount holds here. Is there a better name for this?

Is it an instruction for us returned by d.Activate, i.e., pleaseMount/doMount?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to doMountFs also to differenciate between the two meanings of "Mount".
1985b52

activemount.go Outdated
Comment on lines 97 to 98
log.Errorf("The active mount file %s could not be deleted", activemountFilePath)
return false, fmt.Errorf("the active mount file %s could not be deleted", activemountFilePath)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same two points as for a similar error handling earlier.

Plus! As I can see, the behavior here is significantly different from that of previous docker-on-top. As I can see, this branch is handling the case when we are supposed to actually unmount volume and remove the mount file but failing to remove the file.

One thing you've changed is the order: you're first trying to remove the file, and then, on success, attempt to unmount the overlay (unlike the previous code which attempted it the other way around). And that's for the better! In the old version, if an error occurred during the file deletion, this would bring the volume in a very bad state (overlay is not actually mounted but it looks like it is, so other containers will try to use it straight away - expect horrible bugs); with this change, if an error occurs during the file deletion, the volume is in a just bad state (the volume looks like it's used, although it's not, but it's at least properly mounted).

The other thing you've changed is the logging level: it used to be log.Criticalf, you changed it to log.Errorf. If you think of it, that probably makes sense too, given the above.

So, actually, no problem that I first thought of when starting the "Plus!" part. Good job! But I would still add something to this error message, something like "this volume will never get unmounted until a reboot"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really convinced with the new text either, if you have a suggestion please do not hesitate to propose.
Now that the log messages are handled in driver.go, i would keep all of them as Errorf and not Criticalf for the sake of simplicity and the fact that I do not consider the errors more severe.
e441320

driver.go Outdated
"manually to fix the problem", err)
log.Debugf("Unmounted volume %s", request.Name)
} else {
log.Debugf("Volume %s is still mounted. Indicating success without unmounting", request.Name)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree with the error message change here. Here, you're seemingly saying "still mounted so I'm not unmounting it", which doesn't make sense. That's because of the ambiguity between "mounted" as in "a container is running where this volume is used" and "mounted" as in "the underlying overlay filesystem is present". IMO, the previous error message was clearer:

log.Debugf("Volume %s is still mounted in some other container. Indicating success without unmounting", request.Name)

Maybe could be phrased even better, e.g., Volume %s is still used in another container. Indicating success without unmounting

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the one you proposed, it's more clear of what's happening:
197c6fe

activemount.go Outdated
UsageCount int
}

func (d *DockerOnTop) Activate(request *volume.MountRequest, activemountsdir lockedFile) (bool, error) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment on both DockerOnTop.Activate and DockerOnTop.Activate:

  1. Their names probably don't quite reflect what they do. If I saw the following piece of code:

    var d DockerOnTop = ...
    d.Activate(...)

    I would most probably assume that d.Activate activates the d object (whatever it could mean) because that's what the name suggests. But what it actually does is it activates a volume. If we had a proper volume object, it would make sense to move this method there and have volume.Activate, but as its a method of DockerOnTop, its name should probably be ActivateVolume

  2. In go, the case of the first letter (i.e., whether it's lowercase or uppercase) has a semantic effect: if an entity name is capitalized, it's exported (i.e., public - available from other packages), otherwise it's unexported (i.e., private, only available in this package). As I understand, the .Activate and .Deactivate are both intended for internal use only (a caller external to DockerOnTop shall not call them directly). If that's the case, their names should be changed.

  3. If I'm wrong and you intended that these functions are exported, it's a go convention to have a documentation string for every exported entity (see Documenting go code for details). You may notice that other exported methods of DockerOnTop don't have documentation, which is a flaw, but all of them correspond to Docker API calls, whereas the two new functions you are adding are something new.

  4. Finally, as you may notice in further comments, at some points it wasn't clear to me what these functions should do and, especially, what they should return. That's why, even if you didn't mean to export them and you'll rename them to .activateVolume and .deactivateVolume respectively, they would still benefit from a short docstring that explains what they do and what they return.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope this looks better now with the changes I have done:
Commit: b18043e

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also commit 3a67652 since I didn't properly rename deactivateVolume

activemount.go Outdated
"github.com/docker/go-plugins-helpers/volume"
)

type ActiveMount struct {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably meant to call it activeMount (see point 2 of the next note, on exporting entities in go).
(note that unlike the structure, its field should be exported: otherwise json.Marshal/.Unmarshal won't be able to access it, as they are not part of this package)

Copy link
Author

@andergnet andergnet Jul 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's my first real program in Go, I had no idea about that, I simply copied how VolumeInfo was defined. I have fixed it.
e55a292

@kolayne kolayne linked an issue Jul 27, 2024 that may be closed by this pull request
driver.go Outdated
Comment on lines 189 to 192
} else if mount {
lowerdir := thisVol.BaseDirPath
upperdir := d.upperdir(request.Name)
workdir := d.workdir(request.Name)
Copy link
Owner

@kolayne kolayne Jul 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read this comment first.


Here you, unfortunately, did kind of the opposite (similarly, you reversed the original order, but it has the inverse effect in this case).

Combining the logic inside d.Activate and the code below this comment up to and including the syscall.Mount (that's the code you did not modify), the overall (new) logic is as follows:

  • Check if the volume is already mounted based on activemount files. If it's not:
  • Create a new activemount file to indicate that I am the user of the volume [may fail]
  • Perform the actual mount of the overlay [may fail]

Now, imagine the file creation step succeeds but the next one fails. For that container, of course, we'll report the error and it will refuse to start, but the volume ends up in a very bad state. If another container tries to use that same volume, it will perform the same steps above, but in the first one it will find an activemount file from the previous container and will assume that the volume is already mounted, while it in fact is not!!

The reverse order is not very good either but slightly better: if the first container manages to mount the volume but doesn't manage to mark it as mounted, it, likewise, won't let the current container start (so that overlay mount remains unused), then, when the next container attempts a mount, it will run the same steps and, in the trickiest case, will just mount the same overlay on top of the old one and start using it (which is weird because there are two overlays mounted simultaneously using the same internal directories, but should be ok, as the first overlay is never used, being shadowed by the new one).


So,

  1. First possible improvement for your code: your implementation should at least emit a scary screaming error!
    (see examples from the old code - errors used in very similar situations: 1, 2)
  2. Second, it would be much better if you could fix things such that the order of the system calls (mount/unmount call and activemount file creation/deletion call) is old (as in my code) for the Mount operation but is new (as in your code) for the Unmount operation.

In fact, I now see how we could improve the Mount part, so that we never even attempt to mount overlay on top of anything old. Of course, this is out of scope of this PR, so I'll create a separate issue (but it still requires that you restore the old order here in Mount)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have improved the way Mount function works, I keep the new order but in case the filesystem mount fails it will undo the activemount creation so it returns the function in a consistent state: filesystem not mounted and no activemount present.

01de48d

@andergnet
Copy link
Author

@kolayne I think I have addressed all the issues you raised in the review. Could you please review it again?

I tried to be more transparent and show every step of the way but I'm not used to do it, so my apologies if you expected something different.
I won't have access to a proper computer for a few weeks, don't worry if I do not answer I'll do it when I'm back.

Copy link
Owner

@kolayne kolayne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Here are some more things to fix

activemount.go Outdated
// The directory is empty, mount the filesystem
doMountFs = true
} else {
return false, fmt.Errorf("failed to list activemounts/ %v", readDirErr)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return false, fmt.Errorf("failed to list activemounts/ %v", readDirErr)
return false, fmt.Errorf("failed to list activemounts/ : %w", readDirErr)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

activemount.go Outdated
Comment on lines 56 to 58
} else if closeErr != nil {
return false, fmt.Errorf("active mount file %s has been opened but cannot be closed %w", activemountFilePath, readErr)
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failing to close a file is bad enough to emit a warning but I don't think it's bad enough to abort the operation (if we read the contents successfully, there's no problem to continue). I'd do something like:

if err := file.Close(); err != nil {
    log.Warningf("Failed to close active mount file: %v", err)
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have replaced everything with a os.ReadFile

activemount.go Outdated
Comment on lines 48 to 53
file, err := os.Open(activemountFilePath)

if err == nil {
// The file can exist from a previous mount when doing a docker cp on an already mounted container, no need to mount the filesystem again
payload, readErr := io.ReadAll(file)
closeErr := file.Close()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed we open the file and then on success just read all contents of it and close it. Looks just like os.ReadFile. Can we use it here?

Fun fact: Go's standard library ignores the error returned by file.Close() inside os.ReadFile

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have replaced everything with a os.ReadFile.

I have never seen a close() function fail, but it's true that I have worked more on Windows than Linux.

Comment on lines +59 to +60
unmarshalErr := json.Unmarshal(payload, &activeMountInfo)
if unmarshalErr != nil {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No big deal but looks a bit too verbose to me. Could be

if err := json.Unmarshal(...); err != nil {
    do stuff
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't have strong feelings about it I'll keep it the way it is with a single instruction per line, I find it better specially when debugging.

activemount.go Outdated
// Parameters:
//
// requestName: Name of the volume to be mounted
// requestID: ID of the container requesting the mount
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Factual error: reuqestID is not a container id (which you see, e.g., in docker container ls). In fact, it is some id generated for the pair {container, volume}. The doc doesn't necessarily need to be this specific, but it should not lie either.

I would call it either "a container-volume-unique ID" or, more elaborate, "an ID unique among volumes and containers but the same value if a volume is mounted for a container twice". Maybe can be phrased better.

Please, also fix the same thing in deactivateVolume

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I thought I checked it, but apparently not.

if activeMountInfo.UsageCount == 0 {
err := os.Remove(activemountFilePath)
if err != nil {
return false, fmt.Errorf("the active mount file %s could not be deleted %w, the filesystem won't be unmounted", activemountFilePath, err)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with an earlier error, indicating action ("unmounting" / "refusing to unmount")

Suggested change
return false, fmt.Errorf("the active mount file %s could not be deleted %w, the filesystem won't be unmounted", activemountFilePath, err)
return false, fmt.Errorf("the active mount file %s could not be deleted: %w. Refusing to unmount", activemountFilePath, err)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of changing all messages saying "the filesystem won't be unmounted" I changed the first one that said "Unmount" to "the filesystem will be unmounted (which now is a warning).

activemount.go Outdated
payload, _ := json.Marshal(activeMountInfo)
err = os.WriteFile(activemountFilePath, payload, 0o644)
if err != nil {
return false, fmt.Errorf("the active mount file %s could not be updated %w, the filesystem won't be unmounted", activemountFilePath, err)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return false, fmt.Errorf("the active mount file %s could not be updated %w, the filesystem won't be unmounted", activemountFilePath, err)
return false, fmt.Errorf("the active mount file %s could not be written: %w. Refusing to unmount", activemountFilePath, err)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as previous.

driver.go Outdated

doMountFs, err := d.activateVolume(request.Name, request.ID, activemountsdir)
if err != nil {
log.Errorf("Error while activating the filesystem mount: %w", err)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%w is a feature of fmt.Errorf only, when you want to create an error object. As log.Errorf just prints something, rather than creating an object, it uses fmt.Sprintf under the hood, which will either silently interpret %w as %v, or will display some warning or something. In any case,

Suggested change
log.Errorf("Error while activating the filesystem mount: %w", err)
log.Errorf("Error while activating the filesystem mount: %v", err)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done this one and another one in line 250.

Comment on lines +204 to +207
if err != nil {
// The filesystem could not be mounted so undo the activateVolume call so it does not appear as if
// we are using a volume that we couln't mount. We can ignore the doUnmountFs because we know the volume
// is not mounted.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO for me: I will look into it more carefully some time later

if err != nil {
log.Errorf("Error while activating the filesystem mount: %w", err)
return internalError("failed to deactivate the active mount:", err)
} else if doUnmountFs {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh-oh! From what I can see in the code of .deactivateVolume, it seems like you want this code to take actions for doUnmountFs == true even when err != nil! Am I right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I did not notice there was a case returning true + error. I have changed it to log a warning instead and return true + nil

@andergnet
Copy link
Author

I'm back!

I think I have addressed all the issues you raised, please review again.

I know it's all in a single commit, sorry, it's an old habit, and by the time I noticed I had most of the changes already done.

PS: The newline after the "Parameters" and "Return" in the docstring it's not me, the linter automatically adds it every time I save.

@kolayne
Copy link
Owner

kolayne commented Aug 21, 2024

I'm back!

Top!

I think I have addressed all the issues you raised, please review again.

Hopefully, I'll be able to by the end of the week

I know it's all in a single commit, sorry, it's an old habit, and by the time I noticed I had most of the changes already done.

No problem. This mostly matters at the beginning when I'm trying to understand what's the patch about in general. When the comments are addressed, I don't look into them one by one anyway.

@kolayne
Copy link
Owner

kolayne commented Nov 7, 2024

Hey, @andergnet!

I've found myself pretty busy with other stuff... So I don't really know when I'm going to get to merge this. That's somewhat untrivial, as I want to first merge some tests checking for some edge-case behaviors, and then merge your PR, making sure those tests pass (right now there's one scenario that is broken in master but, hopefully, with your PR, we can fix even that).

Yesterday I've made a step forward towards that, but I'm not quite there yet, and, given the experience of the last two months, I don't dare to try to estimate when I will have time to finish this.

But! I want to let you know that

  1. Most likely, I won't request any more changes from you seem to have already done the core thing, and should there be minor issues, I can fix them myself;
  2. I appreciate your attention to the project in general, and, in particular, the contribution with both the bug report and this workaround patch; and
  3. Of course, sooner or later, this will definitely be merged!

@andergnet
Copy link
Author

andergnet commented Nov 7, 2024

Hi @kolayne
Glad to hear back from you, I was wondering what happened to you since you disappeared after being quite engaged in the topic, no complains, I just hope you are doing alright (I have a job and two daughters, I understand being busy for side projects like this).

About the MR being delayed, no problem on my side, it's your project, your rules ;-). Just let you know that I have been using docker-on-top in the last few months with the patch and I haven't run into any issues.

Till next time :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

docker cp not working properly
2 participants