
InterProcessLock files left behind #26

Open

dstufft opened this issue May 24, 2016 · 27 comments

Comments

@dstufft

dstufft commented May 24, 2016

When trying to use the InterProcessLock class on OS X, it appears that when it releases the lock, it doesn't clean up after itself. It would be great if it could do this to ensure that we don't leave a bunch of leftover files lying around.

@harlowja
Owner

harlowja commented Aug 14, 2016

Yup, this is a known issue with this kind of lock, given the locking scheme we are using.

See the following for some of this:

That thread is split across two months; it summarizes the problem and why it's not 'just as easy' as you might think ;)

If someone wants to develop a solution though, I'm more than willing to review it.

@harlowja
Owner

Do note that if the Python flock API provided a little more functionality, we might be able to do some work there to get these cleaned up; in general it appears quite hard to know which lock files can even be deleted without trying to take over ownership of those locks (and blocking while doing that). I believe there is a non-exposed API for flock usage that does let you 'get the current owner', which could at least save some time during any cleanup process.

@tgamblin

I was able to find a decent solution for my use case by using byte-range locking with Python's lockf (which, confusingly, is fcntl underneath).

In particular, I have a bunch of install prefixes for different packages in a package manager, and each needs to be write-locked while an install is happening and read-locked while a dependency's install is happening. Lots of installs can happen in parallel.

I use a single lock file, and for each directory I lock a single byte in that file, at the offset given by the 63-bit prefix of the SHA-1 of the directory name.
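
As a rough sketch of that scheme (not Spack's actual code; POSIX only, and the lock-file path and helper names here are made up for illustration):

```python
import fcntl
import hashlib
import os

LOCK_FILE = "/var/tmp/prefix.lock"   # hypothetical shared lock file


def byte_offset(name: str) -> int:
    """Map *name* to a byte offset via the 63-bit prefix of its SHA-1."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest[:8], "big") >> 1   # keep 63 bits


def lock_name(name: str, exclusive: bool = True) -> int:
    """Lock one byte of LOCK_FILE for *name*; returns the open fd."""
    fd = os.open(LOCK_FILE, os.O_RDWR | os.O_CREAT, 0o644)
    op = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    # Lock a single byte at the hashed offset; locking past EOF is allowed,
    # so the lock file itself stays zero bytes long.
    fcntl.lockf(fd, op, 1, byte_offset(name))
    return fd


def unlock_name(fd: int, name: str) -> None:
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, byte_offset(name))
    os.close(fd)
```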

Yes, that is complicated, but it has some advantages:

  1. Uses fcntl locks, which seem to be reasonably well supported across the distributed filesystems I care about (NFS v2.6+, Lustre, GPFS).
  2. Only requires one lock file, whose actual size is zero since you can lock past the end of the file.
  3. No need to clean up the one lock file (cleaning up fcntl locks gives you race conditions anyway).
  4. Interprocess reader-writer locks.

Disadvantages:

  1. This approach is not "perfect", in the sense that it can result in false positives -- if there is a hash collision, you may have to wait for an install to finish when you do not need to wait. But the likelihood of a collision with two locking processes and 64-bit offsets (most machines) is about 1e-19, and the likelihood of a collision doesn't reach 1% until you have ~430 million processes (at which point you probably have other problems); see the quick check after this list.
  2. I think if you adapted this to also support msvcrt on Windows, you'd have to make all the locks exclusive; or maybe Unix clients could keep using read locks, which would simply appear as exclusive locks to a Windows client. I haven't looked into how different NFS implementations handle this.
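
For reference, those numbers follow from the standard birthday bound over the 2**63 possible byte offsets; a quick back-of-the-envelope check:

```python
import math

SLOTS = 2 ** 63                                    # possible byte offsets

print(1 / SLOTS)                                   # two processes colliding: ~1.1e-19
print(math.sqrt(2 * SLOTS * math.log(1 / 0.99)))   # ~4.3e8 processes for a 1% collision chance
```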

At least for my use case, the advantages outweigh the disadvantages, and I couldn't come up with a better solution that didn't require me to deploy a distributed lock server or some other coordination service.

Anyway, an interprocess reader-writer lock with byte range support is implemented here:

There is some relevant discussion here. The SHA stuff is outside the lock class, so it probably doesn't need to go in fasteners, unless you like the idea and want some kind of lock_by_name_sha1() method.

Just wanted to throw a potential solution in the thread. Let me know if you'd have a use for this in fasteners.

@tgamblin

@harlowja

@harlowja
Owner

I also had something similar @ #10 (though yours might be better)

@harlowja
Owner

Ya, nice, yours is better :)

@tgamblin

@harlowja: thanks! Curious whether OpenStack and/or fasteners has a use for this. If so, I could try to make an API and submit a PR. I'm not sure I can maintain it for cloud environments, though, so someone in your project would have to have a need. Thoughts?

@harlowja
Owner

Sure, I take PRs :)

My guess is someone would have a need :)

@harlowja
Owner

The read and write lock stuff could be especially useful.

@corydodt

This issue is now almost two years old, and I, a new user, am running into it now.

One key thing I'd like to address: your documentation specifically says this:

Inter-process locks
Single writer using file based locking (these automatically release on process exit, even if release or exit is never called).

However, this doesn't seem to be the case, and if I'm understanding this bug correctly, they are the same issue. Specifically, SIGTERM'ing a process (let alone SIGQUIT or SIGKILL) leaves the lockfile in the filesystem.

  1. Is this in fact the same issue being described in this GitHub issue?
  2. If so, can you update the documentation to remove that wording, and maybe link to this issue as the path to fixing it? In my view, the documentation currently says the complete opposite of what's true.

@harlowja
Owner

So I think you are conflating release with file deletion; they aren't the same. What is released is the lock on the file handle owned by the process; this is released automatically (ensured by the operating system). Actually deleting the file is pretty complicated to get right (due to dual ownership), but I do agree the wording could be better.
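
To make the distinction concrete, here is a minimal sketch (the path is made up) of what "released but not deleted" means for this kind of lock:

```python
import fcntl
import os

path = "/tmp/example.lock"          # hypothetical lock file

fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
fcntl.lockf(fd, fcntl.LOCK_EX)      # lock acquired

# If this process is now SIGKILLed, the kernel drops the lock along with
# the file descriptor, so another process can acquire the lock immediately.
# But os.path.exists(path) is still True afterwards -- that lingering file
# is what this issue is about, and deleting it safely is the hard part.
```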

@vstinner

Would it make sense to use a file descriptor rather than a filename, with O_TMPFILE, to ensure that the file is destroyed as soon as the file is closed in all processes (or when the processes are killed)? O_TMPFILE requires Linux 3.11 or newer and is not supported by all filesystems, but more and more filesystems support it.

See how the tempfile module of the Python stdlib uses O_TMPFILE:

I just tested on Fedora 29: btrfs, ext4 and tmpfs support O_TMPFILE. (I only tested these filesystems.)

$ cat /etc/fedora-release 
Fedora release 29 (Twenty Nine)

$ uname -r
4.18.16-300.fc29.x86_64

$ python3
Python 3.7.1 (default, Nov  5 2018, 14:07:04) 
>>> import os

>>> fd=os.open(".", os.O_WRONLY | os.O_TMPFILE) # btrfs
>>> os.close(fd)

>>> fd=os.open("/tmp", os.O_WRONLY | os.O_TMPFILE) # tmpfs
>>> os.close(fd)

>>> fd=os.open("/boot/test", os.O_WRONLY | os.O_TMPFILE) # ext4
>>> os.close(fd)

Problem: who creates the FD? How do we pass the FD to other processes? A UNIX socket with sendmsg()? ...
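
A rough sketch of that idea, assuming Linux >= 3.11, a filesystem that supports O_TMPFILE, and Python >= 3.9 for socket.send_fds()/recv_fds() (everything here is illustrative, not a fasteners API):

```python
import fcntl
import os
import socket

# Create an anonymous lock file: it has no directory entry, so once every
# holder closes it (or dies) it simply disappears -- nothing left behind.
lock_fd = os.open("/tmp", os.O_RDWR | os.O_TMPFILE, 0o600)

# Hand the descriptor to another process over an AF_UNIX socket.  (In a
# real setup the two ends of the socketpair live in different processes.)
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
socket.send_fds(a, [b"lock-fd"], [lock_fd])

# Receiving side: pick up the fd and take POSIX record locks on it as
# usual (fcntl/lockf locks are per-process, so mutual exclusion still
# works even though the descriptor was duplicated across processes).
msg, fds, _flags, _addr = socket.recv_fds(b, 1024, 1)
peer_fd = fds[0]
fcntl.lockf(peer_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
fcntl.lockf(peer_fd, fcntl.LOCK_UN)
```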

@harlowja
Owner

Oh hi Victor, how are you? :)

@4383

4383 commented Nov 20, 2018

Hi @harlowja, what do you think about Victor's suggestion?
I've started to review your PR #10 and I'm currently running some tests with it.

@vstinner

Oh hi Victor, how are you? :)

Hey! I'm fine. I moved to a new team. I'm now maintaining Python for Red Hat (in Fedora, RHEL, and upstream)! But as you can see, I'm helping @4383, who replaced me on Oslo in my previous OpenStack team.

@harlowja
Owner

harlowja commented Dec 1, 2018

Hi guys, just back from vacation (Thanksgiving in the US), so it will take me a little while to catch back up...

@harlowja
Owner

harlowja commented Dec 4, 2018

Let's do what Victor suggests. I can't see any flaws with it (outside of the ones already mentioned).

@4383

4383 commented Dec 13, 2018

@harlowja, don't hesitate to tell me if I can help with anything (review, code, whatever); just ping me.

@harlowja
Owner

Do you want to try to implement #26 (comment)?

@4383

4383 commented Dec 14, 2018

@harlowja: I'm a real newcomer to fasteners, so if I implement Victor's suggestion I'll need some mentoring from you so that I don't spend a lot of time understanding the fasteners architecture and figuring out how to integrate it properly with #10.
Do you agree?

@tgamblin

@harlowja: Sorry for disappearing. I initially had issues getting a license change through our IP office when I commented, but Spack has since relicensed to Apache-2.0/MIT, so we could contribute our code pretty easily now.

If you want the byte-range locking that I mentioned above, it could make sense here. We use it all the time, so I believe it is pretty well tested. It doesn't directly solve this problem unless you use it the way we described above (by mapping locks to byte offsets via hashing), so it probably needs another API on top; but our approach is POSIX, so there is that.

@vstinner's idea is more general if the OS and filesystem support it.

@harlowja
Owner

Might be easier if you guys want to jump on toolsforhumans.slack.com or IRC?

@harlowja
Owner

It'd be nice to look at https://github.com/spack/spack/blob/develop/lib/spack/llnl/util/lock.py again, and see what we can take into this lib.

@tgamblin

@harlowja: how do I get an account?

@harlowja
Owner

Ah, hmmm, good question, lol

@harlowja
Owner

Guess I have to invite people, hmmm

@Erotemic
Contributor

@harlowja It doesn't look like the Spack lock implementation cleans up the lockfile either. Am I missing something?
