
InterProcessLock files left behind #26

Open

dstufft opened this issue May 24, 2016 · 27 comments

Comments

@dstufft

dstufft commented May 24, 2016

When trying to use the InterProcessLock class on OS X, it appears that when it releases the lock, it doesn't clean up after itself. It would be great if it could do this to ensure that we don't leave a bunch of leftover files lying around.

@harlowja
Owner

harlowja commented Aug 14, 2016

Yup, this is a known issue with this kind of lock, given the locking scheme we are using.

See the following for some of this:

That thread is split across two months; it summarizes the problem and why it's not 'just as easy' as you might think ;)

If someone wants to develop a solution though, I'm more than willing to review it.

@harlowja
Owner

Do note that if the Python flock API provided a little more functionality, we might be able to do some work there to get these cleaned up; in general it appears quite hard to know which lock files can even be deleted without trying to take over ownership of those locks (and blocking while doing that). I believe there is a non-exposed API for flock usage that does let you 'get the current owner', which could at least save some time during any cleanup process.

@tgamblin

I was able to find a decent solution for my use case by using byte-range locking with Python's lockf (which, confusingly, is fcntl underneath).

In particular, I have a bunch of install prefixes for different packages in a package manager, and each needs to be write-locked while an install is happening and read-locked while a dependency's install is happening. Lots of installs can happen in parallel.

I use a single lock file, and for each directory I lock a single byte in that file, at the offset given by the 63-bit prefix of the SHA-1 of the directory name.
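
As a rough sketch of that scheme (not Spack's actual code; POSIX only, and the lock-file path and helper names here are made up for illustration):

```python
import fcntl
import hashlib
import os

LOCK_FILE = "/var/tmp/prefix.lock"   # hypothetical shared lock file


def byte_offset(name: str) -> int:
    """Map *name* to a byte offset via the 63-bit prefix of its SHA-1."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest[:8], "big") >> 1   # keep 63 bits


def lock_name(name: str, exclusive: bool = True) -> int:
    """Lock one byte of LOCK_FILE for *name*; returns the open fd."""
    fd = os.open(LOCK_FILE, os.O_RDWR | os.O_CREAT, 0o644)
    op = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    # Lock a single byte at the hashed offset; locking past EOF is allowed,
    # so the lock file itself stays zero bytes long.
    fcntl.lockf(fd, op, 1, byte_offset(name))
    return fd


def unlock_name(fd: int, name: str) -> None:
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, byte_offset(name))
    os.close(fd)
```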

Yes, that is complicated, but it has some advantages:

  1. Uses fcntl locks, which seem to be reasonably well supported across the distributed filesystems I care about (NFS v2.6+, Lustre, GPFS).
  2. Only requires one lock file, whose actual size is zero since you can lock past the end of the file.
  3. No need to clean up the one lock file (cleaning up fcntl locks gives you race conditions anyway).
  4. Interprocess reader-writer locks.

Disadvantages:

  1. This approach is not "perfect", in the sense that it can result in false positives -- if there is a hash collision, you may have to wait for an install to finish when you do not need to wait. But the likelihood of a collision with two locking processes and 64-bit offsets (most machines) is about 1e-19, and the likelihood of a collision doesn't reach 1% until you have ~430 million processes (at which point you probably have other problems); see the quick check after this list.
  2. I think if you adapted this to also support msvcrt on Windows, you'd have to make all the locks exclusive; or maybe Unix clients could keep using read locks, which would simply appear as exclusive locks to a Windows client. I haven't looked into how different NFS implementations handle this.
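
For reference, those numbers follow from the standard birthday bound over the 2**63 possible byte offsets; a quick back-of-the-envelope check:

```python
import math

SLOTS = 2 ** 63                                    # possible byte offsets

print(1 / SLOTS)                                   # two processes colliding: ~1.1e-19
print(math.sqrt(2 * SLOTS * math.log(1 / 0.99)))   # ~4.3e8 processes for a 1% collision chance
```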

At least for my use case, the advantages outweigh the disadvantages, and I couldn't come up with a better solution that didn't require me to deploy a distributed lock server or some other coordination service.

Anyway, an interprocess reader-writer lock with byte range support is implemented here:

There is some relevant discussion here. The SHA stuff is outside the lock class, so it probably doesn't need to go in fasteners, unless you like the idea and want some kind of lock_by_name_sha1() method.

Just wanted to throw a potential solution in the thread. Let me know if you'd have a use for this in fasteners.

@tgamblin

@harlowja

@harlowja
Owner

I also had something similar @ #10 (though yours might be better)

@harlowja
Owner

Ya, nice, yours is better :)

@tgamblin

@harlowja: thanks! Curious whether OpenStack and/or fasteners has a use for this. If so, I could try to make an API and submit a PR. I'm not sure I can maintain it for cloud environments, though, so someone in your project would have to have a need. Thoughts?

@harlowja
Owner

Sure, I take PRs :)

My guess is someone would have a need :)

@harlowja
Owner

The read and write lock stuff could be especially useful.

@corydodt

This issue is now almost two years old, and I, a new user, am running into it now.

One key thing I'd like to address: your documentation specifically says this:

Inter-process locks
Single writer using file based locking (these automatically release on process exit, even if release or exit is never called).

However, this doesn't seem to be the case, and if I'm understanding this bug correctly, they are the same issue. Specifically, SIGTERM'ing a process (let alone SIGQUIT or SIGKILL) leaves the lockfile in the filesystem.

  1. Is this in fact the same issue being described in this GitHub issue?
  2. If so, can you update the documentation to remove that wording, and maybe link to this issue as the path to fixing it? In my view, the documentation currently says the complete opposite of what's true.

@harlowja
Owner

So I think you are conflating release with file deletion; they aren't the same. What is released is the lock on the file handle owned by the process; this is released automatically (ensured by the operating system). Actually deleting the file is pretty complicated to get right (due to dual ownership), but I do agree the wording could be better.
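
To make the distinction concrete, here is a minimal sketch (the path is made up) of what "released but not deleted" means for this kind of lock:

```python
import fcntl
import os

path = "/tmp/example.lock"          # hypothetical lock file

fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
fcntl.lockf(fd, fcntl.LOCK_EX)      # lock acquired

# If this process is now SIGKILLed, the kernel drops the lock along with
# the file descriptor, so another process can acquire the lock immediately.
# But os.path.exists(path) is still True afterwards -- that lingering file
# is what this issue is about, and deleting it safely is the hard part.
```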

@vstinner

Would it make sense to use a file descriptor rather than a filename, with O_TMPFILE, to ensure that the file is destroyed as soon as the file is closed in all processes (or when the processes are killed)? O_TMPFILE requires Linux 3.11 or newer and is not supported by all filesystems, but more and more filesystems support it.

See how the tempfile module of the Python stdlib uses O_TMPFILE:

I just tested on Fedora 29: btrfs, ext4 and tmpfs support O_TMPFILE. (I only tested these filesystems.)

$ cat /etc/fedora-release 
Fedora release 29 (Twenty Nine)

$ uname -r
4.18.16-300.fc29.x86_64

$ python3
Python 3.7.1 (default, Nov  5 2018, 14:07:04) 
>>> import os

>>> fd=os.open(".", os.O_WRONLY | os.O_TMPFILE) # btrfs
>>> os.close(fd)

>>> fd=os.open("/tmp", os.O_WRONLY | os.O_TMPFILE) # tmpfs
>>> os.close(fd)

>>> fd=os.open("/boot/test", os.O_WRONLY | os.O_TMPFILE) # ext4
>>> os.close(fd)

Problem: who creates the FD? How do we pass the FD to other processes? A UNIX socket with sendmsg()? ...
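
A rough sketch of that idea, assuming Linux >= 3.11, a filesystem that supports O_TMPFILE, and Python >= 3.9 for socket.send_fds()/recv_fds() (everything here is illustrative, not a fasteners API):

```python
import fcntl
import os
import socket

# Create an anonymous lock file: it has no directory entry, so once every
# holder closes it (or dies) it simply disappears -- nothing left behind.
lock_fd = os.open("/tmp", os.O_RDWR | os.O_TMPFILE, 0o600)

# Hand the descriptor to another process over an AF_UNIX socket.  (In a
# real setup the two ends of the socketpair live in different processes.)
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
socket.send_fds(a, [b"lock-fd"], [lock_fd])

# Receiving side: pick up the fd and take POSIX record locks on it as
# usual (fcntl/lockf locks are per-process, so mutual exclusion still
# works even though the descriptor was duplicated across processes).
msg, fds, _flags, _addr = socket.recv_fds(b, 1024, 1)
peer_fd = fds[0]
fcntl.lockf(peer_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
fcntl.lockf(peer_fd, fcntl.LOCK_UN)
```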

@harlowja
Owner

Oh hi Victor, how are you? :)

@4383

4383 commented Nov 20, 2018

Hi @harlowja, what do you think about Victor's suggestion?
I've started to review your PR #10 and I'm currently running some tests with it.

@vstinner

Oh hi Victor, how are you? :)

Hey! I'm fine. I moved to a new team. I'm now maintaining Python for Red Hat (in Fedora, RHEL, and upstream)! But as you can see, I'm helping @4383, who replaced me on Oslo in my previous OpenStack team.

@harlowja
Owner

harlowja commented Dec 1, 2018

Hi guys, just back from vacation (Thanksgiving in the US), so it will take me a little while to catch back up...

@harlowja
Owner

harlowja commented Dec 4, 2018

Let's do what Victor suggests. I can't see any flaws with it (outside of the ones already mentioned).

@4383

4383 commented Dec 13, 2018

@harlowja, don't hesitate to tell me if I can help with anything (review, code, whatever); just ping me.

@harlowja
Owner

Do you want to try to implement #26 (comment)?

@4383

4383 commented Dec 14, 2018

@harlowja: I'm a real newcomer to fasteners, so if I implement Victor's suggestion I'll need some mentoring from you so that I don't spend a lot of time understanding the fasteners architecture and figuring out how to integrate it properly with #10.
Do you agree?

@tgamblin

@harlowja: Sorry for disappearing. I initially had issues getting a license change through our IP office when I commented, but Spack has since relicensed to Apache-2.0/MIT, so we could contribute our code pretty easily now.

If you want the byte-range locking that I mentioned above, it could make sense here. We use it all the time, so I believe it is pretty well tested. It doesn't directly solve this problem unless you use it the way we described above (by mapping locks to byte offsets via hashing), so it probably needs another API on top; but our approach is POSIX, so there is that.

@vstinner's idea is more general if the OS and filesystem support it.

@harlowja
Owner

Might be easier if you guys want to jump on toolsforhumans.slack.com or IRC?

@harlowja
Owner

It'd be nice to look at https://github.com/spack/spack/blob/develop/lib/spack/llnl/util/lock.py again, and see what we can take into this lib.

@tgamblin

@harlowja: how do I get an account?

@harlowja
Owner

Ah, hmmm, good question, lol

@harlowja
Owner

Guess I have to invite people, hmmm

@Erotemic
Contributor

@harlowja It doesn't look like the Spack lock implementation cleans up the lockfile either. Am I missing something?
