ZFS send hangs sometimes #16731
I'm facing a somewhat similar issue. It occurs on only one of 7 servers that all run the same OS (FreeBSD 14.1, OpenZFS 2.2.4) and use the same (home-made) zfs replication software. It is basically a series of The Killing the piped There are no related I'd be glad to help investigate the issue, but I don't know where to look. |
@vedadkajtaz thanks! I have to correct myself, zfs-2.2.4 also contains the suspected patch. If you would be able to test openzfs without commit 6bdc725, that would help. I am running 3 boxes now with that reverted, only for a few days, without hanging zfs send. But, I definitely would need more time to declare this as a possible cause. |
@vedadkajtaz did you have a chance to build zfs userspace with 6bdc725 reverted? Then, that will need some time, but according to your experiments, in a week you'll be able to report some results. |
Hi, I haven't done anything yet regarding this, possibly/likely next week, sorry. |
@nabijaczleweli I can report that reverting the mentioned commit caused no zfs send issues on 3 FreeBSD based NAS servers for more than a week now. Can you have a look at the commit? |
It looked sound back then so it looks sound now. No-one seems to have posted a strace (or backtrace) that would indicate where these hang, that commit basically doesn't touch the actually-sending-stuff thread at all, and all the setup is deterministic AFAICT. This bug hasn't left "oh i see this sometimes". I can't evaluate data you're withholding. |
I have a hung process (with stock binary, FreeBSD 14.1, OpenZFS 2.2.4) right now. There is no
Not super helpful without debugging symbols, but it's obviously stuck in |
Would it be a terrible bother to take a backtrace, with symbols, of all the threads, so we don't have to guess what's happening? Attaching the strace-equivalent should in general be easier and tell you in which syscall each thread is stuck, but I don't really know if FreeBSD possesses this ability. |
I'll rebuild (stock, ie. |
@nabijaczleweli unfortunately, I can only add that since I've been running my servers with the mentioned patch reverted, I have not seen any hung zfs processes. |
@nabijaczleweli I can report now that since I've reverted the mentioned patch, I am not experiencing the hanging issue. @vedadkajtaz what about your setup? |
It hung this morning, for the first time in a week, this is commit
|
Seems like pause() is not catching the pthread cancel request. I am not really familiar with pthread internals. However, even if it may be a bug in FreeBSD, it is now a regression in ZFS. Can we switch to pipes instead of using signals? A signal handler could send a byte to a pipe, and the thread loop would just receive from that pipe. Cancelling the thread could be indicated by closing the pipe's sender side. |
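The pipe idea proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from libzfs: `struct progress_pipe`, `progress_loop()` and `pipe_wakeup_demo()` are hypothetical names, and the two `write()` calls stand in for what a signal handler would do (`write(2)` is async-signal-safe). The thread blocks in `read()` instead of `pause()`, treats each byte as a wake-up, and exits on EOF once the write end is closed, so no `pthread_cancel()` is involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not the libzfs implementation. */
struct progress_pipe {
	int rd;		/* read end, used only by the progress thread */
	int wr;		/* write end, used by the signal handler / main thread */
	int wakeups;	/* wake-up bytes consumed by the progress thread */
};

/* Progress thread: block in read(2) instead of pause(2). */
static void *
progress_loop(void *arg)
{
	struct progress_pipe *pp = arg;
	char byte;

	for (;;) {
		ssize_t n = read(pp->rd, &byte, 1);
		if (n == 1)
			pp->wakeups++;	/* a real thread would print progress */
		else if (n == 0)
			break;		/* EOF: write end closed, so exit */
		else if (errno != EINTR)
			break;		/* unexpected read error */
	}
	return (NULL);
}

/* Returns the number of wake-ups the thread saw, or -1 on failure. */
int
pipe_wakeup_demo(void)
{
	struct progress_pipe pp = { -1, -1, 0 };
	pthread_t tid;
	int fds[2];
	int rc;

	if (pipe(fds) != 0)
		return (-1);
	pp.rd = fds[0];
	pp.wr = fds[1];
	if (pthread_create(&tid, NULL, progress_loop, &pp) != 0)
		return (-1);

	/* Stand-in for a signal handler: one byte per wake-up request. */
	for (int i = 0; i < 2; i++) {
		if (write(pp.wr, "x", 1) != 1)
			return (-1);
	}

	close(pp.wr);			/* "cancel": the reader sees EOF */
	rc = pthread_join(tid, NULL);
	assert(rc == 0);
	(void) rc;
	close(pp.rd);
	return (pp.wakeups);
}
```

Calling `pipe_wakeup_demo()` from a `main()` shows the thread consuming both wake-up bytes before it exits on EOF. The design trades the signal/cancellation interaction for a plain blocking read, at the cost of two extra file descriptors.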
Hm, I don't think this is a valid race? The only way I could see this happening is:
...but then the timer should fire and the next syscall should pop the queue. So maybe step 3 also executes the thread cleanups but still fails to actually kill the thread because of the specific race state it's in? Either way, I can't construe a valid way through our code that can lead to this if the thread implementation is also valid. This can probably be trivially fixed with Can you try this diff?
From 6c6faeaa81acfc5038ed440a1f89fc42f818c097 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=D0=BD=D0=B0=D0=B1?= <[email protected]>
Date: Mon, 2 Dec 2024 15:58:53 +0100
Subject: [PATCH] FreeBSD: libzfs: send_progress_thread: use asynchronous
cancellation type asynchronous to work around libthr bug
X-Mutt-PGP: OS
See https://github.com/openzfs/zfs/issues/16731
---
lib/libzfs/libzfs_sendrecv.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/lib/libzfs/libzfs_sendrecv.c b/lib/libzfs/libzfs_sendrecv.c
index b9780720e..60e5047e9 100644
--- a/lib/libzfs/libzfs_sendrecv.c
+++ b/lib/libzfs/libzfs_sendrecv.c
@@ -975,6 +975,14 @@ send_progress_thread(void *arg)
struct tm tm;
int err;
+ /*
+ * XXX: the FreeBSD pthread implementation can get stuck
+ * in send_progress_thread_exit()'s pthread_join()
+ * and send_progress_thread()'s' pause()
+ * in the default deferred cancel mode.
+ * See https://github.com/openzfs/zfs/issues/16731. */
+ pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
+
const struct sigaction signal_action =
{.sa_sigaction = send_progress_thread_act, .sa_flags = SA_SIGINFO};
struct sigevent timer_cfg =
--
2.39.5
|
Oh, you're probably running without a timer ( |
It's running without the |
So there is a case where the order of calls is:
In this case (which is the most frequent, I assume), pause() is woken up by a cancel request, and that request is handled. The second case:
Here, a cancellation request is queued and pending, and Am I right with this? As the symptom is very rare, it would be nice to trigger this somehow, focusing on threads/cancellation. Do you have an idea for a small program to try this? |
However, I can still add that without this change, I have no hanging zfs send processes. |
AFAICT, (idk how pthreads on FreeBSD are implemented, maybe the thread is killed with a SIGKILL or whatever, but this should be invisible to the user-code pause())
A reduced version of this is
sigaction(SIGUSR1, &signal_action, NULL);
for(;;) {
pause();
// time(2) + write(2), whatever
}
and
pthread_create(&S, NULL, send_progress_thread, NULL);
sigset_t new;
sigemptyset(&new);
sigaddset(&new, SIGUSR1);
pthread_sigmask(SIG_BLOCK, &new, old);
pthread_cancel(S);
pthread_join(S, &...);
Maybe add a pthread_cleanup_push destructor. |
I don't really understand this. According to your assumption, pause() is not woken up by a pthread_cancel. Then we should encounter this hanging issue almost all the time, as the progress thread usually gets to pause() before the stream dump finishes. And then the design of the current implementation is wrong. However, I think that pause() is indeed woken up; this small program works:
This works the same way on Linux and FreeBSD, the thread gets cancelled after 1 second. |
There's a distinct difference between pause() waking up and the thread being destroyed while sleeping in pause(). Your demo clearly shows that the thread does not wake up from pause, it dies while in pause():
|
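The distinction drawn above (the thread dying inside pause() rather than returning from it) can be illustrated with a small sketch. The demo programs originally posted in the thread are not preserved here, so this is a reconstruction under assumptions: `sleeper()`, `on_cancel()` and `cancel_in_pause_demo()` are hypothetical names, and the 100 ms delay is an arbitrary value chosen to let the thread reach pause(). On a correct pthread implementation the cleanup handler runs, the statement after pause() never executes, and pthread_join() reports PTHREAD_CANCELED.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>
#include <unistd.h>

static int returned_from_pause;	/* set only if pause() returns */
static int cleanup_ran;		/* set by the cancellation cleanup */

/* Cleanup handler: runs when the thread is cancelled inside pause(). */
static void
on_cancel(void *arg)
{
	*(int *)arg = 1;
}

static void *
sleeper(void *arg)
{
	(void) arg;
	pthread_cleanup_push(on_cancel, &cleanup_ran);
	pause();			/* cancellation point */
	returned_from_pause = 1;	/* not reached on cancellation */
	pthread_cleanup_pop(0);
	return (NULL);
}

/* Returns 1 if the thread was cancelled while sleeping in pause(). */
int
cancel_in_pause_demo(void)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 100000000 };
	pthread_t tid;
	void *ret = NULL;
	int rc;

	if (pthread_create(&tid, NULL, sleeper, NULL) != 0)
		return (-1);
	nanosleep(&delay, NULL);	/* let the thread block in pause() */
	pthread_cancel(tid);
	rc = pthread_join(tid, &ret);	/* the call that hangs in this issue */
	assert(rc == 0);
	(void) rc;
	return (ret == PTHREAD_CANCELED && cleanup_ran &&
	    !returned_from_pause);
}
```

A sketch like this only exercises the common path; the hang reported in this issue is a rare race, so a single successful run says nothing about whether the race exists.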
@nabijaczleweli sorry, I think we simply misunderstood each other. With |
This was quick, new deadlock with the above patch, basically the same backtrace:
|
This can't not be a FreeBSD pthread bug, if only by the virtue of pthread_cancel()... not cancelling the thread. @vedadkajtaz thanks for noting that sending USR1 unsticks the process. Here's a new patch that, on FreeBSD, unconditionally starts the timer to do the unsticking:
From 6e9a44b27c82318501556c1febc130e5a7437a85 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=D0=BD=D0=B0=D0=B1?= <[email protected]>
Date: Mon, 2 Dec 2024 15:58:53 +0100
Subject: [PATCH] FreeBSD: libzfs: send_progress_thread: always start timer to
work around libthr bug
X-Mutt-PGP: OS
See https://github.com/openzfs/zfs/issues/16731#issuecomment-2511777775
---
lib/libzfs/libzfs_sendrecv.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/lib/libzfs/libzfs_sendrecv.c b/lib/libzfs/libzfs_sendrecv.c
index b9780720e..71cd1b6cd 100644
--- a/lib/libzfs/libzfs_sendrecv.c
+++ b/lib/libzfs/libzfs_sendrecv.c
@@ -988,7 +988,23 @@ send_progress_thread(void *arg)
sigaction(SIGINFO, &signal_action, NULL);
#endif
- if ((timer.desired = pa->pa_progress || pa->pa_astitle)) {
+ /*
+ * XXX: the FreeBSD pthread implementation can get stuck
+ * in send_progress_thread_exit()'s pthread_join()
+ * and send_progress_thread()'s' pause()
+ * See https://github.com/openzfs/zfs/issues/16731#issuecomment-2511777775
+ * and https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283101.
+ *
+ * We work around this by forcibly starting a timer
+ * to occasionally wake the thread up,
+ * so it services the unmaskable cancellation request 🙄
+ */
+#ifdef __FreeBSD__
+ timer.desired = B_TRUE;
+#else
+ if ((timer.desired = pa->pa_progress || pa->pa_astitle))
+#endif
+ {
if (timer_create(CLOCK_MONOTONIC, &timer_cfg, &timer.timer))
return ((void *)(uintptr_t)errno);
(void) timer_settime(timer.timer, 0, &timer_time, NULL);
--
2.39.5
There should be no functional change. (Well, if it deadlocks then it maybe sleeps for up to a second instead of exiting instantly, but that beats sleeping forever.) |
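The idea behind the workaround patch above, a periodic timer signal that keeps interrupting pause() so a pending cancellation request gets serviced, can be sketched in isolation. This is a minimal illustration under assumptions, not the libzfs code: `on_tick()`, `progress_thread()` and `timer_wakeup_demo()` are hypothetical names, and the choice of SIGUSR1 and a 10 ms interval is mine. The main thread blocks the signal so that only the progress thread receives it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static atomic_int ticks;	/* timer signals handled so far */

static void
on_tick(int sig)
{
	(void) sig;
	atomic_fetch_add(&ticks, 1);
}

static void *
progress_thread(void *arg)
{
	sigset_t set;

	(void) arg;
	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);
	pthread_sigmask(SIG_UNBLOCK, &set, NULL);
	for (;;)
		pause();	/* each tick interrupts this; also a cancellation point */
	return (NULL);		/* not reached */
}

/* Returns 1 if the thread saw at least 3 ticks and was then cancelled. */
int
timer_wakeup_demo(void)
{
	struct itimerspec its = {
		.it_interval = { .tv_sec = 0, .tv_nsec = 10000000 },
		.it_value = { .tv_sec = 0, .tv_nsec = 10000000 }
	};
	struct timespec nap = { .tv_sec = 0, .tv_nsec = 1000000 };
	struct sigaction sa;
	struct sigevent ev;
	sigset_t set;
	timer_t timer;
	pthread_t tid;
	void *ret = NULL;
	int rc;

	memset(&sa, 0, sizeof (sa));
	sa.sa_handler = on_tick;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return (-1);

	/* Block SIGUSR1 here; the progress thread unblocks it for itself. */
	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);
	pthread_sigmask(SIG_BLOCK, &set, NULL);

	memset(&ev, 0, sizeof (ev));
	ev.sigev_notify = SIGEV_SIGNAL;
	ev.sigev_signo = SIGUSR1;
	if (timer_create(CLOCK_MONOTONIC, &ev, &timer) != 0)
		return (-1);
	if (pthread_create(&tid, NULL, progress_thread, NULL) != 0)
		return (-1);
	if (timer_settime(timer, 0, &its, NULL) != 0)
		return (-1);

	/* Wait (bounded) until the thread has been woken a few times. */
	for (int i = 0; i < 5000 && atomic_load(&ticks) < 3; i++)
		nanosleep(&nap, NULL);

	pthread_cancel(tid);
	rc = pthread_join(tid, &ret);
	assert(rc == 0);
	(void) rc;
	timer_delete(timer);
	return (ret == PTHREAD_CANCELED && atomic_load(&ticks) >= 3);
}
```

On older glibc versions `timer_create()` lives in librt, so the sketch may need `-lrt` in addition to `-pthread` when linking.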
Suspected patch reverted. See openzfs/zfs#16731
I've been able to reproduce the |
Looks like this is fixed in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283101 |
Remove workaround applied in openzfs/zfs#16731
I am going to give it a try, will report back in a few days. |
…nightly tests on FreeBSD-14.2 and FreeBSD-13.4) This is a workaround for spurious hangs during zfs send/receive in ~30% of Github Action jobs on QEMU, probably caused by https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283101 via openzfs/zfs#16731 (comment) Signed-off-by: Wolfgang Hoschek <[email protected]>
@nabijaczleweli it seems that FreeBSD bugfix did help, I've not encountered a hanging zfs send for 3 weeks now, on 3 TrueNAS boxes. |
Seems fixed to me then. |
System information
Describe the problem you're observing
TrueNAS uses zettarepl to replicate zfs datasets to remote sites. During a cycle, rarely, zfs send hangs: it sends nothing to its output and sits idle. I've applied a workaround, a simple pipe command that reads the output of zfs send and passes the data through; this command reports that no output has been received from zfs send for minutes, then kills zfs send. It also reports that zfs send usually sent only a few thousand bytes before hanging. Simply killing zfs send solves the problem: on the next cycle it will usually send the snapshots completely, without errors.
I must note that the zfs used by TrueNAS contains this PR. I suspect that 6bdc725 may be the source of my issue.
Usually I get send errors once every week or two. I cannot reproduce the problem on demand, but I will now try without this patch and see the difference.
Describe how to reproduce the problem
Unfortunately, cannot reproduce.
Include any warning/errors/backtraces from the system logs