worker available is lost after a while #9

Open
jstark1 opened this issue Apr 11, 2019 · 9 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@jstark1

jstark1 commented Apr 11, 2019

We are using mod-gearman-worker-go with OMD 2.90 (also with 3.00).
We check the worker queues with the command check_gearman -H localhost:4730 -q worker_<hostname> -x. Some workers lose the connection to their worker queue after some time.
#6 did not help in this case.
When we restart the gearman_worker, it immediately reconnects.
The normal service queues are working as expected.

@sni
Contributor

sni commented Apr 11, 2019

Does it occur with the current nightly as well?

@jstark1
Author

jstark1 commented Apr 12, 2019

I tried version mod_gearman_worker - version 1.1.1 (Build: 3.01.20190412-labs-edition-v1.1.1) with the same result.

@sni
Contributor

sni commented Apr 12, 2019

Anything in the worker log?

sni added the "need more information" label on Apr 12, 2019
@jstark1
Author

jstark1 commented Apr 15, 2019

The worker logs look normal to me. I can send you a log by PM if you want. As far as I understand, the worker creates its worker_<hostname> queue at daemon start. Is there a maintenance process for the worker queue in case it gets removed from gearmand after a daemon restart or a network outage? Or does gearmand remove queues that have no workers?

I think the main problem is that under some circumstances the "Worker Available" value in the worker_<hostname> queue changes from 1 to 0 and only comes back after a worker restart.

From the mod-gearman docs https://github.com/sni/mod_gearman#how-to I learned to use ./check_gearman -H <job server hostname> -q worker_<worker hostname> -t 10 -s check.
My check from above does not use -s check, so it only checks the available workers in the queue.
I will give this command a try so that the worker queue has some work to do.

@jstark1
Author

jstark1 commented Apr 16, 2019

I can now reproduce the problem.
The worker starts and opens several connections to the gearmand server. I assume these are the connections to the different queues. One of the connections idles and is terminated by our firewall after 1 hour. After another 1-2 hours, gearmand sets "Worker Available" in the worker_ queue to zero.

If I use the ./check_gearman -H <job server hostname> -q worker_<worker hostname> -t 10 -s check command, the idle connection counter is reset each time the command runs (5-minute check interval).

@sni
Contributor

sni commented Apr 16, 2019

Sounds reasonable. Thanks for the heads up. So is there anything we could do from the worker side? As far as I remember, the "old" worker renews the status worker from time to time, which probably prevents this issue from happening.
I guess that would be OK for the Go worker as well.

sni added the "enhancement" and "help wanted" labels and removed the "need more information" label on Apr 16, 2019
@dirtyren

What is happening to me on CentOS 8 is that the Go version of mod_gearman effectively causes a denial of service on gearmand: connections keep building up until there are 1000 open TCP connections, and gearmand does not accept connections anymore.
I changed back to the C version, and it does not happen there.
Strangely enough, on another installation I cannot reproduce this problem with the same versions installed.

C Version
Every 2.0s: netstat -anp | grep mod_gearman | wc -l opmoncloud8: Mon Jun 21 09:29:22 2021
6

Go Version - keeps going UP
Every 2.0s: netstat -anp | grep mod_gearman | wc -l opmoncloud8: Mon Jun 21 09:41:41 2021
236
Every 2.0s: netstat -anp | grep mod_gearman | wc -l opmoncloud8: Mon Jun 21 09:43:27 2021
272
Every 2.0s: netstat -anp | grep mod_gearman | wc -l opmoncloud8: Mon Jun 21 10:22:52 2021
969

Tks.
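
The three Go-version readings above allow a rough estimate of how fast the connections pile up. The sketch below computes the average growth per interval; the sample type and helper names are made up for illustration, and only the timestamps and counts come from the netstat output.

```go
package main

import (
	"fmt"
	"time"
)

// sample is one reading of the connection count at a wall-clock time.
type sample struct {
	at    time.Time
	conns int
}

// mustClock parses an "HH:MM:SS" clock time and panics on malformed input.
func mustClock(s string) time.Time {
	t, err := time.Parse("15:04:05", s)
	if err != nil {
		panic(err)
	}
	return t
}

// growthPerMinute returns the average number of additional connections
// per minute between two samples.
func growthPerMinute(a, b sample) float64 {
	return float64(b.conns-a.conns) / b.at.Sub(a.at).Minutes()
}

func main() {
	// Readings taken from the "Go Version" netstat output.
	samples := []sample{
		{mustClock("09:41:41"), 236},
		{mustClock("09:43:27"), 272},
		{mustClock("10:22:52"), 969},
	}
	for i := 1; i < len(samples); i++ {
		fmt.Printf("%s -> %s: %.1f connections/min\n",
			samples[i-1].at.Format("15:04:05"),
			samples[i].at.Format("15:04:05"),
			growthPerMinute(samples[i-1], samples[i]))
	}
}
```

Both intervals work out to roughly 18 to 20 new connections per minute, so the growth looks steady rather than bursty.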

@sni
Contributor

sni commented Jun 21, 2021

I guess that's something else. Does the worker have a reasonable limit of connections/threads (max-worker)? You could also send a SIGUSR1 to the worker process to create a thread dump.
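
For readers unfamiliar with this kind of signal-triggered dump, here is a minimal sketch of the general technique in Go. It is not the worker's actual implementation; allStacks is a hypothetical helper, and the program signals itself only so that the demo terminates. It relies on syscall.SIGUSR1 and syscall.Kill, so it is Unix-only.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime"
	"syscall"
)

// allStacks (hypothetical helper) returns the stack traces of all goroutines.
// The 1 MiB buffer is an arbitrary size; longer dumps are truncated.
func allStacks() string {
	buf := make([]byte, 1<<20)
	n := runtime.Stack(buf, true)
	return string(buf[:n])
}

func main() {
	// Register for SIGUSR1 before anything can send it.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGUSR1)

	// Demo only: signal ourselves. In practice the signal would come
	// from outside, for example via `kill -USR1 <pid>`.
	if err := syscall.Kill(os.Getpid(), syscall.SIGUSR1); err != nil {
		panic(err)
	}

	<-sigs
	fmt.Fprint(os.Stderr, allStacks())
}
```

A long-running process would instead loop over the signal channel in its own goroutine and write each dump to its log.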

@dirtyren

dirtyren commented Jun 21, 2021

No problem, will do it now. The config looks like this:

job_timeout=60
min-worker=5
max-worker=50
idle-timeout=30
max-jobs=1000
spawn-rate=1
fork_on_exec=no

[2021-06-21 15:40:19.095][Error][mod_gearman_worker_linux.go:29] requested thread dump via signal user defined signal 1
[2021-06-21 15:40:19.096][Error][mod_gearman_worker.go:315] threaddump:
goroutine 9 [running]:
github.com/ConSol/mod-gearman-worker-go.logThreaddump()
/root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:311 +0x6f
github.com/ConSol/mod-gearman-worker-go.mainSignalHandler(0x9f2520, 0xc99230, 0x3)
/root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker_linux.go:30 +0x2d5
github.com/ConSol/mod-gearman-worker-go.Worker.func1(0xc000064f60)
/root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:89 +0x77
created by github.com/ConSol/mod-gearman-worker-go.Worker
/root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:85 +0x1e7

goroutine 1 [select]:
github.com/ConSol/mod-gearman-worker-go.mainLoop(0xc000136900, 0xc0000652c0, 0xc0001e9ee0, 0x0, 0x9811b8, 0x0, 0x0)
/root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:155 +0x6a5
github.com/ConSol/mod-gearman-worker-go.Worker(0x0, 0x0)
/root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:105 +0x279
main.main()
/root/go/src/github.com/ConSol/mod-gearman-worker-go/cmd/mod_gearman_worker/main.go:12 +0x39

goroutine 19 [chan receive]:
github.com/appscode/g2/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0xcd0dc0)
/root/go/src/github.com/appscode/g2/vendor/github.com/golang/glog/glog.go:879 +0x8b
created by github.com/appscode/g2/vendor/github.com/golang/glog.init.0
/root/go/src/github.com/appscode/g2/vendor/github.com/golang/glog/glog.go:410 +0x274

goroutine 8 [syscall]:
os/signal.signal_recv(0x9f2520)
/usr/local/go/src/runtime/sigqueue.go:147 +0x9d
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:23 +0x25
created by os/signal.Notify.func1.1
/usr/local/go/src/os/signal/signal.go:150 +0x45

goroutine 36 [sleep]:
time.Sleep(0xb2d05e00)
/usr/local/go/src/runtime/time.go:188 +0xbf
github.com/ConSol/mod-gearman-worker-go.mainLoop.func2(0xc0000223f0)
/root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:147 +0x53
created by github.com/ConSol/mod-gearman-worker-go.mainLoop
/root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:141 +0x599

goroutine 20 [IO wait]:
internal/poll.runtime_pollWait(0x7ff360a3ada0, 0x72, 0x9eea40)
/usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc0001b4898, 0x72, 0x9eea00, 0xc81698, 0x0)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0001b4880, 0xc0000d8000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5
net.(*netFD).Read(0xc0001b4880, 0xc0000d8000, 0x1000, 0x1000, 0x1000, 0x7ff38b0347d0, 0xc0000d8000)
/usr/local/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc0000b6030, 0xc0000d8000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:182 +0x8e
bufio.(*Reader).Read(0xc0000b81e0, 0xc0000d8000, 0x1000, 0x1000, 0xc, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:213 +0x142
github.com/appscode/g2/worker.(*agent).read(0xc00006ee60, 0xc000216f00, 0x40, 0xc0000b8360, 0xc, 0x0)
/root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5
github.com/appscode/g2/worker.(*agent).work(0xc00006ee60)
/root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7
created by github.com/appscode/g2/worker.(*agent).Connect
/root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227

goroutine 21 [chan receive]:
github.com/appscode/g2/worker.(*Worker).Work(0xc0001b4780)
/root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6
github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc0001b4780)
/root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d
created by github.com/ConSol/mod-gearman-worker-go.newWorker
/root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2

goroutine 50 [IO wait]:
internal/poll.runtime_pollWait(0x7ff360a3abd0, 0x72, 0x9eea40)
/usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc0000d2118, 0x72, 0x9eea00, 0xc81698, 0x0)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0000d2100, 0xc000244000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5
net.(*netFD).Read(0xc0000d2100, 0xc000244000, 0x1000, 0x1000, 0x1000, 0x7ff38b034e98, 0xc000244000)
/usr/local/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc00021e000, 0xc000244000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:182 +0x8e
bufio.(*Reader).Read(0xc00020e0c0, 0xc000244000, 0x1000, 0x1000, 0xc, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:213 +0x142
github.com/appscode/g2/worker.(*agent).read(0xc0000c40f0, 0xc000036f00, 0x40, 0xc00020e480, 0xc, 0x0)
/root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5
github.com/appscode/g2/worker.(*agent).work(0xc0000c40f0)
/root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7
created by github.com/appscode/g2/worker.(*agent).Connect
/root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227

goroutine 51 [chan receive]:
github.com/appscode/g2/worker.(*Worker).Work(0xc0000d2000)
/root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6
github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc0000d2000)
/root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d
created by github.com/ConSol/mod-gearman-worker-go.newWorker
/root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2

goroutine 52 [IO wait]:
internal/poll.runtime_pollWait(0x7ff360a3aae8, 0x72, 0x9eea40)
/usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc000236118, 0x72, 0x9eea00, 0xc81698, 0x0)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc000236100, 0xc0000da000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5
net.(*netFD).Read(0xc000236100, 0xc0000da000, 0x1000, 0x1000, 0x1000, 0x7ff38b0347d0, 0xc0000da000)
/usr/local/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc00021e010, 0xc0000da000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:182 +0x8e
bufio.(*Reader).Read(0xc00020e1e0, 0xc0000da000, 0x1000, 0x1000, 0xc, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:213 +0x142
github.com/appscode/g2/worker.(*agent).read(0xc000234050, 0xc000212f00, 0x40, 0xc0000b8420, 0xc, 0x0)
/root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5
github.com/appscode/g2/worker.(*agent).work(0xc000234050)
/root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7
created by github.com/appscode/g2/worker.(*agent).Connect
/root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227

goroutine 53 [chan receive]:
github.com/appscode/g2/worker.(*Worker).Work(0xc000236000)
/root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6
github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc000236000)
/root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d
created by github.com/ConSol/mod-gearman-worker-go.newWorker
/root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2

goroutine 54 [IO wait]:
internal/poll.runtime_pollWait(0x7ff360a3aa00, 0x72, 0x9eea40)
/usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc000236298, 0x72, 0x9eea00, 0xc81698, 0x0)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc000236280, 0xc000241000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5
net.(*netFD).Read(0xc000236280, 0xc000241000, 0x1000, 0x1000, 0x1000, 0x7ff38b034e98, 0xc000241000)
/usr/local/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc00021e020, 0xc000241000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:182 +0x8e
bufio.(*Reader).Read(0xc00020e300, 0xc000241000, 0x1000, 0x1000, 0xc, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:213 +0x142
github.com/appscode/g2/worker.(*agent).read(0xc0002340f0, 0xc000217f00, 0x40, 0xc00020e420, 0xc, 0x0)
/root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5
github.com/appscode/g2/worker.(*agent).work(0xc0002340f0)
/root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7
created by github.com/appscode/g2/worker.(*agent).Connect
/root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227

goroutine 55 [chan receive]:
github.com/appscode/g2/worker.(*Worker).Work(0xc000236180)
/root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6
github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc000236180)
/root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d
created by github.com/ConSol/mod-gearman-worker-go.newWorker
/root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2

goroutine 66 [IO wait]:
internal/poll.runtime_pollWait(0x7ff360a3a918, 0x72, 0x9eea40)
/usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc000236418, 0x72, 0x9eea00, 0xc81698, 0x0)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc000236400, 0xc0002a4000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5
net.(*netFD).Read(0xc000236400, 0xc0002a4000, 0x1000, 0x1000, 0x1000, 0x7ff38b035560, 0xc0002a4000)
/usr/local/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc00028a000, 0xc0002a4000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:182 +0x8e
bufio.(*Reader).Read(0xc0002840c0, 0xc0002a4000, 0x1000, 0x1000, 0xc, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:213 +0x142
github.com/appscode/g2/worker.(*agent).read(0xc000234190, 0xc000214f00, 0x40, 0xc000284480, 0xc, 0x0)
/root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5
github.com/appscode/g2/worker.(*agent).work(0xc000234190)
/root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7
created by github.com/appscode/g2/worker.(*agent).Connect
/root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227

goroutine 67 [chan receive]:
github.com/appscode/g2/worker.(*Worker).Work(0xc000236300)
/root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6
github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc000236300)
/root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d
created by github.com/ConSol/mod-gearman-worker-go.newWorker
/root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2

goroutine 68 [IO wait]:
internal/poll.runtime_pollWait(0x7ff360a3a830, 0x72, 0x9eea40)
/usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc00029a118, 0x72, 0x9eea00, 0xc81698, 0x0)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc00029a100, 0xc0002a5000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5
net.(*netFD).Read(0xc00029a100, 0xc0002a5000, 0x1000, 0x1000, 0x1000, 0x7ff38b035560, 0xc0002a5000)
/usr/local/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc00028a010, 0xc0002a5000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:182 +0x8e
bufio.(*Reader).Read(0xc000284240, 0xc0002a5000, 0x1000, 0x1000, 0xc, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:213 +0x142
github.com/appscode/g2/worker.(*agent).read(0xc000296050, 0xc000228700, 0x40, 0xc0002844e0, 0xc, 0x0)
/root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5
github.com/appscode/g2/worker.(*agent).work(0xc000296050)
/root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7
created by github.com/appscode/g2/worker.(*agent).Connect
/root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227

goroutine 69 [chan receive]:
github.com/appscode/g2/worker.(*Worker).Work(0xc00029a000)
/root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6
github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc00029a000)
/root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d
created by github.com/ConSol/mod-gearman-worker-go.newWorker
/root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2
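
A dump like this is easier to compare against the connection count once the goroutines are tallied by state. The sketch below counts goroutine header lines per state; countStates is a hypothetical helper, and the short dump in main is a stand-in for the full text above, which contains six goroutines in IO wait.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// header matches lines such as "goroutine 20 [IO wait]:" and captures the
// state. A trailing duration ("[IO wait, 5 minutes]:") is not captured.
var header = regexp.MustCompile(`^goroutine \d+ \[([^\],]+)`)

// countStates (hypothetical helper) returns how many goroutines in the
// dump are in each state.
func countStates(dump string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		if m := header.FindStringSubmatch(sc.Text()); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	// Shortened stand-in for a real dump; only the header lines matter.
	dump := "goroutine 1 [select]:\n" +
		"goroutine 20 [IO wait]:\n" +
		"goroutine 21 [chan receive]:\n" +
		"goroutine 50 [IO wait, 5 minutes]:\n"

	for state, n := range countStates(dump) {
		fmt.Printf("%-14s %d\n", state, n)
	}
}
```

If the number of goroutines blocked in IO wait stays far below the number of open TCP connections reported by netstat, the extra connections are probably not held by live worker goroutines, though that would need to be confirmed separately.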


3 participants