
Emissary Ingress Chart v3.9.1 shutdown with unreal memory usage 'bug' #5819

Open
mnogueiraops opened this issue Feb 11, 2025 · 3 comments
Labels
t:bug Something isn't working

Comments

@mnogueiraops

No significant changes were made to this Emissary Ingress instance. It had been working for about 60 days until a restart happened in the namespace. Now I get this:

```
ambassador time="2025-02-11 03:45:39.5273" level=info msg="Memory Usage 541.58Gi (138644%)\n PID 1, 0.08Gi: busyambassador entrypoint \n PID 35, 0.04Gi: /usr/bin/python /usr/bin/diagd /ambassador/snapshots /ambassador/bootstrap-ads.json /ambassador/envoy/envoy.json --no
ambassador time="2025-02-11 03:45:39.5290" level=info msg="finished successfully: exit status 0" func="github.com/datawire/dlib/dexec.(*Cmd).Wait" file="/go/vendor/github.com/datawire/dlib/dexec/cmd.go:255" CMD=entrypoint PID=1 THREAD=/envoy dexec.pid=47
ambassador [2025-02-11 03:45:39 +0000] [36] [INFO] Worker exiting (pid: 36)
ambassador [2025-02-11 03:45:39 +0000] [35] [INFO] Shutting down: Master
ambassador time="2025-02-11 03:45:39.7692" level=info msg="finished successfully: exit status 0" func="github.com/datawire/dlib/dexec.(*Cmd).Wait" file="/go/vendor/github.com/datawire/dlib/dexec/cmd.go:255" CMD=entrypoint PID=1 THREAD=/diagd dexec.pid=35
ambassador time="2025-02-11 03:45:39.7693" level=info msg=" final goroutine statuses:" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:84" CMD=entrypoint PID=1 THREAD=":shutdown_status"
ambassador time="2025-02-11 03:45:39.7694" level=info msg=" /ambex : exited" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
ambassador time="2025-02-11 03:45:39.7694" level=info msg=" /diagd : exited" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
ambassador time="2025-02-11 03:45:39.7695" level=info msg=" /envoy : exited" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
ambassador time="2025-02-11 03:45:39.7695" level=info msg=" /external_snapshot_server: exited" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
ambassador time="2025-02-11 03:45:39.7695" level=info msg=" /healthchecks : exited" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
ambassador time="2025-02-11 03:45:39.7695" level=info msg=" /memory : exited" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
ambassador time="2025-02-11 03:45:39.7696" level=info msg=" /snapshot_server : exited" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
ambassador time="2025-02-11 03:45:39.7696" level=info msg=" /watcher : exited" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
ambassador time="2025-02-11 03:45:39.7696" level=info msg=" :signal_handler:0 : exited with error" func=github.com/datawire/dlib/dgroup.logGoroutineStatuses file="/go/vendor/github.com/datawire/dlib/dgroup/group.go:95" CMD=entrypoint PID=1 THREAD=":shutdown_status"
ambassador time="2025-02-11 03:45:39.7696" level=error msg="shut down with error error: received signal terminated (triggering graceful shutdown)" func=github.com/emissary-ingress/emissary/v3/pkg/busy.Main file="/go/pkg/busy/busy.go:87" CMD=entrypoint PID=1
```
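For readers hitting the same thing: an implausible figure like "541.58Gi (138644%)" usually means the reported usage or the memory limit was misread (e.g. a garbage cgroup limit), not that the pod really consumed half a terabyte. A tiny illustrative sketch (not Emissary's actual code) of how a misread limit produces a percentage like this:

```python
def usage_percent(usage_gi, limit_gi):
    # Percent of the memory limit currently in use.
    return usage_gi / limit_gi * 100

# A sane case: 0.4 GiB used against a 4 GiB limit.
print(round(usage_percent(0.4, 4.0)))  # 10

# The limit that would reproduce the reported figure is under 0.4 GiB,
# i.e. far smaller than any real pod limit here:
implied_limit_gi = 541.58 / 1386.44  # ~0.39 GiB (hypothetical, back-solved)
print(round(usage_percent(541.58, implied_limit_gi)))  # 138644
```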

Any tips?

@dosubot dosubot bot added the t:bug Something isn't working label Feb 11, 2025
@dgaffuri

The same happened to us, suddenly and without any configuration change:

```
time="2025-02-24 14:45:34.9227" level=warning msg="Memory Usage: throttling reconfig v1 due to constrained memory with 1 stale reconfigs (1 max)" func=github.com/emissary-ingress/emissary/v3/pkg/ambex.updaterWithTicker file="/go/pkg/ambex/ratelimit.go:140" CMD=entrypoint PID=1 THREAD=/ambex/updater
time="2025-02-24 14:45:42.6553" level=info msg="Memory Usage 447.70Gi (114611%)\n PID 1, 0.08Gi: busyambassador entrypoint \n PID 13, 0.04Gi: /usr/bin/python /usr/bin/diagd /ambassador/snapshots /ambassador/bootstrap-ads.json /ambassador/envoy/envoy.json --notices /ambassador/notices.json --port 8004 --kick kill -HUP 1 \n PID 15, 0.04Gi: /usr/bin/python /usr/bin/diagd /ambassador/snapshots /ambassador/bootstrap-ads.json /ambassador/envoy/envoy.json --notices /ambassador/notices.json --port 8004 --kick kill -HUP 1 \n PID 26, 0.06Gi: envoy -c /ambassador/bootstrap-ads.json --base-id 0 --drain-time-s 600 -l error " func="github.com/emissary-ingress/emissary/v3/pkg/memory.(*MemoryUsage).Watch.func1" file="/go/pkg/memory/memory.go:39" CMD=entrypoint PID=1 THREAD=/memory
```

@dgaffuri

dgaffuri commented Feb 26, 2025

@mnogueiraops I don't know your environment, but I've had this problem using OKE in Oracle Cloud. I've found that it depends on the operating system and/or containerd version; after cordoning the recently updated nodes, it works again:

```
10.197.13.242   Ready                      node   25d     v1.31.1   10.197.13.242   Oracle Linux Server 8.10   5.15.0-210.163.7.el8uek.x86_64   cri-o://1.31.3-2.a553be57b4d.el8
10.197.14.156   Ready,SchedulingDisabled   node   4d16h   v1.31.1   10.197.14.156   Oracle Linux Server 8.10   5.15.0-302.167.6.el8uek.x86_64   cri-o://1.31.3-2.a553be57b4d.el8
```

The OCID of the working node is ocid1.image.oc1.eu-frankfurt-1.aaaaaaaa7c4hexum4lbjpw2k3lt3ukn5j2czsqriuk3ufn5vas7klcvyghza
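For anyone trying this workaround, a minimal sketch of the cordon step, assuming `kubectl` access to the cluster. The node name is taken from the listing above; the namespace and deployment name are placeholders, adjust them to your install:

```shell
# Stop scheduling new pods onto the recently updated (misbehaving) node.
kubectl cordon 10.197.14.156

# Restart Emissary so its pods land back on a still-working node.
# "emissary" / "emissary-ingress" are placeholder names for this sketch.
kubectl -n emissary rollout restart deployment/emissary-ingress
```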

@mnogueiraops
Author

Thank you, @dgaffuri

We do indeed use OKE. I will test the proposed step.
