-
Notifications
You must be signed in to change notification settings - Fork 815
Fix concurrency races, ghost children, and goroutine lifecycle issues #1941
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- Add lock-and-copy pattern in Receiver.Input to prevent data races - Protect all Node parent/child operations with proper mutex locking - Add thread-safe HasSenders() method Eliminates data races.
- Add 'closed' flag and cleanGhostChildren() - Prevent adding children to closed parents - Clear Tapo sender reference on Stop() Fixes memory leaks and stuck producers.
- Add context.Context to Tapo client for graceful cancellation - Make Handle() loop cancellable with context checks - Implement stop channel in Producer worker Fixes stuck Tapo connections and goroutine leaks.
|
Just to add a bit more context, I've been working on this in the past two weeks and have been running a build with the final commits for several days with no issues. These bugs were not easy to troubleshoot (my test builds had tons of extra trace logs), but I could pretty easily create stuck children by rapidly enabling/disabling two-way audio in Frigate for my Tapo cameras, eg. switching back-and forth between the tapo protocol and rtsp. This PR would likely fix strange issues like #1461, #1933, #1611. Data races can also be checked with go run -race. There are quite a few, but I've been focusing on the core and streams packages, as well as Tapo (since I have Tapo stuff). Cherry picking a few examples: |
|
This is a very complex and confusing part of go2rtc. I myself am confused about it and plan to rewrite it in the future. |
|
From an outsider perspective, these look to be fairly safe changes to merge and could potentially solve some odd corner case issues (like the ones mentioned by the author). I have a mix of RTSP cameras that exhibit weird issues with two way audio sometimes killing streams, and I suspect that they may fix that. Speaking from my own dev experience, a rewrite of a portion of code that has issues is always a good thing, but its also good to address issues with the existing code that impacts users in the meantime. At the very least, it could reduce the immediate need for the improvements resulting from a rewrite. |
|
Thanks for the feedback. I've updated the PR to resolve conflicts. |
Summary
This PR fixes critical concurrency issues, memory leaks, and goroutine lifecycle problems in core packet forwarding and Tapo client implementation.
Problems Fixed
1. Data Races in Packet Processing
Receiver.Inputiterated over children without lock while other threads modified the slice2. Ghost Children Accumulation
cleanGhostChildren(), closed state tracking3. Orphaned Senders
4. Uncancellable Goroutines
See screenshot below for stuck Tapo streams that kept sending data without any consumers:
Changes
Commit 1: Core Concurrency Fixes
pkg/core/track.go: Lock-and-copy inReceiver.Input, thread-safeHasSenders()pkg/core/node.go: Protected parent/child access with locksinternal/streams/stream.go: Use new thread-safeHasSenders()Commit 2: Ghost Children Prevention
pkg/core/node.go: Addclosedflag,cleanGhostChildren(), prevent append to closed parentpkg/core/track.go: Call ghost cleanup inHasSenders()pkg/tapo/producer.go: Clear sender referenceCommit 3: Goroutine Lifecycle Management
pkg/tapo/client.go: Add context cancellation, monitor goroutine, safe Close()pkg/tapo/backchannel.go: Context checks before operationsinternal/streams/producer.go: Stop channel for worker termination