-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix macOS segfault on close after hardware error #211
Open
MrDOS
wants to merge
2
commits into
NeuronRobotics:master
Choose a base branch
from
MrDOS:bugfix/fix-macos-segfault-on-close-after-hardware-error
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Fix macOS segfault on close after hardware error #211
MrDOS
wants to merge
2
commits into
NeuronRobotics:master
from
MrDOS:bugfix/fix-macos-segfault-on-close-after-hardware-error
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Just cleaning up the event info struct isn't enough: one some platforms, there's a “drain loop” thread which runs underneath the monitor thread to watch for an empty output buffer. This thread needs to be stopped, too, or it'll segfault and bring down the whole JVM after the event info struct is freed.
The old `MonitorThreadLock` boolean field was only checked at a very slow interval (5s!), and, itself not being synchronized, was prone to race conditions. More importantly, it wasn't set during the monitor thread's self-cleanup after hardware failure, so under typical access patterns, the monitor thread and the application thread would both try to clean up the monitor thread simultaneously. This race condition could occasionally lead to a segfault (only reproduced on macOS, but I've no doubt it could happen elsewhere). I've also attempted to clean up some redundant flag fields, and consolidate setting the remaining fields to further avoid concurrency/reentrancy issues.
This was referenced Feb 1, 2021
I can confirm this fixes the behaviour i was seeing on macOS. Thanks! |
Great! I have not yet tried this build on any platform other than macOS, nor have I tested behaviour other than failures (have not even run any data through it), so I want to do some more QC before merging. But thank you for trying it so quickly. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This fixes the JVM crash experienced on macOS after a hardware error occurs. The crash was introduced along with the core hardware-error-detection functionality in #172.
In order to generate the
OUTPUT_BUFFER_EMPTY
event, the monitor thread event loop periodically checks the status of the serial port output buffer, just as it checks for other state changes. On platforms which have it, and on Windows where it's emulated, it uses theTIOCSERGETLSR
ioctl to get the information. On platforms which don't have that ioctl (macOS and FreeBSD <10), the monitor thread starts up another “drain thread” under the monitor thread which loops ontcdrain(3)
, and generates the event whenever that call unblocks. That separate drain thread is usually terminated byRXTXPort.interruptEventListener()
. However, because the monitor thread event loop now internally terminates upon encountering a hardware error, the drain thread cleanup in that interruption method isn't being performed. The fix is to have the event loop call the interruption method when it detects a hardware failure. That ensures all of the normal monitor thread cleanup happens.However, doing that revealed another, rare race condition where the application thread could encounter the port failure and invoke
RXTXPort.close()
before the monitor thread began its shutdown (e.g., if the application is sitting on a tight loop aroundavailable()
, as in the case of myDisconnectTest
test case). This manifested in another segfault. The obvious solution was some locking around the internal state of the monitor thread. This already kind of existed in the form of theRXTXPort.MonitorThreadLock
boolean and theRXTXPort.waitForTheNativeCodeSilly()
method which waited for the flag to become unset in a 5-second sleep loop. I took this as an opportunity to replace that nonsense with a real lock. I chose a read/write lock rather than justsynchronized
blocks around an object because it more closely matches the semantics of the old behaviour: all of the I/O functions checked that the flag was clear, which is akin to sharing a read lock, and all of the notification reconfiguration methods set/cleared the flag, which is akin to an exclusive write lock.This should fix the issue @d5smith raised in #197. It probably doesn't fix the original hang-on-
close()
/disconnect()
issue @mvalla is facing, unless the new locking has accidentally cleaned up another concurrency issue.