
SSR-BRS crashes after varying amount of time #371

Open
janhenhan opened this issue Jul 16, 2023 · 2 comments

Comments


janhenhan commented Jul 16, 2023

Hi all,

Great to see you guys are still going strong developing SSR after over a decade. Congrats on the 0.6 release!

Recently, I've increased the number of network messages I send to ssr-brs (for example, 20 sources, each receiving messages that update some of their attributes at a 100 Hz update rate). Unfortunately, this came with a big decrease in the stability of the SSR.

I am experiencing unexpected crashes after varying amounts of time: sometimes it runs fine for hours, other times only for minutes. At first I thought this might be the fault of the older FUDI interface (some open issues here describe similar crashes with the older network interface), so I switched to the more recent WebSocket interface. Unfortunately, the crashes happen there too. All the messages I send seem to contain values within a valid range, i.e., as far as I can tell, no particular message is what crashes ssr-brs.

I attached lldb to the process, but the messages mean very little to me. Most of the time it is a bad access in the cleanup:
```
Process 45934 stopped
* thread #13, stop reason = EXC_BAD_ACCESS (code=1, address=0xbeadde8ca818)
    frame #0: 0x000000010000d2e4 ssr-brs`apf::CommandQueue::push(apf::CommandQueue::Command*) [inlined] apf::CommandQueue::_cleanup(this=0x0000000100110588, cmd=0x000060000021a800) at commandqueue.h:173:12 [opt]
   170    void _cleanup(Command* cmd)
   171    {
   172      assert(cmd != nullptr);
-> 173      cmd->cleanup();
   174      delete cmd;
   175    }
   176
Target 0: (ssr-brs) stopped.
```

Any thoughts on what this means, or how I could prevent it and get ssr-brs back to a more robust state? These bad_accesses happen somewhere in APF? Are there any other logs that would help? I'm on an M1 Mac.

Many thanks!

mgeier (Member) commented Jul 19, 2023

Thanks for the report!

This sounds like a nasty bug, I hope we can find the cause and fix it.

It sounds a bit like a use-after-free bug, where the cmd pointer is accessed after it has been freed somewhere else. However, it is freed literally in the next line, not somewhere else ...

Smells a bit like undefined behavior ...

> These bad_accesses happen somewhere in APF?

Well, yes, the CommandQueue is used to send messages from the control thread to the audio thread (and back).
It might be a problem in the APF, but not necessarily.
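To make the control-to-audio-and-back round trip concrete, here is a simplified, single-threaded sketch. It is not APF's code: the names `Command`, `SetGain`, `CommandRoundTrip`, `process_commands()` and `collect_garbage()` are hypothetical, and two `std::deque`s stand in for APF's lock-free queues. It only mirrors what the backtrace suggests, namely that `push()` first cleans up commands the audio thread has already handed back, which is where `_cleanup()` calls `cleanup()` and then `delete`.

```cpp
#include <cassert>
#include <deque>

// Hypothetical command interface, loosely modeled on the names in the
// backtrace (execute/cleanup); not APF's actual declaration.
struct Command {
  virtual ~Command() = default;
  virtual void execute() = 0;  // runs in the audio thread
  virtual void cleanup() {}    // runs in the control thread
};

// Hypothetical example command: sets a parameter value.
struct SetGain : Command {
  float* target;
  float value;
  SetGain(float* t, float v) : target(t), value(v) {}
  void execute() override { *target = value; }
};

// Single-threaded stand-in for the two queues: "inbox" carries commands
// to the audio thread, "outbox" carries them back to be deleted.
struct CommandRoundTrip {
  std::deque<Command*> inbox, outbox;

  // Control thread: free returned commands first, then enqueue the new one.
  void push(Command* cmd) {
    collect_garbage();
    inbox.push_back(cmd);
  }

  // Audio thread: executes commands but never deletes them.
  void process_commands() {
    while (!inbox.empty()) {
      Command* cmd = inbox.front();
      inbox.pop_front();
      cmd->execute();
      outbox.push_back(cmd);  // hand ownership back to the control thread
    }
  }

  // Control thread: the only place where commands are destroyed.
  void collect_garbage() {
    while (!outbox.empty()) {
      Command* cmd = outbox.front();
      outbox.pop_front();
      assert(cmd != nullptr);
      cmd->cleanup();
      delete cmd;
    }
  }

  ~CommandRoundTrip() {
    process_commands();
    collect_garbage();
  }
};
```

In the real, multi-threaded setting, a crash at `cmd->cleanup()` would mean the control thread read a pointer from the return queue that was not (yet, or no longer) valid, which is why the queue's cross-thread visibility guarantees matter.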

> Any other logs that would help?

I don't know. It seems the problem happens when calling the cleanup() function, but before this function is actually executed.

> I'm on an M1 Mac.

That's a good hint. I have the feeling that our ring buffer implementation might not be correct on ARM processors.

Are you running the SSR natively or via Rosetta?

The first thing I would try is to use atomics in our ring buffer and see if that changes anything.
Currently, I don't have a lot of time, but maybe I can try a few things next week.
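For reference, a minimal single-producer/single-consumer ring buffer with `std::atomic` indices and acquire/release ordering might look like the sketch below. It is a hypothetical illustration (`SpscRingBuffer` is not the APF class). The point is that on ARM, which has a weaker memory model than x86, plain non-atomic index variables give no guarantee that the consumer sees a slot's contents before it sees the updated index; the release-store/acquire-load pairs here provide that guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal SPSC ring buffer (hypothetical; not the APF implementation).
// One slot is always left empty so "full" can be told apart from "empty".
template <typename T>
class SpscRingBuffer {
 public:
  explicit SpscRingBuffer(std::size_t capacity)
      : buffer_(capacity + 1), head_(0), tail_(0) {
    assert(capacity > 0);
  }

  // Producer thread only. Returns false if the buffer is full.
  bool push(const T& item) {
    const std::size_t tail = tail_.load(std::memory_order_relaxed);
    const std::size_t next = increment(tail);
    // Acquire: pairs with the consumer's release store of head_, so the
    // consumer has finished reading this slot before we overwrite it.
    if (next == head_.load(std::memory_order_acquire)) return false;
    buffer_[tail] = item;
    // Release: the write to buffer_[tail] is visible before the new tail.
    tail_.store(next, std::memory_order_release);
    return true;
  }

  // Consumer thread only. Returns false if the buffer is empty.
  bool pop(T& item) {
    const std::size_t head = head_.load(std::memory_order_relaxed);
    // Acquire: pairs with the producer's release store of tail_.
    if (head == tail_.load(std::memory_order_acquire)) return false;
    item = buffer_[head];
    // Release: the slot has been read before the producer may reuse it.
    head_.store(increment(head), std::memory_order_release);
    return true;
  }

 private:
  std::size_t increment(std::size_t i) const {
    return (i + 1) % buffer_.size();
  }

  std::vector<T> buffer_;
  std::atomic<std::size_t> head_;  // next slot to read; written by consumer
  std::atomic<std::size_t> tail_;  // next slot to write; written by producer
};
```

Each index is written by exactly one thread, so no compare-and-swap is needed; the only requirement is that each side reads the other side's index with acquire semantics.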

janhenhan (Author) commented

Thanks Matthias! It would be really great if you can find the time to have a look at some point :)

> Are you running the SSR natively or via Rosetta?

I'm running a native ARM build on the M1.

For what it's worth, a possibly questionable observation I have made is that SSR seems to crash much sooner when I start it as a subprocess from Python than when I run it in the debugger and wait for it to crash. But that may just be subjective, or within the range of the widely varying times it runs before crashing.
