Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate: Child thread as 'native' code #136

Open
Julusian opened this issue Jan 21, 2023 · 2 comments
Open

Investigate: Child thread as 'native' code #136

Julusian opened this issue Jan 21, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@Julusian
Copy link
Member

  • I'm submitting a ...
    [ ] bug report
    [x] feature request
    [ ] question about the decisions made in the repository
    [ ] question about how to use this project

  • Summary

There is a child thread being used (worker thread via threadedclass), as the atem protocol is sensitive to timeouts, and needs to respond to messages every few milliseconds to ensure that the connection doesnt get dropped. We previously found it was easy to trigger drops doing simple things such as reading files, or some complex computations.

There are some bugs that have been open a long time for issues with how the child thread is implemented, I think it is worth re-evaluating how this child-thread is implemented and seeing if there is a better way of achieving it.
#125 #106 #74 and more

The code in question is a single 400 line file https://github.com/nrkno/sofie-atem-connection/blob/master/src/lib/atemSocketChild.ts, and it doesn't get many changes as it only handles the 'transport level' of the protocol.

Ideas

  1. Rewrite the child in wasm.
    I'm not sure if it is possible to create a udp socket in wasm, so this may not be achievable currently.
    This needs some research into whether that is possible, or if it is in a roadmap (eg planned for node 20 means it could be worth holding out for)
    What does this mean for threading? Can we achieve what we need?
    The requirement for buffers to be defined in the memory space of wasm should not be an issue for us here, but we should investigate how to handle this in a safe and performant way.
    Harder to maintain, and could hit limitations in the wasm runtime.

  2. Rewrite the child as native code. (c++/rust)
    This could introduce issues for users if they need to build the native portion themselves.
    Harder to maintain.
    Easy to use threading
    Minimal overhead of type translation

  • Other information (e.g. detailed explanation, stacktraces, related issues, suggestions how to fix, links for us to have context, eg. StackOverflow, personal fork, etc.)
@Julusian Julusian added the enhancement New feature or request label Jan 21, 2023
@ianshade
Copy link
Contributor

Rewrite the child in wasm.

Correct me if I'm wrong, but aren't threads in WASM also built on top of the platform features (Web Workers / Worker Threads)? Rewriting in WASM could maybe reduce some overhead of Node's event loop and all, but for a task as simple as listening for packets and replying, would there be any noticeable gain? I think what could help more is being in control of thread priority.

the atem protocol is sensitive to timeouts, and needs to respond to messages every few milliseconds to ensure that the connection doesnt get dropped

If that's a timeout on ATEM's side, did you consider asking Blackmagic Design to make it more forgiving?

@Julusian
Copy link
Member Author

Correct me if I'm wrong, but aren't threads in WASM also built on top of the platform features (Web Workers / Worker Threads)?

It does sound like it, but it also looks like that will be handled inside the WASM, so should avoid some of the issues of using worker threads. Making worker_threads work cleanly for webpack, electron or the browser is not simple (threadedclass solves some of these, but webpack requires marking the whole of atem-connection as an external)
https://web.dev/webassembly-threads/

If that's a timeout on ATEM's side, did you consider asking Blackmagic Design to make it more forgiving?

I have not, but I don't think that will be entirely negate the need to use a thread. The timeout for resends is currently set at 60ms. If threading was disabled in this library, for sofie that timeout is most likely fine (assuming threading is enabled inside there). But if threading is disabled inside playout-gateway, then chances are that random timeouts will be hit.
And I don't think there is a single number which would resolve this, it will depend on how many TSR devices there are, complexity of the timeline and performance of the machine

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants