Skip to content

fix: hang on rapid start/stop threads#323

Merged
iparaskev merged 2 commits into
mainfrom
fix_threads_hang
Jun 2, 2026
Merged

fix: hang on rapid start/stop threads#323
iparaskev merged 2 commits into
mainfrom
fix_threads_hang

Conversation

@iparaskev
Copy link
Copy Markdown
Contributor

@iparaskev iparaskev commented Jun 2, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Improved stability for screenshare functionality during rapid connection state changes.
  • Chores

    • Version bumped to 1.0.17.
    • Internal refactoring to enhance application responsiveness and resource management during shutdown.

iparaskev added 2 commits June 2, 2026 23:03
The screensharing window could hang the main event loop when it
was closed and then opened again. This could happen because the close
could happen while the redraw_thread was in request_redraw, which needs
the main event to be free in order to continue and at the same time a
new open would wait from the main event loop for the previous redraw
thread to join.

This change removes all the joins from the codebase as they don't really
needed for protecting us from zombie threads.
@netlify
Copy link
Copy Markdown

netlify Bot commented Jun 2, 2026

Deploy Preview for hoppdocs ready!

Name Link
🔨 Latest commit c8de755
🔍 Latest deploy log https://app.netlify.com/projects/hoppdocs/deploys/6a1f540507fb320008171ee2
😎 Deploy Preview https://deploy-preview-323--hoppdocs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR refactors thread lifecycle management across the entire core application: removing explicit JoinHandle storage and replacing join-on-drop patterns with fire-and-forget threads signaling shutdown via channels. Changes span socket readers, audio/video capture, graphics redraw, input handling, and application orchestration, plus a new screenshare hang reproduction test.

Changes

Thread Lifecycle Refactoring

Layer / File(s) Summary
Socket and background reader/capturer threads
core/socket_lib/src/lib.rs, core/src/audio/capturer.rs, core/src/camera/stream_nokhwa.rs, core/src/capture/stream.rs, core/src/capture/capturer.rs
EventSocket, Capturer (audio), CameraStream, and Stream (capture) no longer store JoinHandles. Threads spawn directly without keeping handles; Drop shuts down via channel signals (shutdown stream, stop flag, or StopCapture messages) instead of joining. State tracking shifts to Option<Channel> instead of Option<JoinHandle>.
Graphics and input thread management
core/src/graphics/graphics_context.rs, core/src/input/mouse_macos.rs, core/src/input/mouse_windows.rs
GraphicsContext, MouseObserver (macOS/Windows), and CursorSimulator remove stored JoinHandle fields. Redraw and event-tap threads spawn without handles; Drop sends stop commands over channels without joining. CursorSimulator similarly spawns its input worker detached.
Window redraw thread API changes
core/src/window/camera_window.rs, core/src/window/drawing_window.rs, core/src/window/screensharing_window.rs
Methods take_redraw_thread() (returned handle for caller-side joining) are removed and replaced with stop_redraw_thread() (sends Stop command and detaches the stored handle internally). ScreensharingWindow::update_window_with_new_sharer() no longer returns a join handle; it calls stop_redraw_thread() before spawning a replacement.
Application lifecycle and window coordination
core/src/lib.rs
Application removes _screen_capturer_events join-handle field and spawns poll_stream directly. CallStart, StartCamera, OpenCamera, OpenScreenShareWindow, and LocalDrawingEnabled remove pre-creation calls that joined redraw threads; they now call close_*_window() helpers or update windows directly. Drop for Application stops capture/runtime without joining stored threads.
Screenshare hang reproduction test
core/tests/src/hang_repro.rs, core/tests/src/main.rs
New test_screenshare_reconnect_hang async test creates a viewer call, two fake participants (audio and screenshare video), and publishes a muted screenshare track. It unmutes to trigger viewer redraw behavior, then runs rapid mute/unmute cycles to reproduce potential hangs. Periodic event draining keeps channels clear; 30-second wait follows for deadlock detection. New HangRepro CLI subcommand in main.rs enables standalone execution.
Version update
tauri/src-tauri/tauri.conf.json
Tauri application version incremented from 1.0.16 to 1.0.17.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • gethopp/hopp#294: Both PRs directly modify core/src/window/camera_window.rs's redraw-thread lifecycle API; the retrieved PR introduces joinable redraw thread management via take_redraw_thread, while this PR replaces that pattern with stop_redraw_thread and fire-and-forget detached threads.

Suggested reviewers

  • konsalex

Poem

🐰 Threads once joined now fly free,
Fire-and-forget, detached glee,
Channels sing shutdown, no deadlock here,
Screenshare cycles cycling clear,
One version bump, the refactor's dear! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly addresses the main change: refactoring thread lifecycle management across multiple components to eliminate thread joining and use fire-and-forget spawning, which fixes hangs caused by rapid start/stop cycles.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix_threads_hang

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/src/window/screensharing_window.rs (1)

804-840: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Rebuild participant state when reusing the screensharing window.

participants is only used to recompute sharer_name; participants_manager keeps the previous session's cursors, paths, and drawing modes. Reopening the same window for a new share can therefore render stale overlays until separate connect/disconnect events happen, which reconnect flows do not guarantee.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/window/screensharing_window.rs` around lines 804 - 840, The
participants slice is currently only used to set sharer_name, leaving
self.participants_manager from the prior session intact and causing stale
cursors/paths/drawing modes; inside update_window_with_new_sharer you should
reset/rebuild participant state in participants_manager when reusing the window:
call the manager's reset/clear method (or implement one) and then repopulate it
from the provided participants (creating fresh participant entries, cursors,
path lists, and drawing-mode defaults), while keeping the existing resets to
self.state (sharer_name, current_path_id, remote_control_allowed, etc.), and
ensuring any references used by the redraw thread are updated before
spawn_redraw_thread is called.
🧹 Nitpick comments (1)
core/src/input/mouse_windows.rs (1)

164-164: 💤 Low value

Pre-existing race on EVENT_CALLBACK is slightly widened by detached threads.

With the hook thread now detached, if MouseObserver is dropped and immediately recreated, there's a small window where the old thread's EVENT_CALLBACK = None (line 231) could clear the new thread's callback. The existing code comment (lines 29-33) already acknowledges this risk. Since the hang fix is the higher priority and the race window is narrow, this is acceptable for now but worth noting for future hardening with proper synchronization.

Also applies to: 230-232

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/input/mouse_windows.rs` at line 164, Detached hook thread can
outlive a recreated MouseObserver and let the old thread clear the global
EVENT_CALLBACK; to fix, stop detaching the thread spawned in the spawn closure
and instead store its JoinHandle on the MouseObserver struct (keep reference
created by std::thread::spawn) and join or signal and join that handle in
MouseObserver::drop so the old thread cannot run after EVENT_CALLBACK has been
replaced; alternatively, replace the global EVENT_CALLBACK with a synchronized
shared handle (e.g., Arc<Mutex<Option<...>>> or an atomic token) and ensure the
spawned closure checks that token before clearing the callback so the race is
eliminated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/src/lib.rs`:
- Around line 515-519: The close_camera_window helper should hide the native
window before detaching its redraw worker: when you take the CameraWindow from
self.camera_window in close_camera_window, call its hide API (e.g.,
CameraWindow::hide or a method that hides its inner Window/sets visibility)
while you still own cam, then call cam.stop_redraw_thread(); if CameraWindow
lacks a hide method, add one that hides the inner Window/sets visibility so the
native window is not visible for an extra frame before the redraw thread exits.

In `@core/src/window/screensharing_window.rs`:
- Around line 2005-2010: The Stop message can be starved by queued ForceRedraws
on the same FIFO channel; change the shutdown path to use a dedicated,
low-latency signal instead of relying on redraw_tx. Add a new shutdown sender
(e.g. shutdown_tx) that the redraw worker also selects on, have
stop_redraw_thread send the shutdown signal over shutdown_tx (or close it) and
then take redraw_thread; update the worker loop to listen for shutdown_tx
alongside redraw_rx so Stop cannot be blocked by ForceRedraw messages. Ensure
you update the struct to hold shutdown_tx/shutdown_rx and modify the redraw loop
and stop_redraw_thread accordingly (references: stop_redraw_thread, redraw_tx,
RedrawCommand::Stop, redraw_thread, ForceRedraw).

In `@core/tests/src/hang_repro.rs`:
- Line 15: The loop's drain branch never executes because MUTE_UNMUTE_CYCLES is
5 while the condition checks if i % 10 == 0, so adjust either the constant or
the condition so the drain runs: update MUTE_UNMUTE_CYCLES to 10 or change the
check in the hot loop (the if i % 10 == 0 that is intended to drain video_rx) to
use a value derived from MUTE_UNMUTE_CYCLES (e.g., if i % (MUTE_UNMUTE_CYCLES *
2) == 0) so video_rx is actually drained during the loop in hang_repro.rs.
- Around line 102-116: The variable use_av1 is only defined for macOS and
Windows but later used unconditionally by max_bitrate and video_codec; make
use_av1 defined for all targets (e.g., let use_av1 = cfg!(target_os =
"windows"); or evaluate cfg!(target_os = "macos")/cfg!(target_os = "windows") as
needed) so that max_bitrate (H264_BITRATE_DEFAULT) and video_codec
(VideoCodec::AV1 / VideoCodec::H264) compile on non-macOS/non-Windows platforms;
update the declaration of use_av1 near where max_bitrate and video_codec are
computed so the identifier is always in scope.

---

Outside diff comments:
In `@core/src/window/screensharing_window.rs`:
- Around line 804-840: The participants slice is currently only used to set
sharer_name, leaving self.participants_manager from the prior session intact and
causing stale cursors/paths/drawing modes; inside update_window_with_new_sharer
you should reset/rebuild participant state in participants_manager when reusing
the window: call the manager's reset/clear method (or implement one) and then
repopulate it from the provided participants (creating fresh participant
entries, cursors, path lists, and drawing-mode defaults), while keeping the
existing resets to self.state (sharer_name, current_path_id,
remote_control_allowed, etc.), and ensuring any references used by the redraw
thread are updated before spawn_redraw_thread is called.

---

Nitpick comments:
In `@core/src/input/mouse_windows.rs`:
- Line 164: Detached hook thread can outlive a recreated MouseObserver and let
the old thread clear the global EVENT_CALLBACK; to fix, stop detaching the
thread spawned in the spawn closure and instead store its JoinHandle on the
MouseObserver struct (keep reference created by std::thread::spawn) and join or
signal and join that handle in MouseObserver::drop so the old thread cannot run
after EVENT_CALLBACK has been replaced; alternatively, replace the global
EVENT_CALLBACK with a synchronized shared handle (e.g., Arc<Mutex<Option<...>>>
or an atomic token) and ensure the spawned closure checks that token before
clearing the callback so the race is eliminated.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 78c78195-fbd0-452c-8cdc-9c4ea3212ef4

📥 Commits

Reviewing files that changed from the base of the PR and between 414b2a5 and c8de755.

📒 Files selected for processing (15)
  • core/socket_lib/src/lib.rs
  • core/src/audio/capturer.rs
  • core/src/camera/stream_nokhwa.rs
  • core/src/capture/capturer.rs
  • core/src/capture/stream.rs
  • core/src/graphics/graphics_context.rs
  • core/src/input/mouse_macos.rs
  • core/src/input/mouse_windows.rs
  • core/src/lib.rs
  • core/src/window/camera_window.rs
  • core/src/window/drawing_window.rs
  • core/src/window/screensharing_window.rs
  • core/tests/src/hang_repro.rs
  • core/tests/src/main.rs
  • tauri/src-tauri/tauri.conf.json

Comment thread core/src/lib.rs
Comment thread core/src/window/screensharing_window.rs
Comment thread core/tests/src/hang_repro.rs
Comment thread core/tests/src/hang_repro.rs
@iparaskev iparaskev merged commit 2283d82 into main Jun 2, 2026
22 of 24 checks passed
@iparaskev iparaskev deleted the fix_threads_hang branch June 2, 2026 23:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant