`FfiRoom::connect` blocks ConnectCallback on audio-filter `on_load()` with no timeout: network stall turns into a multi-minute connect hang #1163

**Description**

In `FfiRoom::connect`, after `Room::connect()` succeeds, registered audio-filter plugins are initialized via `filter.on_load(&req.url, &req.token)` in a `spawn_blocking` task, and the `ConnectCallback` is not sent to the FFI client until that completes:

https://github.com/livekit/rust-sdks/blob/f5e85edd394a08c6282ed0dbf8ef881272869c23/livekit-ffi/src/server/room.rs#L151-L172

`on_load` for the noise-cancellation plugin performs a blocking HTTPS request to the LiveKit Cloud endpoint. There is no timeout around the plugin call. If that request stalls (DNS resolver timeout, SYN blackhole, etc.), the `ConnectCallback` is delayed for however long the OS takes to give up — in practice ~130+ seconds (default Linux TCP connect give-up).
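The ~130s figure follows from Linux's SYN retransmission backoff. The sketch below shows the arithmetic. It assumes an initial retransmission timeout of 1s that doubles on each retry, and the default `net.ipv4.tcp_syn_retries = 6`. Kernels and sysctl settings can differ, and `syn_give_up_secs` is just an illustrative helper:

```rust
/// Seconds Linux waits before failing a TCP connect to a blackholed peer,
/// assuming a 1s initial SYN retransmission timeout that doubles per retry.
fn syn_give_up_secs(syn_retries: u32) -> u64 {
    // Wait after the initial SYN, then after each of `syn_retries` retransmissions:
    // 1 + 2 + 4 + ... + 2^syn_retries = 2^(syn_retries + 1) - 1
    (0..=syn_retries).map(|i| 1u64 << i).sum()
}
```

With the default of 6 retries this gives 127s, which is consistent with the ~137s stall reported below.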

**Observed behavior**

In a production voice-agent deployment (Python agents framework over this FFI) we observed two calls where:

- The room WebSocket connected normally — the agent participant was active server-side within ~0.4s of job dispatch.
- The FFI did not deliver `ConnectCallback` for ~137s. The agents framework logged `The room connection was not established within 10 seconds after calling job_entry`, and `ctx.connect()` appeared hung.
- Because the agent session couldn't start, the agent never subscribed/published; the SIP caller heard ~128s of ringing and the carrier CANCELled the call.
- When `on_load` finally failed, the FFI logged `audio filter cannot be enabled: LiveKit Cloud is required` (older message text; it now reads "ensure you are connecting to LiveKit Cloud...") and only then reported the room as connected, ~10s after the room had already been torn down.

Note the irony: filter init failure is deliberately non-fatal ("Skip returning an error here to keep the rtc session alive"), but a *slow* failure is effectively fatal to the session anyway, and misleadingly presents as a room-connection problem rather than a plugin problem.

**Proposed fix**

- Wrap the `on_load` loop in a bounded timeout (a few seconds, e.g. 5s, or configurable via `ConnectRequest`). On timeout: log a warning, skip enabling the filter, and proceed — matching the existing non-fatal error handling.
- Alternatively/additionally: send `ConnectCallback` first and initialize filters concurrently, marking the filter unavailable if init fails.
- Consider logging the elapsed time on filter-init failure to make this failure mode diagnosable.
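The sketch below covers the first and third suggestions: a bounded wait, with elapsed time logged on failure or timeout. It makes two assumptions. First, it uses a simplified, hypothetical `AudioFilter` trait that only mirrors the `on_load(url, token)` call, and the real plugin interface in `livekit-ffi` differs. Second, it uses plain `std` threads so it runs standalone. In the FFI server, the natural equivalent is probably `tokio::time::timeout` around the `spawn_blocking` join handle.

Either way, a timeout only stops the *wait*. The blocked thread keeps running until the OS gives up on the request.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for an audio-filter plugin (not the real livekit-ffi trait).
trait AudioFilter: Send + 'static {
    fn on_load(&self, url: &str, token: &str) -> Result<(), String>;
}

/// Outcome of a bounded filter initialization.
#[derive(Debug, PartialEq)]
enum FilterInit {
    Loaded,
    Failed(String),
    TimedOut,
}

/// Runs every filter's `on_load` on a worker thread and waits at most `limit`.
/// On timeout the worker is not cancelled; the caller just stops waiting and
/// can proceed to report the connection.
fn init_filters_bounded(
    filters: Vec<Box<dyn AudioFilter>>,
    url: String,
    token: String,
    limit: Duration,
) -> FilterInit {
    let started = Instant::now();
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let result = filters.iter().try_for_each(|f| f.on_load(&url, &token));
        // After a timeout the receiver is gone, so a failed send is expected.
        let _ = tx.send(result);
    });
    // `eprintln!` stands in for `log::warn!` / `log::error!`.
    match rx.recv_timeout(limit) {
        Ok(Ok(())) => FilterInit::Loaded,
        Ok(Err(e)) => {
            eprintln!("audio filter init failed after {:?}: {e}", started.elapsed());
            FilterInit::Failed(e)
        }
        Err(RecvTimeoutError::Timeout) => {
            eprintln!(
                "audio filter init timed out after {:?}; continuing without the filter",
                started.elapsed()
            );
            FilterInit::TimedOut
        }
        Err(RecvTimeoutError::Disconnected) => {
            // The worker dropped the sender without sending a result, i.e. it panicked.
            FilterInit::Failed("filter init thread panicked".to_string())
        }
    }
}

/// Hypothetical test doubles for illustration.
struct InstantFilter;
struct FailingFilter;
struct StalledFilter(Duration);

impl AudioFilter for InstantFilter {
    fn on_load(&self, _url: &str, _token: &str) -> Result<(), String> {
        Ok(())
    }
}

impl AudioFilter for FailingFilter {
    fn on_load(&self, _url: &str, _token: &str) -> Result<(), String> {
        Err("cloud endpoint unreachable".to_string())
    }
}

impl AudioFilter for StalledFilter {
    fn on_load(&self, _url: &str, _token: &str) -> Result<(), String> {
        // Simulates a request stuck on a blackholed endpoint.
        thread::sleep(self.0);
        Ok(())
    }
}
```

For example, a `StalledFilter(Duration::from_millis(500))` with a 50ms limit yields `FilterInit::TimedOut` almost immediately instead of blocking for the full stall. That matches the existing non-fatal policy: log, skip the filter, and keep the session.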

**Environment**

- `livekit-ffi` via `livekit-agents` (Python), agent worker on Linux x86_64
- Plugin: `livekit-plugins-noise-cancellation` (BVC)
- Reproduces whenever the Cloud HTTPS endpoint is blackholed (packets silently dropped rather than refused) during connect. It is reproducible by, e.g., dropping outbound 443 to the edge after the WS is established: the WS connects via a fallback path, and `on_load` then stalls on the primary hostname.


See [this community thread](https://community.livekit.io/t/signal-connection-times-out-on-the-v0-path-at-agent-join-forcing-a-fallback-that-adds-0-5-5s-of-call-setup-latency/1377/15?u=cwilson) for the downstream effect (though not the root cause).


Excerpt from `livekit-ffi/src/server/room.rs` at the linked commit:

```rust
// initialize audio filters
let result = server
    .async_runtime
    .spawn_blocking(move || {
        for filter in registered_audio_filter_plugins().into_iter() {
            filter.on_load(&req.url, &req.token).map_err(|e| e.to_string())?;
        }
        Ok::<(), String>(())
    })
    .await
    .map_err(|e| e.to_string());
match result {
    Err(e) | Ok(Err(e)) => {
        log::warn!("error while initializing audio filter: {}", e);
        log::error!(
            "audio filter cannot be enabled: ensure you are connecting to LiveKit Cloud and that the filter is properly configured"
        );
        // Skip returning an error here to keep the rtc session alive
        // But in this case, the filter isn't enabled in the session.
    }
    Ok(Ok(_)) => (),
};
```
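The second proposed option is to report the connection first and initialize filters off the connect path. Applied to the excerpt above, it could be shaped roughly like the sketch below. This is a minimal sketch under two assumptions: the `Event` enum is a hypothetical stand-in for the FFI's protobuf callbacks, and the `init` closure stands in for the `on_load` loop. It uses `std` threads and channels only.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical client-facing events; the real livekit-ffi protocol sends
/// protobuf messages such as `ConnectCallback` instead.
#[derive(Debug, PartialEq)]
enum Event {
    Connected,
    FilterReady { enabled: bool },
}

/// Reports `Connected` immediately, then runs `init` on a background thread
/// and reports whether the filter ended up enabled. A slow or failed init no
/// longer delays `Connected`.
fn connect_then_init_filters<F>(init: F) -> Receiver<Event>
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    // The room is already connected at this point, so tell the client right away.
    tx.send(Event::Connected)
        .expect("receiver is still alive inside this function");
    thread::spawn(move || {
        let enabled = match init() {
            Ok(()) => true,
            Err(e) => {
                // Same non-fatal policy as the excerpt: warn and run unfiltered.
                eprintln!("audio filter cannot be enabled: {e}");
                false
            }
        };
        // The client may have dropped the receiver; that is not an error here.
        let _ = tx.send(Event::FilterReady { enabled });
    });
    rx
}
```

This sketch does not address one open question: whether the rest of the session can tolerate a filter that becomes available only after the connect callback, or never.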
