Add socketioxide (Rust Socket.io) as a comparison target #1

Open
irinanazarova wants to merge 10 commits into main from add-socketioxide

@irinanazarova

Collaborator

socketioxide's author asked on X if we'd benchmark their Rust Socket.io server alongside the existing targets. This adds it and reports a full head-to-head with AnyCable on the same Railway hardware.

What's here

  • socketioxide/ — a Rust bench server (Axum + socketioxide 0.18.4) that speaks the Socket.io wire protocol, so the existing socket.io-client bench-runner drives it unchanged. Dockerfile, Cargo.lock, Railway config included.
  • Manifest rows in tests-manifest.ts for socketioxide latency / jitter / idle / avalanche.
  • README section with the result tables, plus a deep dive in docs/socketioxide-comparison.md.
  • Raw result artifacts under backend/results/.
  • Security: npm audit fix (undici 8.2.0 → 8.5.0).

No new bench-runner endpoints: socketioxide reuses the Socket.io ones via ?serverUrl=.

Results (Railway, head-to-head with AnyCable)

| Dimension | socketioxide | AnyCable |
| --- | --- | --- |
| Latency 10K p50/p99 | 289 / 972 ms | 232 / 731 ms (comparable) |
| Jitter delivery 10K | 41% then 33% | 100% |
| Avalanche 5K → 10K → 20K | 100% → 96% → 0% | 0 s by design |
| Idle held | 600K+, ~37 KB/conn, 5x past Node's ~120K ceiling | ~39 KB/conn |

The takeaway: Rust fixes Socket.io's capacity ceiling (the single-event-loop wall), but not its at-most-once delivery or in-process deploy fragility. Those are protocol and topology properties, so socketioxide collapses under jitter and deploy storms at scale the same way Node Socket.io does.

Notes

  • Three deploy fixes the local build didn't catch: Rust 1.94+ base image (socketioxide MSRV), IPv6 [::] bind (Railway private net is IPv6), pinned PORT=3000.
  • Idle was harness-limited at ~600K (ephemeral-port cap of ~12K connections per shard); pushing to 1M needs a larger / lighter load-gen fleet. socketioxide itself never saturated.
  • CSR is deferred: socketioxide 0.18.4 does not appear to ship Connection State Recovery (open question to the author in the doc).
  • The comparison page (anycable.io) was intentionally left untouched.

Library author asked on X if we'd benchmark their crate alongside
Node Socket.io, uWS, and AnyCable. socketioxide speaks the Socket.io
wire protocol, so the bench-runner's existing socket.io-client driver
works against it unchanged. The work is server-side.

New `socketioxide/` directory carries the Rust Cargo project, the
Dockerfile, and a railway.toml. The server mirrors the shape of
backend/src/socketio/server.ts: /health, /stats, /_broadcast,
/publish-local, plus a Connection State Recovery toggle via
SOCKETIO_CSR=1 (FIXME'd in main.rs because the crate's CSR API has
shifted across versions).

Manifest entries land under each rubric (latency 1K/10K, jitter,
idle, avalanche escalation), targeting the existing
bench-jitter-socketio, bench-idle-socketio, and bench-avalanche-socketio
endpoints via ?serverUrl=. Baselines are empty until first run.

docs/socketioxide-comparison.md tracks status, open questions for
the library author, and the eventual results. README links to it
under 'Additional target on request'.

Build + tests green. The Rust code has not been compiled on this
end; the GitHub issue will tag the library author for review.

Seven advisories rolled into one high-severity npm-audit row in undici
8.0.0-8.4.1 (cert validation bypass in SOCKS5 ProxyAgent, header
injection via Set-Cookie, WS DoS via fragment-count and cumulative-
fragment bypasses, HTTP response queue poisoning, Set-Cookie SameSite
attribute downgrade, cross-user info disclosure via shared cache
whitespace bypass).

undici is driver-side (used by the long-timeout Agent in bench
scripts), not bench-runner-side. Fix re-resolves within the existing
^8.2.0 caret range; no package.json change.

  found 0 vulnerabilities

Maintainer + API check turned up two scaffold bugs:

1. Wrong version pin. Cargo.toml had socketioxide = "0.16"; latest
   stable is 0.18.3 (Apr 2026, same author actively shipping engineio
   hardening fixes on the day this branch landed). Bumped to ^0.18,
   features ['v4', 'tracing'] for the Socket.io v4 wire protocol and
   structured logs.

2. The 'state-recovery' feature I specified does not exist. socketioxide
   0.18.3 feature list is v4 / msgpack / tracing / extensions / state /
   __test_harness. The 'state' feature is for application-shared state
   (with_state), not session resume. CSR is not documented in the
   README, examples, or feature flags.

Dropped:
- socketioxideCsr TARGETS entry
- All -csr manifest entries (latency-csr, jitter-csr)
- The two FIXME blocks in main.rs that pretended to enable CSR
- The recovered counter and /stats-csr endpoint
- The SOCKETIO_CSR env var

What's left is honest at-most-once socketioxide: tested with the same
disruption shape as default Socket.io and uWS. The architectural
prediction is that it lands in the at-most-once band (~85% delivery
under jitter, in-process WS dies with the app on deploy), which would
confirm the page's claim that those properties are about deployment
topology rather than runtime language.

CSR is now an open question to the library author in
docs/socketioxide-comparison.md (does the crate ship CSR? Is it on
the roadmap?). If yes, we add the variant back. If no, the comparison
is what it is.
…yCable

The scaffold now compiles and runs. Fixing it against the released crate
turned up three API mismatches from my first guess:

- Connect handlers must be async (io.ns('/', on_connect) where
  on_connect is an async fn), not sync closures.
- State (the connection counter) goes through .with_state(Arc<AtomicU64>)
  + the State extractor, gated behind the 'state' feature flag, which
  I'd left out. Handlers read it as IoState<ConnCounter>.
- emit takes &data and is .await-ed; room handlers use Data<T>
  extractors, not positional Value.
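
For reference, the shared-counter pattern from the second bullet can be sketched with the standard library alone. This is an illustrative stand-in, not the socketioxide API: the `ConnCounter` wrapper, its methods, and `simulate_connects` are hypothetical names, and the real server reaches the counter through the crate's state extractor rather than by cloning it by hand.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the server's shared state: one atomic
/// counter that every connection handler holds a clone of.
#[derive(Clone, Default)]
pub struct ConnCounter(Arc<AtomicU64>);

impl ConnCounter {
    /// Called when a client connects.
    pub fn on_connect(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    /// Called when a client disconnects.
    pub fn on_disconnect(&self) {
        self.0.fetch_sub(1, Ordering::Relaxed);
    }

    /// Current number of open connections (what a /stats handler would report).
    pub fn current(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Simulates `n` clients connecting from separate threads, each holding
/// its own clone of the counter, and waits for all of them to finish.
pub fn simulate_connects(counter: &ConnCounter, n: u64) {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let c = counter.clone();
            thread::spawn(move || c.on_connect())
        })
        .collect();
    for h in handles {
        h.join().expect("handler thread panicked");
    }
}
```

Cloning the wrapper only bumps the `Arc` reference count, so every handler updates the same `AtomicU64` without a lock.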

Pinned 0.18 resolves to 0.18.4; release build is clean.

First real numbers, local head-to-head with anycable-go 1.6.14 as a
same-window control (its normal shape confirms the environment was
sound). 200 clients, per-message HTTP publish for both:

  Latency (jitter off): socketioxide 100% / p99 18ms,
                        AnyCable 100% / p99 22ms. Comparable.
  Jitter (TCP drop /15s): socketioxide 91.6% delivered (at-most-once,
                        no replay, fast on what it sends),
                        AnyCable 100% (replay, multi-second tail).

socketioxide lands in the at-most-once band with default Socket.io and
uWS: the delivery gap is the protocol (replay vs none), not the runtime
language. Confirms the page's architectural claim across a fourth impl.

Results table + reproducer in docs/socketioxide-comparison.md; raw JSON
force-added at backend/results/socketioxide-local-2026-06-23.json.
Comparison page untouched, as requested. Railway-scale rows (10K, idle
1M, avalanche) still pending a deploy; FILTER=socketioxide,anycable
runs the new rows with AnyCable as the canary.

Three fixes found deploying to Railway:
- Base image rust:1.83 was below socketioxide 0.18.4's MSRV (1.94) and
  too old for a transitive dep needing edition2024. Use rust:1-slim.
- Commit Cargo.lock + build --locked so the image uses the exact deps
  resolved and tested locally (dropped the strip step; binutils absent
  in slim).
- Bind [::] (IPv6 dual-stack), not 0.0.0.0. Railway's private network
  (*.railway.internal) routes over IPv6; 0.0.0.0 is unreachable
  internally. Verified: 20/20 clients connect via the bench-runner,
  100% delivery, 42ms p99. PORT is pinned to 3000 on the service to
  match the manifest target.
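
The bind fix amounts to building the listen address from the IPv6 unspecified address instead of 0.0.0.0. A minimal std-only sketch, assuming hypothetical helper names `listen_addr` and `port_from_env` (the real main.rs may structure this differently):

```rust
use std::net::{Ipv6Addr, SocketAddr};

/// Builds the listen address: the IPv6 unspecified address `[::]`
/// on the given port, rather than the IPv4-only 0.0.0.0.
pub fn listen_addr(port: u16) -> SocketAddr {
    SocketAddr::from((Ipv6Addr::UNSPECIFIED, port))
}

/// Parses a PORT-style value, falling back to 3000 (the port the
/// manifest target expects) when it is missing or not a valid u16.
pub fn port_from_env(value: Option<&str>) -> u16 {
    value.and_then(|v| v.parse().ok()).unwrap_or(3000)
}
```

In a server this would be wired up as something like `listen_addr(port_from_env(std::env::var("PORT").ok().as_deref()))`. Whether a `[::]` listener also accepts IPv4 depends on the platform's IPV6_V6ONLY default; on Linux it is dual-stack unless configured otherwise.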
Phase 1 on real infra. Deployed socketioxide-server to the bench
project, woke anycable-go OSS as the same-window canary, drove both from
the Railway bench-runner over the internal network.

Latency (jitter off): comparable to AnyCable at 1K and 10K, both 100%
delivery (socketioxide 289/972ms p50/p99 at 10K vs AnyCable 232/731ms).
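
For readers comparing the p50/p99 figures: one common way to derive them from raw latency samples is the nearest-rank method, sketched below. Whether the bench-runner uses nearest-rank or an interpolating method is an assumption here, and `percentile` is a hypothetical helper, not code from this repo.

```rust
/// Nearest-rank percentile over latency samples (milliseconds).
/// Returns None for an empty slice or a percentile outside 0..=100.
pub fn percentile(samples: &[u64], p: f64) -> Option<u64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest rank: the smallest 1-based rank covering p percent of samples.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    // Clamp so that p = 0 maps to the smallest sample.
    Some(sorted[rank.max(1) - 1])
}
```

With only a handful of samples, p99 is simply the maximum, which is why tail figures from small runs (such as the 200-client local test) should be read more loosely than the 10K ones.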

Jitter delivery, bracketed across scale:
  200 local: 91.6%   1K Railway: 89.4%   10K Railway: 40.6% then 32.7%

socketioxide is at-most-once: it sits in the band with default Socket.io
(~85%) and uWS (~87%) up to 1K, then collapses under the 10K reconnect
storm. Two independent 10K runs (41%, 33%) confirm it; not a crash (0
connect failures, 10K/10K connect), not Railway noise (AnyCable held
100% in the same windows). The Rust runtime does not rescue the
in-process at-most-once architecture at scale; AnyCable holds 100%
because the WS layer is a separate process that the deploy/storm never
restarts and replay recovers the offline-window gap.
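
The delivery gap can be illustrated with a toy model: messages published while a client is offline are lost under at-most-once, and recovered under replay as long as the history buffer still holds them. This is a simplified sketch for intuition only; the function names and the single-offline-window model are assumptions, not the bench-runner's logic or AnyCable's implementation.

```rust
use std::ops::Range;

/// Messages received out of `total` sequential publishes when the client
/// is offline for the sequence numbers in `offline` and nothing is replayed.
pub fn delivered_at_most_once(total: u32, offline: &Range<u32>) -> u32 {
    (0..total).filter(|seq| !offline.contains(seq)).count() as u32
}

/// Same scenario, but on reconnect the server replays missed messages
/// from a buffer that retains at most `history` of them.
pub fn delivered_with_replay(total: u32, offline: &Range<u32>, history: u32) -> u32 {
    let live = delivered_at_most_once(total, offline);
    let missed = total - live;
    live + missed.min(history)
}

/// Delivery rate as a percentage; an empty run counts as fully delivered.
pub fn delivery_pct(delivered: u32, total: u32) -> f64 {
    if total == 0 {
        return 100.0;
    }
    100.0 * f64::from(delivered) / f64::from(total)
}
```

For example, 100 publishes with the client offline for messages 40..50 gives 90 delivered under at-most-once and 100 under replay with a large enough buffer. The runtime language does not appear anywhere in the model, which is the point of the comparison.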

Comparison page untouched. Idle 1M + avalanche deferred to phase 2
(needs the 50-shard fleet). All phase-1 services torn back down to
offline after the run.

Report: docs/socketioxide-comparison.md. Raw: backend/results/.

The idle test was harness-limited at ~600K: both targets hit the
identical ~600,090 ceiling because the bench-runner shards capped at
~12K clients each, not because either server saturated (both were sized
at 32GB and peaked at ~21-22GB).

At 600K held: socketioxide ~37 KB/conn (1.8% CPU), anycable-go ~39
KB/conn (9% CPU). Comparable per-connection memory, both well under Node
Socket.io's ~52 KB.

The notable finding: socketioxide held 600K+, ~5x past Node Socket.io's
~120K single-event-loop ceiling. tokio's multi-threading clears the wall
that caps Node. So Rust fixes Socket.io's capacity limit (runtime
concurrency) but not its at-most-once delivery limit (protocol). Set the
idle-socketioxide targetServiceId in the manifest for metrics. Avalanche
rows still running.

Avalanche escalation on Railway: 5K recovers 100% in 2.9s, 10K is 96%
in 67s (411 never back), 20K collapses to 0% recovered. Tracks Node
Socket.io's cliff almost exactly (Socket.io 10K ~65s/96%). The in-process
WS layer dies with the app deploy regardless of runtime language; the
reconnect storm overwhelms the restarted single instance at scale.
AnyCable is 0s by construction (separate process). Raw under
backend/results/railway-phase2/.

Traced the 600K idle cap to a hard ~12,002 connections/shard
(ephemeral-port exhaustion to one host:port from one source IP; not
memory, containers report nofile=122880). Grew the fleet 50 -> 85
shards (~1.02M theoretical) and re-ran: 49 shards delivered a clean
12,000 each (588,000, 0 failures), 36 errored/timed out under the
coordinator fan-out. Harness-limited, not server-limited; socketioxide
accepted every connection the surviving shards threw with memory
headroom.
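
The fleet arithmetic above can be checked with a short sketch. The per-shard cap and shard counts come from the run described here; the helper names `fleet_capacity` and `shards_needed` are hypothetical.

```rust
/// Upper bound on connections a load-gen fleet can open to one target
/// when every shard is capped at `per_shard_cap` connections.
pub fn fleet_capacity(shards: u64, per_shard_cap: u64) -> u64 {
    shards * per_shard_cap
}

/// Minimum number of shards needed to reach `target` connections,
/// or None if the per-shard cap is zero.
pub fn shards_needed(target: u64, per_shard_cap: u64) -> Option<u64> {
    if per_shard_cap == 0 {
        return None;
    }
    Some(target.div_ceil(per_shard_cap))
}
```

`fleet_capacity(85, 12_002)` is 1,020,170 (the ~1.02M theoretical figure), `fleet_capacity(50, 12_002)` is 600,100 (the original ~600K ceiling), and `shards_needed(1_000_000, 12_002)` is 84, so 1M is only reachable if nearly every shard in an 85-shard fleet completes.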

Established: socketioxide holds at least ~600K idle socket.io
connections on a 32GB box, ~5x past Node Socket.io's ~120K event-loop
ceiling, RAM/conn comparable to AnyCable. True ceiling unmeasured (needs
more source IPs / a lighter idle client than socket.io-client).

Phase-2 teardown: all 87 services stopped (85 shards + anycable-go +
socketioxide-server), both 32GB targets downsized to 0.5GB/1vCPU,
verified offline. Comparison page untouched throughout.

Promote the socketioxide head-to-head from a one-line pointer to a full
results section in the README: latency, jitter, avalanche, and idle
tables vs AnyCable, with the takeaway that Rust fixes Socket.io's
capacity ceiling but not its at-most-once delivery or in-process deploy
fragility. Deep dive stays in docs/socketioxide-comparison.md.