SpawnDev.ILGPU v4.6.0
6 backends. 1,511 tests. Zero failures.
CUDA, OpenCL, CPU, WebGPU, WebGL, and Wasm — all passing. GPU compute in the browser is no longer experimental.
Highlights
Full Multi-Worker Wasm Barrier Dispatch
The Wasm backend now dispatches across the full navigator.hardwareConcurrency worker count, with group barriers and shared memory. A pure spin barrier built on i32.atomic.load loops replaces the previous wait32/notify approach, working around a V8 atomics visibility gap that caused data races with three or more workers.
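As a rough illustration of the spin-barrier technique (not the generated Wasm itself), here is a minimal sense-reversing barrier in JavaScript over a SharedArrayBuffer, spinning on Atomics.load instead of blocking with Atomics.wait. All names and the control-word layout are illustrative:

```javascript
// Control words in a shared Int32Array (layout is an assumption for this sketch).
const CTRL_COUNT = 0; // arrivals in the current generation
const CTRL_GEN = 1;   // generation counter, bumped by the last arrival

function barrierWait(ctrl, numWorkers) {
  const gen = Atomics.load(ctrl, CTRL_GEN);
  // Atomics.add returns the old value; the last arrival resets and releases.
  if (Atomics.add(ctrl, CTRL_COUNT, 1) + 1 === numWorkers) {
    Atomics.store(ctrl, CTRL_COUNT, 0);
    Atomics.add(ctrl, CTRL_GEN, 1);
  } else {
    // Pure spin on an atomic load (the i32.atomic.load pattern): every
    // iteration re-reads shared memory, sidestepping wait/notify visibility.
    while (Atomics.load(ctrl, CTRL_GEN) === gen) { /* spin */ }
  }
}

const ctrl = new Int32Array(new SharedArrayBuffer(8));
barrierWait(ctrl, 1); // single worker: arrives last, releases immediately
console.log(Atomics.load(ctrl, CTRL_GEN)); // 1
```

With real workers, each would call barrierWait with the same shared buffer; the spin loop only exits once the last arrival bumps the generation.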
RadixSort Verified at Scale
RadixSort passes across all data types and sizes up to 4M elements on every backend — including Wasm in the browser. Key fixes:
- Histogram counter buffer sizing — fixed undersized counters that produced out-of-bounds writes during grid-stride iteration
- Grid-stride tail byte padding — extended linear-memory slack allocation to prevent OOB traps on packed buffers
- Per-worker scratch isolation — eliminated intermittent sort corruption in non-barrier kernels
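For readers unfamiliar with the shape of the algorithm, the following is a scalar JavaScript sketch of one 8-bit radix pass with a correctly sized digit histogram — the counter-buffer pattern the first fix above concerns. It is illustrative only, not ILGPU's kernel:

```javascript
const RADIX = 256; // one counter per possible byte value

// Sort 32-bit values by their least-significant byte (one pass of LSD radix sort).
function radixPassLSB(input) {
  const hist = new Uint32Array(RADIX);     // on the GPU this is per-group,
  for (const v of input) hist[v & 0xff]++; // filled by a grid-stride loop
  // Exclusive prefix sum turns digit counts into output offsets.
  let sum = 0;
  for (let d = 0; d < RADIX; d++) { const c = hist[d]; hist[d] = sum; sum += c; }
  // Scatter each element to its slot, bumping the offset as we go.
  const out = new Uint32Array(input.length);
  for (const v of input) out[hist[v & 0xff]++] = v;
  return out;
}

console.log([...radixPassLSB(new Uint32Array([3, 1, 2, 1]))]); // [1, 1, 2, 3]
```

An undersized hist (fewer than groups × RADIX counters in the parallel version) makes the scatter offsets index past the buffer, which is exactly the out-of-bounds failure mode described above.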
20+ Wasm Codegen Fixes
Deep correctness pass across the Wasm code generator:
- Fiber yield-per-phase with dynamic block splitting
- Atomic loads/stores for all shared memory access in barrier kernels (including float via i32/i64 reinterpret)
- Struct load copy semantics to prevent aliasing
- Unsigned comparison in MinUInt32/MinUInt64 reductions
- Correct atomic RMW opcode table for interleaved sub-word variants
- Local alloca addressing, shared memory deduplication, and IR address space aliasing guards
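The float-via-i32 reinterpret trick mentioned above can be sketched in JavaScript: since Atomics only operate on integer views, float values are stored and loaded as their raw 32-bit patterns. Function names here are illustrative, not ILGPU API:

```javascript
const buf = new SharedArrayBuffer(4);
const asI32 = new Int32Array(buf);                  // atomic ops need an integer view
const scratch = new DataView(new ArrayBuffer(4));   // local bit-reinterpret scratch

function atomicStoreF32(i, value) {
  scratch.setFloat32(0, value, true);
  Atomics.store(asI32, i, scratch.getInt32(0, true)); // store the raw bits atomically
}

function atomicLoadF32(i) {
  scratch.setInt32(0, Atomics.load(asI32, i), true);
  return scratch.getFloat32(0, true);                 // reinterpret bits back to float
}

atomicStoreF32(0, 3.5);
console.log(atomicLoadF32(0)); // 3.5
```

The round trip is exact because no arithmetic touches the value: only its bit pattern moves through the atomic integer path.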
WebGPU Backend Fixes
- WGSL loop break + bool PHI: correct merge value generation when breaking from loops with boolean phi nodes
- WGSL continuation after if-else with break: prevent unreachable code generation
Test Results
| Backend | Pass | Fail | Skip |
|---|---|---|---|
| CUDA | all | 0 | — |
| OpenCL | all | 0 | — |
| CPU | all | 0 | — |
| WebGPU | 229 | 0 | 12 |
| WebGL | 139 | 0 | 115 |
| Wasm | 249 | 0 | 3 |
| Total | 1,511 | 0 | 162 |
WebGL skips are architectural (GLSL ES 3.0 lacks shared memory/barriers/atomics). Wasm skips are subgroup-dependent features not available in browser WebAssembly.
What This Means
This release proves that GPU-class parallel algorithms — radix sort, scan, reduce, atomics, shared memory, group barriers — run correctly in the browser across WebGPU, WebGL, and WebAssembly, alongside native CUDA, OpenCL, and CPU backends. Write your kernel once, run it everywhere.