
SpawnDev.ILGPU v4.6.0

@LostBeard released this 23 Mar 01:49 · 207 commits to master since this release

6 backends. 1,511 tests. Zero failures.

CUDA, OpenCL, CPU, WebGPU, WebGL, and Wasm — all passing. GPU compute in the browser is no longer experimental.

Highlights

Full Multi-Worker Wasm Barrier Dispatch

The Wasm backend now supports full navigator.hardwareConcurrency workers with group barriers and shared memory. A pure spin barrier using i32.atomic.load loops replaces the previous wait32/notify approach, working around a V8 atomics visibility gap that caused data races with 3+ workers.
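The shape of such a barrier can be sketched with C11 atomics, which map closely onto the Wasm threads atomics mentioned above (`atomic_load` standing in for `i32.atomic.load`). This is a minimal sense-reversing spin barrier illustrating the technique, not ILGPU's actual generated code; all names here are illustrative.

```c
#include <stdatomic.h>

/* Minimal spin-barrier sketch (C11 atomics). Waiters poll the generation
   counter with plain atomic loads instead of blocking on wait/notify,
   mirroring the pure-spin approach described above. */
typedef struct {
    atomic_int arrived;     /* workers that have reached the barrier */
    atomic_int generation;  /* bumped when the barrier releases */
    int count;              /* total participating workers */
} spin_barrier_t;

static void spin_barrier_init(spin_barrier_t *b, int count) {
    atomic_init(&b->arrived, 0);
    atomic_init(&b->generation, 0);
    b->count = count;
}

static void spin_barrier_wait(spin_barrier_t *b) {
    int gen = atomic_load(&b->generation);
    if (atomic_fetch_add(&b->arrived, 1) + 1 == b->count) {
        /* Last arrival: reset the counter and release everyone. */
        atomic_store(&b->arrived, 0);
        atomic_fetch_add(&b->generation, 1);
    } else {
        /* Spin on atomic loads until the generation changes
           (the i32.atomic.load loop in the Wasm backend). */
        while (atomic_load(&b->generation) == gen)
            ;
    }
}
```

Because releases are signaled only through an atomically loaded generation counter, no waiter ever depends on a notify being observed, which is how this pattern sidesteps wait/notify visibility issues.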

RadixSort Verified at Scale

RadixSort passes across all data types and sizes up to 4M elements on every backend — including Wasm in the browser. Key fixes:

  • Histogram counter buffer sizing — fixed undersized counters that caused real out-of-bounds writes during grid-stride iteration
  • Grid-stride tail byte padding — extended linear-memory slack allocation to prevent OOB traps on packed buffers
  • Per-worker scratch isolation — eliminated intermittent sort corruption in non-barrier kernels
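The histogram sizing issue can be illustrated with a small C sketch of a grid-stride byte histogram. The counter buffer must hold one slot per (group, bucket) pair; sizing it any smaller produces exactly the kind of out-of-bounds writes fixed here. The launch parameters and names below are illustrative, not ILGPU's actual configuration.

```c
#include <stdint.h>
#include <stddef.h>

#define RADIX 256  /* one bucket per byte value */

/* Grid-stride byte histogram sketch. `counters` must be sized
   numGroups * RADIX so each group owns a private bucket range. */
static void histogram_grid_stride(const uint8_t *keys, size_t n,
                                  uint32_t *counters, /* numGroups * RADIX */
                                  size_t globalIndex, size_t gridSize,
                                  size_t groupIndex) {
    uint32_t *mine = counters + groupIndex * RADIX;
    /* Grid-stride loop: each thread walks the input in steps of the
       total grid size, so a fixed-size grid covers any n. An undersized
       counter buffer turns every later iteration into an OOB write. */
    for (size_t i = globalIndex; i < n; i += gridSize)
        mine[keys[i]]++;
}
```

Run single-threaded (globalIndex 0, gridSize 1) this degenerates to a plain histogram, which makes the indexing easy to verify in isolation.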

20+ Wasm Codegen Fixes

Deep correctness pass across the Wasm code generator:

  • Fiber yield-per-phase with dynamic block splitting
  • Atomic loads/stores for all shared memory access in barrier kernels (including float via i32/i64 reinterpret)
  • Struct load copy semantics to prevent aliasing
  • Unsigned comparison in MinUInt32/MinUInt64 reductions
  • Correct atomic RMW opcode table for interleaved sub-word variants
  • Local alloca addressing, shared memory deduplication, and IR address space aliasing guards
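The unsigned-comparison fix is easy to see in a standalone C sketch: reinterpreting the same 32-bit pattern as signed flips the ordering for any value at or above 0x80000000, so a signed compare in a MinUInt32 reduction silently returns the wrong element. This is an illustration of the bug class, not the generator's actual code.

```c
#include <stdint.h>

/* Correct: unsigned comparison preserves uint32 ordering. */
static uint32_t min_u32(uint32_t a, uint32_t b) {
    return a < b ? a : b;
}

/* Buggy variant: a signed compare treats 0xFFFFFFFF as -1, so it
   "wins" against any small positive value. */
static uint32_t min_u32_signed_bug(uint32_t a, uint32_t b) {
    return (int32_t)a < (int32_t)b ? a : b;
}
```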

WebGPU Backend Fixes

  • WGSL loop break + bool PHI: correct merge value generation when breaking from loops with boolean phi nodes
  • WGSL continuation after if-else with break: prevent unreachable code generation

Test Results

Backend   Pass    Fail   Skip
CUDA      all     0
OpenCL    all     0
CPU       all     0
WebGPU    229     0      12
WebGL     139     0      115
Wasm      249     0      3
Total     1,511   0      162

WebGL skips are architectural (GLSL ES 3.0 lacks shared memory, barriers, and atomics); Wasm skips cover subgroup-dependent features not available in browser WebAssembly.

What This Means

This release proves that GPU-class parallel algorithms — radix sort, scan, reduce, atomics, shared memory, group barriers — run correctly in the browser across WebGPU, WebGL, and WebAssembly, alongside native CUDA, OpenCL, and CPU backends. Write your kernel once, run it everywhere.