Releases: LostBeard/SpawnDev.ILGPU

SpawnDev.ILGPU v4.6.0

23 Mar 01:49

6 backends. 1,511 tests. Zero failures.

CUDA, OpenCL, CPU, WebGPU, WebGL, and Wasm — all passing. GPU compute in the browser is no longer experimental.

Highlights

Full Multi-Worker Wasm Barrier Dispatch

The Wasm backend now supports full navigator.hardwareConcurrency workers with group barriers and shared memory. A pure spin barrier using i32.atomic.load loops replaces the previous wait32/notify approach, working around a V8 atomics visibility gap that caused data races with 3+ workers.
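The idea of a barrier that only polls a shared counter, with no wait/notify, can be sketched in a few lines. This is a Python stand-in under stated assumptions, not the Wasm code generator's output: the class and function names are hypothetical, a lock stands in for an atomic read-modify-write, and an ordinary attribute read stands in for the `i32.atomic.load` loop.

```python
import threading
import time


class SpinBarrier:
    """Generation-counting spin barrier for a fixed number of participants."""

    def __init__(self, parties: int) -> None:
        self.parties = parties
        self.arrived = 0
        self.generation = 0
        self._lock = threading.Lock()  # stands in for an atomic read-modify-write

    def wait(self) -> None:
        with self._lock:
            my_generation = self.generation
            self.arrived += 1
            if self.arrived == self.parties:
                # Last arrival: reset the counter and release everyone else.
                self.arrived = 0
                self.generation += 1
                return
        # Everyone else polls the generation counter (the "atomic load" loop).
        while self.generation == my_generation:
            time.sleep(0)  # yield so other Python threads can make progress


def run_phases(workers: int, phases: int) -> list[list[int]]:
    """Each worker writes its slot, then reads all slots after a barrier."""
    barrier = SpinBarrier(workers)
    shared = [0] * workers
    seen: list[list[int]] = [[] for _ in range(workers)]

    def body(worker_id: int) -> None:
        for phase in range(1, phases + 1):
            shared[worker_id] = phase            # write own slot
            barrier.wait()                       # all writes for this phase are done
            seen[worker_id].append(min(shared))  # read every worker's slot
            barrier.wait()                       # nobody overwrites until all have read

    threads = [threading.Thread(target=body, args=(i,)) for i in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return seen
```

If the barrier holds, every worker observes the current phase number in every slot, so `run_phases(4, 5)` yields `[1, 2, 3, 4, 5]` for each worker.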

RadixSort Verified at Scale

RadixSort passes across all data types and sizes up to 4M elements on every backend — including Wasm in the browser. Key fixes:

  • Histogram counter buffer sizing — fixed undersized counters that caused real out-of-bounds writes during grid-stride iteration
  • Grid-stride tail byte padding — extended linear-memory slack allocation to prevent OOB traps on packed buffers
  • Per-worker scratch isolation — eliminated intermittent sort corruption in non-barrier kernels

20+ Wasm Codegen Fixes

Deep correctness pass across the Wasm code generator:

  • Fiber yield-per-phase with dynamic block splitting
  • Atomic loads/stores for all shared memory access in barrier kernels (including float via i32/i64 reinterpret)
  • Struct load copy semantics to prevent aliasing
  • Unsigned comparison in MinUInt32/MinUInt64 reductions
  • Correct atomic RMW opcode table for interleaved sub-word variants
  • Local alloca addressing, shared memory deduplication, and IR address space aliasing guards
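The unsigned-comparison item is easy to reproduce outside Wasm. WebAssembly integers carry no signedness of their own; the comparison instruction chooses it (for example `i32.lt_s` versus `i32.lt_u`). The sketch below, with hypothetical function names, shows how a signed comparison picks the wrong minimum once a value has its top bit set.

```python
def as_signed32(bits: int) -> int:
    """Reinterpret a 32-bit pattern as a signed two's-complement value."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


def min_u32_signed_compare(a: int, b: int) -> int:
    """Buggy variant: compares the bit patterns as signed integers."""
    return a if as_signed32(a) < as_signed32(b) else b


def min_u32(a: int, b: int) -> int:
    """Correct variant: compares the bit patterns as unsigned integers."""
    return a if a < b else b
```

With `a = 1` and `b = 0xFFFFFFFF`, the signed variant treats `b` as -1 and returns `0xFFFFFFFF`, while the unsigned variant returns `1`.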

WebGPU Backend Fixes

  • WGSL loop break + bool PHI: correct merge value generation when breaking from loops with boolean phi nodes
  • WGSL continuation after if-else with break: prevent unreachable code generation

Test Results

Backend   Pass    Fail   Skip
CUDA      all     0
OpenCL    all     0
CPU       all     0
WebGPU    229     0      12
WebGL     139     0      115
Wasm      249     0      3
Total     1,511   0      162

WebGL skips are architectural (GLSL ES 3.0 lacks shared memory/barriers/atomics). Wasm skips are subgroup-dependent features not available in browser WebAssembly.

What This Means

This release proves that GPU-class parallel algorithms — radix sort, scan, reduce, atomics, shared memory, group barriers — run correctly in the browser across WebGPU, WebGL, and WebAssembly, alongside native CUDA, OpenCL, and CPU backends. Write your kernel once, run it everywhere.

SpawnDev.ILGPU v4.0.0

15 Mar 22:23

Run ILGPU C# kernels on WebGPU, WebGL, Wasm, CUDA, OpenCL, and CPU — from a single codebase.

This is a major release with deep improvements to the WebGPU and Wasm backends, bringing ILGPU's algorithm library (RadixSort, Scan, Reduce) to the browser for the first time.

Highlights

WebGPU RadixSort — Full Algorithm Support

All RadixSort variants now pass on WebGPU, including large-scale sorts (4M+ elements), pairs, descending, and multiple data types. Fixed shared memory sizing, scan barrier synchronization, range checks for auto-grouped kernels, and 256-byte alignment padding for minStorageBufferOffsetAlignment.
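The alignment padding mentioned above comes down to rounding a byte offset up to a multiple of the device limit. A minimal sketch, assuming the WebGPU default of 256 bytes for minStorageBufferOffsetAlignment; the function names are hypothetical and not part of the library:

```python
def align_up(offset: int, alignment: int = 256) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (offset + alignment - 1) & ~(alignment - 1)


def padding_before(offset: int, alignment: int = 256) -> int:
    """Bytes of padding needed so a sub-buffer starting at offset is aligned."""
    return align_up(offset, alignment) - offset
```

For example, a sub-buffer that would start at byte 1000 needs 24 bytes of padding so that it starts at byte 1024.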

Wasm Backend — Barrier Kernel Infrastructure

The Wasm backend received 7 codegen and dispatch fixes enabling correct barrier-synchronized kernels (Scan, Reduce, and single-group RadixSort):

  • Struct-with-view serialization — Fixed CLR-to-IR layout mismatch for kernel structs containing ArrayViews (e.g., InitializerImplementation<T>). Manual IR-layout-aware serialization replaces Unsafe.Write.
  • View field mapping — Fixed GetField handler returning 0 for ArrayView1D's Extent (Length) field, which caused all view.Length checks to fail silently.
  • Local alloca addressing — Fixed local memory allocations defaulting to address 0, which caused the ExclusiveScan helper to corrupt the data buffer between sort passes.
  • Per-thread scratch memory — Each parallel Web Worker now gets its own scratch region, preventing cross-worker data races during struct construction.
  • Post-helper barriers — Added synchronization barriers after each ExclusiveScan helper call to prevent fast workers from starting the next scan while slow workers are still completing the previous one.
  • SpecializedValue unwrapping — Fixed dispatch to correctly extract scalar values from SpecializedValue<T> wrapper structs.
  • GetViewLength tracing — Added TraceToParameter() to resolve view sources through GetField/NewView chains.

WebGPU Backend Refactor

Major internal restructuring for maintainability and performance:

  • Extracted SharedMemoryResolver and UniformityAnalyzer into standalone subsystems
  • Per-function emulation library trimming via BFS dependency graph
  • Dead variable elimination post-pass for cleaner generated WGSL
  • i64 constant hoisting to module-scope const declarations
  • Pre-compiled regex patterns replacing runtime Regex.IsMatch calls
  • WGSL pre-validation (ValidateWGSL()) catches shader errors before GPU submission
  • KernelSpecialization for all algorithm kernel loaders (RadixSort, Histogram, Scan, etc.)
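Trimming an emulation library with a BFS over a dependency graph can be pictured as a reachability query: start from the helpers a function calls directly and keep everything reachable from them. The sketch below is an illustration only; the helper names and the dictionary representation of the graph are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def reachable_helpers(roots: Iterable[str], deps: Dict[str, List[str]]) -> Set[str]:
    """Return every helper reachable from roots via breadth-first search."""
    seen: Set[str] = set()
    queue = deque(roots)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        queue.extend(deps.get(name, ()))  # enqueue direct dependencies
    return seen


# Hypothetical emulation helpers and their direct dependencies.
EMULATION_DEPS = {
    "f64_add": ["f64_normalize"],
    "f64_mul": ["f64_normalize", "u64_mul"],
    "f64_div": ["f64_mul", "f64_normalize"],
    "f64_normalize": [],
    "u64_mul": [],
}
```

A function that only calls `f64_add` then needs `f64_add` and `f64_normalize` emitted, and the other three helpers can be left out of its shader.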

Device Loss Detection

  • WebGPU: Monitors the device.lost promise and exposes an IsDeviceLost property and a DeviceLost event.
  • WebGL: Monitors the webglcontextlost event via glWorker.js and exposes an IsContextLost property and a ContextLost event.
  • Intentional disposal (Dispose()) is filtered out — only unexpected losses fire the events.
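Filtering out intentional disposal is needed because the runtime reports a loss in both cases; in WebGPU, for instance, the lost promise also resolves (with reason "destroyed") when the application destroys the device itself. The pattern can be sketched as a flag set before disposal. Everything below is a hypothetical Python model, not the library's C# implementation:

```python
from typing import Callable, List


class DeviceLossMonitor:
    """Raises loss notifications only for losses the application did not cause."""

    def __init__(self) -> None:
        self.is_device_lost = False
        self._disposing = False
        self._handlers: List[Callable[[str], None]] = []

    def on_device_lost(self, handler: Callable[[str], None]) -> None:
        self._handlers.append(handler)

    def notify_lost(self, reason: str) -> None:
        """Called when the underlying runtime reports that the device is gone."""
        if self._disposing:
            return  # intentional disposal: stay silent
        self.is_device_lost = True
        for handler in self._handlers:
            handler(reason)

    def dispose(self) -> None:
        self._disposing = True
        # Destroying the device makes the runtime report a loss as well.
        self.notify_lost("destroyed")
```

Subscribers therefore see a callback for an unexpected loss, but nothing when `dispose()` triggered it.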

Test Infrastructure

  • PlaywrightMultiTest: Unified NUnit + Playwright runner executes all tests (desktop + browser) in a single dotnet test invocation
  • 1,316 tests passing across all 6 backends (WebGPU, WebGL, Wasm, CUDA, OpenCL, CPU), 0 failures

Browser Backend Capabilities

                   WebGPU                                 WebGL        Wasm
Shared Memory
Group.Barrier()
Atomics
ILGPU Algorithms   ✅ RadixSort, Scan, Reduce, Histogram                ✅ Scan, Reduce (single-group)
64-bit (f64/i64)   ✅ Emulated                             ✅ Emulated   ✅ Native

Known Limitations

  • Wasm multi-group barrier dispatch: Barrier kernels are fully correct for single-group workloads (up to 64 elements for groupSize=64). Multi-group workloads have a cross-group SharedArrayBuffer memory visibility limitation in current browsers. A cooperative scheduling fix is planned for a future release. Desktop backends and WebGPU have no such limitation.

Breaking Changes

None. Existing ILGPU kernels and API usage are fully compatible.

Installation

dotnet add package SpawnDev.ILGPU --version 4.0.0

Links

  • Live Demo — Fractal Explorer, 3D Raymarching, GPU Boids, Benchmarks, Unit Tests
  • Documentation — Getting Started, Backends, Kernels, Memory & Buffers, Canvas Rendering
  • GitHub

SpawnDev.ILGPU v3.5.0

06 Mar 16:14

SpawnDev.ILGPU 3.5.0

Half (f16) Support

  • WebGPU f16 kernels: Float16 maps to native f16 in WGSL. Buffer alignment, constant emission, and Half ↔ float conversion intrinsics are all wired up. Capability-gated on device feature support.
  • XMath.Min/Max/Clamp for Half — Added to XMath via float promotion.
  • Group Scan/Reduce for Half: ExclusiveScan, InclusiveScan, AllReduce, and GroupReduce now support Half on WebGPU and CUDA.
  • CUDA PTX Half warp shuffles: WarpShuffle, WarpShuffleDown, WarpShuffleUp, WarpShuffleXor (and SubWarp variants) for Half via b32 widening. Unlocks Half scan/reduce on CUDA.
  • Lock-free AllReduce — Rewrote AllReduce in both IL and PTX backends to use per-warp shared-memory slots instead of atomic operations. Removes the Half atomics dependency entirely and is correct for all types.
  • Half.One constant fix — Was 0x0001 (denormal ≈5.96e-8); corrected to 0x3C00 (IEEE-754 1.0).
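The two bit patterns in the Half.One fix can be checked with any IEEE-754 binary16 decoder. A small sketch using Python's `struct` module (format code `e` is binary16); the function name is hypothetical:

```python
import struct


def half_bits_to_float(bits: int) -> float:
    """Decode a 16-bit IEEE-754 binary16 pattern into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]
```

`half_bits_to_float(0x3C00)` is exactly 1.0, while `half_bits_to_float(0x0001)` is the smallest positive subnormal, 2^-24 (about 5.96e-8), which matches the value quoted in the fix.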

WebGPU RadixSort with double / long Keys

  • RadixSortPairs<double, …> and RadixSortPairs<long, …> now work on WebGPU. Multiple root causes fixed end-to-end:
    • FloatAsInt/IntAsFloat casts for emulated f64 now correctly reconstruct the IEEE-754 64-bit pattern.
    • Structs containing emulated 64-bit fields are flattened to array<u32> in WGSL ("packed structs") to match CPU memory layout.
    • True element count is passed to the GPU via a dedicated _scalar_params slot, replacing the incorrect arrayLength() calculation for packed views.
    • Sub-view element offset is now computed in u32 units (padding / 4) instead of logical CPU elements, fixing sort correctness for array sizes where the inner temp allocation doesn't start at a 256-byte boundary.
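Reconstructing the 64-bit pattern from 32-bit halves is the core of the FloatAsInt/IntAsFloat fix. The sketch below splits a double into two u32 words and reassembles it. The low-word-first ordering and the function names are assumptions for illustration; the release notes do not state the layout the backend uses.

```python
import struct
from typing import Tuple


def f64_to_u32_pair(value: float) -> Tuple[int, int]:
    """Split a double's IEEE-754 bit pattern into (low, high) 32-bit words."""
    low, high = struct.unpack("<II", struct.pack("<d", value))
    return low, high


def u32_pair_to_f64(low: int, high: int) -> float:
    """Rebuild the double from its (low, high) 32-bit words."""
    return struct.unpack("<d", struct.pack("<II", low, high))[0]
```

For 1.0, whose bit pattern is 0x3FF0000000000000, the pair is `(0x00000000, 0x3FF00000)`, and feeding the pair back in returns 1.0.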

Canvas Rendering (ICanvasRenderer)

  • ICanvasRenderer API — New interface for presenting ILGPU pixel buffers (MemoryBuffer2D<uint/int>, packed RGBA) directly to an HTML <canvas> element. Obtained via CanvasRendererFactory.Create(accelerator).
  • WebGPU — Zero-copy path: a cached WGSL fullscreen-triangle pipeline reads the pixel buffer directly from a read-only-storage binding. No CPU readback. Blit to the visible canvas via drawImage. Pipeline and bind-group are built once; uniforms only re-uploaded on resolution change.
  • WebGL — Delegates to an offscreen FBO blit in the GL Web Worker. Result is transferred as ImageBitmap back to the main thread, preventing Blazor's render cycle from clearing the canvas between frames.
  • CPU / Wasm — Fallback via putImageData. Browser-backed buffers use CopyToHostUint8ArrayAsync for a JS-side copy; pure CPU buffers fall back to synchronous CopyToCPU.

WebGPU Warp Reduce without Subgroups

  • GenerateWarpReduce now emits a full shared-memory butterfly reduction when the subgroups feature is unavailable, replacing the previous no-op passthrough. Correct results on hardware/drivers that don't expose subgroup extensions.
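A butterfly reduction combines each lane with the lane whose index differs in one bit, halving the distance each step, so that after log2(width) steps every lane holds the full result. The sketch below models shared memory as a list and the per-step barrier as building a fresh list. It is a generic illustration with a hypothetical function name, not the WGSL that GenerateWarpReduce emits.

```python
from typing import Callable, List


def butterfly_reduce(values: List[int], op: Callable[[int, int], int]) -> List[int]:
    """Reduce a power-of-two number of lanes; every lane ends with the result."""
    width = len(values)
    if width == 0 or width & (width - 1):
        raise ValueError("lane count must be a positive power of two")
    shared = list(values)
    offset = width // 2
    while offset >= 1:
        # All lanes read their partner's old value before anyone writes,
        # which is what a barrier between read and write guarantees on a GPU.
        shared = [op(shared[lane], shared[lane ^ offset]) for lane in range(width)]
        offset //= 2
    return shared
```

With eight lanes holding 1 through 8 and addition as the operator, all eight lanes end with 36.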

Algorithm Type Coverage

Added scan and reduce test/support variants for double, long, and uint:

Operation       New Types
ExclusiveScan   double, uint
InclusiveScan   long, double, uint
AllReduce       double, long, uint
GroupReduce     float, long, double, uint, Half

SpawnDev.ILGPU v3.3.0

22 Feb 06:41

SpawnDev.ILGPU v3.3.0 Release Notes

Desktop & Browser

  • WPF Demo Application — new desktop demo running the same shared kernels (Fractal Explorer, 3D Raymarching, GPU Boids) on CUDA, OpenCL, and CPU with live backend switching
  • Shared Kernel Library — extracted SpawnDev.ILGPU.Demo.Shared so browser and desktop demos share identical kernel code
  • Console Test Runner — added SpawnDev.ILGPU.ConsoleDemo for running the full unit test suite on desktop backends with process isolation for crash resilience
  • OpenCL 3.0 Compatibility — relaxed the GenericAddressSpace requirement, enabling NVIDIA GPUs with OpenCL 3.0 drivers that were previously blocked
  • Multi-platform support — updated SupportedPlatform to include Windows, Linux, and macOS

WebGL2 Backend — GPU-Resident Buffers

The WebGL2 backend has been refactored to eliminate unnecessary CPU↔GPU data transfers:

  • GPU-resident buffers — buffers persist as textures in the GL worker; kernel dispatch sends buffer references, not data
  • On-demand readback: CopyToHostAsync() is the only GPU→CPU transfer path
  • New worker protocol: allocBuffer, uploadBuffer, readbackBuffer, freeBuffer messages manage buffer lifecycle
  • Proper buffer disposal — buffers are freed in the worker when disposed on the C# side

Wasm Backend Improvements

  • Expanded API coverage including shared memory, barriers, dynamic shared memory, atomics, and broadcasting
  • Single-worker fallback mode when SharedArrayBuffer is unavailable

Transpiler Fixes

  • Break-PHI bug — fixed assignments before break in loops being dropped in WGSL and GLSL transpilers
  • CopySign — corrected argument swap in the CopySign intrinsic
  • 64-bit reduce — fixed signed/unsigned mismatch in MinUInt64 and emu_f64 buffer I/O for AddDouble/MaxDouble
  • WebGL raymarching — fixed GLSL rendering issues
  • BVH ray traversal — corrected WebGPU and WebGL backend issues for complex scene traversal
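The CopySign fix concerns argument order: the first argument supplies the magnitude and the second supplies the sign. Python's `math.copysign` follows the same convention, so the effect of a swap is easy to see. The wrapper names below are hypothetical.

```python
import math


def copy_sign(magnitude: float, sign: float) -> float:
    """Correct order: magnitude of the first argument, sign of the second."""
    return math.copysign(magnitude, sign)


def copy_sign_swapped(magnitude: float, sign: float) -> float:
    """Buggy order: the two arguments are passed the wrong way round."""
    return math.copysign(sign, magnitude)
```

`copy_sign(3.0, -1.0)` gives -3.0, whereas the swapped version gives 1.0: no error is raised, the result is just wrong.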

Upstream ILGPU Fixes

Six bugs from the original ILGPU repo have been fixed in our fork:

Issue   Description                                                                                          Severity
#1361   MathF.CopySign argument order swapped — silent wrong results on all GPU backends                     High
#1309   uint to float cast routed through double — crashes on devices without fp64                           Medium
#1479   Infinite compilation with large local arrays (new int[1_000_000]) — 10+ min, 10+ GB RAM              High
#1538   Internal Compiler Error with nested struct properties — wrong field slicing after type unification   Medium
#1539   OpenCL produces wrong results for complex kernels — stale phi variables persisted across blocks      High
#1540   H100/H200 not working — added SM_90, SM_100, SM_101, SM_120 architecture support                     High

See upstream-issues.md for detailed root cause analysis and fix descriptions.

Documentation

  • Corrected synchronization semantics: Synchronize() = flush (non-blocking), SynchronizeAsync() = flush + wait, CopyToHostAsync() = only GPU→CPU path
  • Updated test count to 640 tests across 8 suites
  • Added WebGL GPU-resident buffer architecture documentation
  • Reduced default logging verbosity across all backends

Demo Improvements

  • Game of Life — fixed mouse interaction and added NavMenu icon
  • Fractal Explorer — moved to shared kernel library, improved WebGL2 rendering pipeline
  • Reduced console log noise for cleaner browser dev tools experience

Full Changelog: v3.2.0...v3.3.0

SpawnDev.ILGPU v3.2.0

21 Feb 14:14

Cross-platform GPU compute from a single codebase — browser and desktop.

What's New

🖥️ Desktop Support Verified

  • SpawnDev.ILGPU now officially supports desktop/server environments (Console, WPF, ASP.NET) alongside Blazor WebAssembly
  • Same NuGet package provides browser backends (WebGPU, WebGL, Wasm) and native backends (Cuda, OpenCL, CPU)
  • SynchronizeAsync() and CopyToHostAsync() work everywhere — async in the browser, graceful sync fallback on desktop
  • New SpawnDev.ILGPU.ConsoleDemo project included as a working reference

🎮 New Demos

  • Game of Life — GPU-accelerated cellular automaton
  • Boids 3D — Flocking simulation on all backends
  • Compute 3D — 3D compute shader demo

🐛 Bug Fixes

  • Fixed 3 transpiler bugs found during Game of Life development
  • Fixed handling of Debug IL in WebGPU and WebGL transpilers
  • Updated Wasm backend intrinsics

📚 Comprehensive Documentation

  • New Docs/ folder with 8 markdown guides: Getting Started, Backends, Kernels, Memory & Buffers, Advanced Patterns (GPU intrinsics, device sharing, rendering), Limitations, and API Reference
  • Covers both Blazor WASM and desktop usage
  • Incorporates foundational ILGPU concepts adapted for the browser

Full Changelog

See README.md and Docs/ for complete documentation.

SpawnDev.ILGPU v3.0.0

16 Feb 17:39

What's New

🚀 Next-Generation GPU Computing in Blazor Wasm — v3.0.0 brings major performance improvements, streamlined architecture, and enhanced compatibility. Run C# ILGPU kernels on WebGPU, WebGL, and native WebAssembly with automatic backend selection.

Key Features

  • Three Powerful Backends — WebGPU (modern GPU compute via WGSL), WebGL (universal GPU access via GLSL ES 3.0), and Wasm (native WebAssembly on Web Workers)
  • CPU Backend — Standard ILGPU CPU accelerator included for debugging and performance comparison
  • Universal GPU Access — WebGPU for cutting-edge browsers, WebGL for virtually every device
  • Intelligent Auto-Selection: CreatePreferredAcceleratorAsync() automatically picks the best available backend (WebGPU → WebGL → Wasm)
  • 64-bit Computing — Full double and long support via optimized emulation on both GPU backends
  • Multi-Worker Dispatch — Wasm backend distributes work across all available CPU cores
  • Zero-Copy Shared Memory — SharedArrayBuffer support for efficient data sharing
  • Atomic Operations — Workgroup synchronization and atomic operations on WebGPU and Wasm backends
  • Production Ready — Comprehensive test suite, stable APIs, and real-world optimization

Built For

  • Blazor WebAssembly — Run compute-intensive C# kernels in the browser
  • 🎮 Game Development — GPU-accelerated physics, graphics, and AI
  • 📊 Data Processing — High-performance number crunching without native compilation
  • 🔬 Scientific Computing — GPGPU capabilities in pure managed code

Resources

Full Changelog: v2.1.0...v3.0.0

SpawnDev.ILGPU v2.1.0

13 Feb 20:41

What's New

🖼️ New WebGL Backend — GPU-accelerated compute on virtually every modern browser and device. C# kernels are transpiled to GLSL ES 3.0 vertex shaders and executed via Transform Feedback, providing broad GPU access even where WebGPU isn't supported.

Highlights

  • Five backends — WebGPU, WebGL, Wasm, Workers, and CPU
  • Two GPU backends — WebGPU for cutting-edge browsers, WebGL for universal coverage
  • Auto-selection: CreatePreferredAcceleratorAsync() picks the best available backend (WebGPU → WebGL → Wasm → Workers → CPU)
  • 64-bit emulation on both GPU backends (double/long support via software emulation)
  • Benchmarks page — New interactive benchmark suite comparing throughput across all backends
  • Workers performance — Cached compiled functions and script bodies to reduce per-dispatch overhead

Links

Full Changelog: v2.0.0...v2.1.0

SpawnDev.ILGPU v2.0.0

09 Feb 23:23

SpawnDev.ILGPU v2.0.0 — First Stable Release

Run ILGPU kernels in the browser — on the GPU, across threads, or on the CPU.

SpawnDev.ILGPU v2.0.0 is the first stable release of this library and the successor to SpawnDev.ILGPU.WebGPU, which supported only the WebGPU backend. Version 2.0.0 brings four full compute backends, automatic device selection, and 360+ tests, all running entirely in the browser via Blazor WebAssembly.

What's New in 2.0.0

Four Compute Backends

Backend   Executes on                                 Performance
WebGPU    GPU via WGSL transpilation                  ⚡⚡⚡ Fastest
Wasm      Web Workers via native WebAssembly binary   ⚡⚡ Fast
Workers   Web Workers via JavaScript transpilation    ⚡ Moderate
CPU       Main thread via .NET runtime                🐢 Fallback

Automatic Backend Selection

Call CreatePreferredAcceleratorAsync() and the library picks the best available backend: WebGPU → Wasm → Workers → CPU.
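The selection rule described here is a first-match search over a preference list. The following is a conceptual model only, with a hypothetical function name; how the library actually probes for each backend is not shown in these notes.

```python
from typing import Iterable, Sequence

V2_PREFERENCE = ("WebGPU", "Wasm", "Workers", "CPU")  # order stated for v2.0.0


def pick_backend(available: Iterable[str],
                 preference: Sequence[str] = V2_PREFERENCE) -> str:
    """Return the first backend in preference order that is available."""
    available_set = set(available)
    for name in preference:
        if name in available_set:
            return name
    raise RuntimeError("no supported backend available")
```

In a browser without WebGPU but with Web Workers and WebAssembly, `pick_backend({"Wasm", "Workers", "CPU"})` returns `"Wasm"`.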

Key Features

  • WGSL transpilation — C# ILGPU kernels compiled to WebGPU Shading Language for GPU execution
  • Wasm compilation — Kernels compiled to native WebAssembly binary modules for near-native performance
  • 64-bit emulation — Full double (f64) and long (i64) support via software emulation on WebGPU
  • WebGPU extension auto-detection — Probes adapter for shader-f16, subgroups, timestamp-query and enables them automatically
  • Subgroup operations: Group.Broadcast and Warp.Shuffle supported when the browser exposes the subgroups extension
  • Multi-worker dispatch — Wasm and Workers backends distribute work across all available CPU cores
  • Shared memory & atomics — Workgroup memory, barriers, and atomic operations across backends
  • No native dependencies — Pure C#, powered by SpawnDev.BlazorJS

360+ Tests

Comprehensive coverage across all backends: memory, indexing, arithmetic, bitwise, math functions, atomics, control flow, structs, type casting, 64-bit emulation, GPU patterns, shared memory, broadcast & subgroups, and more.

Interactive Demo

Try the live demo featuring a real-time Fractal Explorer that lets you switch between all four backends and compare performance.

Installation

dotnet add package SpawnDev.ILGPU

Breaking Changes from SpawnDev.ILGPU.WebGPU

This package replaces SpawnDev.ILGPU.WebGPU. Key differences:

  • Namespace: SpawnDev.ILGPU (was SpawnDev.ILGPU.WebGPU)
  • Multiple backends: WebGPU is no longer the only option — Wasm, Workers, and CPU backends are included
  • Unified API: Context.CreateAsync() with builder pattern for all backends