SpawnDev.ILGPU supports multiple backends for running ILGPU kernels. In the browser, three backends (WebGPU, WebGL, Wasm) bring GPU compute to Blazor WebAssembly. On desktop and server, ILGPU's native Cuda, OpenCL, and CPU backends are available. The same kernel code and async extensions work across all backends.
| | 🎮 WebGPU | 🖼️ WebGL | 🧊 Wasm |
|---|---|---|---|
| Executes on | GPU | GPU | Web Workers |
| Transpiles to | WGSL | GLSL ES 3.0 | WebAssembly binary |
| Technique | Compute shader | Transform Feedback | Multi-worker |
| Blocking | Non-blocking | Non-blocking | Non-blocking |
| Shared Memory | ✅ | ❌ | ✅ |
| Group.Barrier() | ✅ | ❌ | ✅ |
| Dynamic Shared Memory | ✅ | ❌ | ✅ |
| Atomics | ✅ | ❌ | ✅ |
| ILGPU Algorithms | ✅ RadixSort, Scan, Reduce, Histogram | ❌ | ✅ RadixSort, Scan, Reduce, Histogram |
| 64-bit (f64/i64) | ✅ Emulated | ✅ Emulated | ✅ Native |
| Browser support | Chrome/Edge 113+ | All modern browsers | All modern browsers |
| | 🚀 Cuda | 🔧 OpenCL | 🐢 CPU |
|---|---|---|---|
| Executes on | NVIDIA GPU | NVIDIA/AMD/Intel GPU | CPU cores |
| Transpiles to | PTX | OpenCL C | — (interpreted) |
| Shared Memory | ✅ | ✅ | ✅ |
| Atomics | ✅ | ✅ | ✅ |
| 64-bit | ✅ Native | ✅ Native | ✅ Native |
| Requirement | NVIDIA GPU + driver | OpenCL 2.0+ or 3.0 GPU | None |
Auto-selection priority (browser): WebGPU → WebGL → Wasm
Auto-selection priority (desktop): Cuda → OpenCL → CPU
CUDA extras: The CUDA backend also provides access to NVIDIA-specific libraries: nvJPEG (image encode/decode), cuRand (random numbers), cuBLAS (linear algebra), cuFFT (FFT), and NVML (device monitoring). See CUDA Libraries.
The async pattern works on all platforms — both browser and desktop. This is the recommended approach for cross-platform code:
```csharp
using global::ILGPU;
using global::ILGPU.Runtime;
using SpawnDev.ILGPU;

// Works in Blazor WASM, Console, WPF, ASP.NET — everywhere
using var context = await Context.CreateAsync(builder => builder.AllAcceleratorsAsync());
using var accelerator = await context.CreatePreferredAcceleratorAsync();

// ... load kernel, dispatch ...

await accelerator.SynchronizeAsync(); // Waits for GPU completion (no data transfer)
var results = await bufC.CopyToHostAsync<float>(); // The only GPU→CPU data transfer path
```

`AllAcceleratorsAsync()` automatically detects the environment:
- Browser: Registers WebGPU, WebGL, and Wasm
- Desktop: Registers Cuda, OpenCL, and CPU (browser backends are skipped)
CreatePreferredAcceleratorAsync() picks the best available backend on either platform.
Why async? Browser backends (Blazor WASM) require async — the single-threaded environment will deadlock on synchronous calls. Desktop backends support both sync and async, with async extensions gracefully falling back to synchronous ILGPU calls. Therefore, async is always recommended for maximum portability.
If you're certain your code will never run in a browser, you can use ILGPU's standard synchronous API:
```csharp
// Desktop only — will deadlock in Blazor WASM
using var context = Context.Create(builder => builder.AllAccelerators());
using var accelerator = context.GetPreferredDevice(preferCPU: false)
    .CreateAccelerator(context);

// ... load kernel, dispatch ...

accelerator.Synchronize(); // Blocking — safe on desktop, deadlocks in browser
var results = bufC.GetAsArray1D(); // Synchronous readback
```

WebGPU is the fastest backend. It uses GPU compute shaders via the WebGPU API, transpiling kernels to WGSL.
```csharp
using ILGPU;
using SpawnDev.ILGPU;
using SpawnDev.ILGPU.WebGPU;

using var context = await Context.CreateAsync(builder => builder.WebGPU());
var devices = context.GetWebGPUDevices();
if (devices.Count > 0)
{
    using var accelerator = await devices[0].CreateAcceleratorAsync(context);
    // Use accelerator...
}
```

Use `WebGPUBackendOptions` to configure the transpiler:
```csharp
using SpawnDev.ILGPU;
using SpawnDev.ILGPU.WebGPU.Backend;

// Default: Dekker f64 emulation (good precision, fast)
var options = new WebGPUBackendOptions();

// Ozaki f64 emulation (strict IEEE 754)
options = new WebGPUBackendOptions { F64Emulation = F64EmulationMode.Ozaki };

// Disable f64 emulation (double → float, max performance)
options = new WebGPUBackendOptions { F64Emulation = F64EmulationMode.Disabled };

using var accelerator = await devices[0].CreateAcceleratorAsync(context, options);
```

- Compute shaders — full GPU compute via `@compute @workgroup_size`
- Shared memory — `SharedMemory.Allocate<T>()` maps to `var<workgroup>`
- Barriers — `Group.Barrier()` maps to `workgroupBarrier()`
- Atomics — `Atomic.Add`, `Atomic.Min`, `Atomic.Max`, `Atomic.CompareExchange`
- ILGPU Algorithms — RadixSort, Scan, Reduce, Histogram, and other algorithm extensions are fully supported and tested (including large-scale sorts up to 4M+ elements). Use `CreateRadixSortPairs<TKey, TValue>()`, `CreateScan()`, `CreateReduce()`, etc. the same way as on desktop backends
- Subgroups — `Group.Broadcast`, `Warp.Shuffle` (when the `subgroups` extension is available)
- Auto-detected extensions — probes the adapter for `shader-f16`, `subgroups`, `timestamp-query`, etc.
- Device loss detection — monitors the `device.lost` promise; the `IsDeviceLost` property and `DeviceLost` event fire on unexpected GPU device loss (driver crash, GPU reset, VRAM exhaustion). Subsequent dispatch/synchronize calls throw `InvalidOperationException` with a clear message
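The device-loss hooks described above can be sketched as follows. This is a minimal sketch only: the exact `DeviceLost` delegate signature is an assumption, so adapt the handler to the actual event type.

```csharp
// Sketch: reacting to WebGPU device loss. IsDeviceLost and DeviceLost are
// described above; the handler signature here is an assumption.
accelerator.DeviceLost += (sender, e) =>
    Console.WriteLine("GPU device lost; all buffers and kernels must be recreated.");

if (accelerator.IsDeviceLost)
{
    // Any dispatch or synchronize now throws InvalidOperationException;
    // dispose and recreate the context/accelerator to recover.
}
```

Recovery follows the same pattern as switching backends at runtime: dispose the accelerator and context, then recreate everything.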
The WebGPU backend fully supports ILGPU.Algorithms, including RadixSort, Scan, Reduce, and Histogram. All algorithm tests pass in the browser test suite. Example:
```csharp
using ILGPU;
using ILGPU.Algorithms;
using ILGPU.Algorithms.RadixSortOperations;

var radixSort = accelerator.CreateRadixSortPairs<float, Stride1D.Dense, int, Stride1D.Dense, AscendingFloat>();
var tempSize = accelerator.ComputeRadixSortPairsTempStorageSize<float, int, AscendingFloat>(keys.Length);
using var tempBuf = accelerator.Allocate1D<int>(tempSize);

radixSort(stream, keys.View, values.View, tempBuf.View);
await accelerator.SynchronizeAsync();
```

Synchronous GPU→CPU copy methods (`CopyTo`, `CopyToCPU`, `GetAsArray1D`) throw `NotSupportedException` on WebGPU because GPU readback requires an async `mapAsync` call. Use `CopyToHostAsync` instead.
GPU→GPU copies via `CopyFrom` work perfectly — they use the native `CopyBufferToBuffer` command. See Memory & Buffers for the full compatibility table.
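The supported copy paths can be summarized in a short sketch (buffer and stream names are illustrative; `CopyToHostAsync` is the extension shown earlier in this section):

```csharp
// GPU→GPU: supported — backed by the native CopyBufferToBuffer command.
dstBuffer.View.CopyFrom(stream, srcBuffer.View);

// GPU→CPU: must be async on the browser backends.
float[] host = await dstBuffer.CopyToHostAsync<float>();

// Synchronous readback is not supported on WebGPU:
// dstBuffer.GetAsArray1D(); // throws NotSupportedException
```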
Browser support: Chrome/Edge 113+, Firefox Nightly (with `dom.webgpu.enabled`).
Universal GPU backend that works in virtually every modern browser. Transpiles kernels to GLSL ES 3.0 vertex shaders and uses Transform Feedback for GPU compute.
```csharp
using ILGPU;
using SpawnDev.ILGPU;
using SpawnDev.ILGPU.WebGL;

using var context = await Context.CreateAsync(builder => builder.WebGL());
var devices = context.GetWebGLDevices();
if (devices.Count > 0)
{
    using var accelerator = devices[0].CreateAccelerator(context);
    // Use accelerator...
}
```

Use `WebGLBackendOptions` to configure the transpiler:

```csharp
using SpawnDev.ILGPU;
using SpawnDev.ILGPU.WebGL.Backend;

// Default: Dekker f64 emulation
var options = new WebGLBackendOptions();

// Ozaki f64 emulation (strict IEEE 754)
options = new WebGLBackendOptions { F64Emulation = F64EmulationMode.Ozaki };

using var accelerator = devices[0].CreateAccelerator(context, options);
```

The WebGL backend is unique — all GL calls are dispatched to a dedicated Web Worker via glWorker.js. This keeps the main thread responsive even during intensive GPU compute.
Buffers persist as GPU-resident textures in the worker. Kernel dispatch sends buffer references (not data) — no ArrayBuffer transfers occur per dispatch. Data only moves to the CPU when explicitly requested via CopyToHostAsync().
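In practice this means repeated dispatches are cheap and only the final readback pays a transfer cost. A sketch (the kernel and buffer names are illustrative):

```csharp
// Buffers stay GPU-resident in the worker across dispatches; each launch
// sends buffer references, not data.
for (int i = 0; i < 100; i++)
    kernel((int)buffer.Length, buffer.View);

await accelerator.SynchronizeAsync();             // wait for GPU completion
var host = await buffer.CopyToHostAsync<float>(); // the only CPU transfer
```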
- No shared memory — GLSL ES 3.0 vertex shaders don't support workgroup memory
- No atomics — not available in the vertex shader stage
- No barriers — no workgroup synchronization
- No sync GPU→CPU — `CopyTo`/`CopyToCPU`/`GetAsArray1D` throw `NotSupportedException`. Use `CopyToHostAsync`. GPU→GPU `CopyFrom` works.
Browser support: all modern browsers — Chrome, Edge, Firefox, Safari, mobile browsers.
Compiles kernels to native WebAssembly binary modules and dispatches them across Web Workers for parallel CPU execution.
```csharp
using ILGPU;
using SpawnDev.ILGPU;
using SpawnDev.ILGPU.Wasm;

using var context = await Context.CreateAsync(builder => builder.Wasm());
var devices = context.GetDevices<WasmILGPUDevice>();
if (devices.Count > 0)
{
    using var accelerator = await devices[0].CreateAcceleratorAsync(context);
    // Use accelerator...
}
```

- Multi-worker dispatch — distributes work across all available CPU cores
- Native 64-bit — `double` and `long` work natively (no emulation needed)
- Shared memory — uses `SharedArrayBuffer` for zero-copy data sharing
- Group.Barrier() — full workgroup barrier synchronization across Web Workers
- Dynamic shared memory — runtime-sized workgroup memory via `SharedMemory.GetDynamic()`
- Group.Broadcast — intra-group value sharing
- Atomics — supported via `SharedArrayBuffer`
- ILGPU Algorithms — RadixSort, Scan, Reduce, and Histogram are fully supported with full `hardwareConcurrency` multi-worker barrier synchronization. The Wasm backend uses fiber-based phase dispatch with pure spin barriers, per-thread scratch memory, and an in-Wasm phase dispatcher that eliminates JS-Wasm boundary crossings between phases
- No sync GPU→CPU — `CopyTo`/`CopyToCPU`/`GetAsArray1D` throw `NotSupportedException`. Use `CopyToHostAsync`. GPU→GPU `CopyFrom` works.
For multi-worker mode, the page must be cross-origin isolated (COOP/COEP headers). The demo includes coi-serviceworker.js which handles this automatically. Without SharedArrayBuffer, the Wasm backend falls back to a single off-thread worker.
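The shared-memory and barrier features listed above are exercised by ordinary ILGPU kernel code; nothing Wasm-specific is required. A small sketch of an explicitly grouped kernel (load it with `LoadStreamKernel` and launch with a `KernelConfig`):

```csharp
using ILGPU;
using ILGPU.Runtime;

// Block-wide sum: each group stages its values in shared memory,
// synchronizes at the barrier, then thread 0 reduces the group.
// On the Wasm backend the barrier spans Web Workers.
static void BlockSumKernel(ArrayView<int> data, ArrayView<int> blockSums)
{
    var shared = SharedMemory.Allocate<int>(64); // one slot per thread in the group
    shared[Group.IdxX] = data[Grid.GlobalIndex.X];
    Group.Barrier();

    if (Group.IdxX == 0)
    {
        int sum = 0;
        for (int i = 0; i < Group.DimX; i++)
            sum += shared[i];
        blockSums[Grid.IdxX] = sum;
    }
}
```

The group size of 64 here is an illustrative choice; it must match the group dimension of the `KernelConfig` used at launch.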
Browser support: all modern browsers that support Blazor WebAssembly.
When running outside the browser (console apps, WPF, ASP.NET, etc.), SpawnDev.ILGPU uses ILGPU's native Cuda and OpenCL backends automatically. These are registered by the standard builder.AllAccelerators() call.
```csharp
// Recommended: use the unified async pattern (same as Blazor WASM)
using var context = await Context.CreateAsync(builder => builder.AllAcceleratorsAsync());
using var accelerator = await context.CreatePreferredAcceleratorAsync();

// Lists all detected devices
foreach (var device in context)
    Console.WriteLine($"{device.Name} ({device.AcceleratorType})");
```

- Requires an NVIDIA GPU with a supported driver
- Uses PTX intermediate representation
- Best performance for NVIDIA hardware
- Full ILGPU feature support (shared memory, atomics, warp ops)
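To target CUDA explicitly rather than relying on auto-selection, a minimal sketch using ILGPU's standard CUDA entry points (standard ILGPU API, not specific to SpawnDev.ILGPU):

```csharp
using ILGPU;
using ILGPU.Runtime.Cuda;

// Explicitly build a CUDA-only context (desktop only).
using var context = Context.Create(b => b.Cuda());
using var accelerator = context.CreateCudaAccelerator(0); // first NVIDIA GPU
Console.WriteLine(accelerator.Name);
```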
- Supports NVIDIA, AMD, and Intel GPUs
- Uses OpenCL C kernel language
- OpenCL 2.0+ and OpenCL 3.0 devices are supported
- NVIDIA GPUs with OpenCL 3.0 drivers are now compatible — the `GenericAddressSpace` requirement that previously blocked these devices has been relaxed
- Subgroup-dependent tests (e.g., `Warp.Shuffle`) are dynamically skipped on devices that don't report subgroup support
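As with CUDA, OpenCL can be targeted explicitly through ILGPU's standard entry points (standard ILGPU API, not specific to SpawnDev.ILGPU):

```csharp
using ILGPU;
using ILGPU.Runtime.OpenCL;

// Explicitly build an OpenCL-only context (desktop only).
using var context = Context.Create(b => b.OpenCL());
using var accelerator = context.CreateCLAccelerator(0); // first OpenCL device
Console.WriteLine(accelerator.Name);
```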
Multi-threaded CPU accelerator using Parallel.For. Useful as a reference or for machines without GPU drivers. Full ILGPU feature support (shared memory, barriers, atomics). Not available in the browser — use the Wasm backend for off-main-thread compute in Blazor.
```csharp
using ILGPU;
using ILGPU.Runtime.CPU;

using var context = Context.Create(b => b.CPU());
using var accelerator = context.CreateCPUAccelerator(0);
```

Note: Cuda, OpenCL, and CPU are not available in Blazor WebAssembly — they are skipped silently when registering via `AllAcceleratorsAsync()` in the browser.
Browser shading languages (WGSL and GLSL ES 3.0) expose only 32-bit types, so both browser GPU backends provide software emulation for 64-bit types.
`i64` emulation (`long`/`ulong` via `vec2<u32>`) is always enabled — ILGPU's IR requires Int64 for `ArrayView.Length` and indices.
`f64` emulation (`double`) is configurable via `F64EmulationMode`:
| | Dekker (Default) | Ozaki | Disabled |
|---|---|---|---|
| Representation | `vec2<f32>` (high + low) | `vec4<f32>` (quad-float) | Native `f32` |
| Precision | ~48–53 bits mantissa | Strict IEEE 754 | 32-bit only |
| Memory | 8 bytes | 16 bytes | 4 bytes |
| Performance | ⚡ Fast | 🐢 ~2× slower | ⚡⚡ Fastest |
| Best for | General compute, fractals | Scientific, financial | Rendering, max perf |
```csharp
using SpawnDev.ILGPU;

// Default: Dekker double-float emulation
var options = new WebGPUBackendOptions();

// Strict IEEE 754 precision
options = new WebGPUBackendOptions { F64Emulation = F64EmulationMode.Ozaki };

// Disable f64 emulation (double promoted to float, max performance)
options = new WebGPUBackendOptions { F64Emulation = F64EmulationMode.Disabled };
```

You can switch backends at runtime by disposing old resources and creating new ones:
```csharp
// Dispose old resources
kernel = null;
outputBuffer?.Dispose();
accelerator?.Dispose();
context?.Dispose();

// Create new backend
context = await Context.CreateAsync(builder => builder.WebGL());
var devices = context.GetWebGLDevices();
accelerator = devices[0].CreateAccelerator(context);

// Reload kernel on new accelerator
kernel = accelerator.LoadAutoGroupedStreamKernel<...>(MyKernel);
outputBuffer = accelerator.Allocate1D<float>(length);
```

Important: Kernels, buffers, and other resources are tied to their accelerator. You must recreate everything when switching backends.
Enable debug logging per-backend:
```csharp
using SpawnDev.ILGPU.WebGPU.Backend;
using SpawnDev.ILGPU.WebGL.Backend;
using SpawnDev.ILGPU.Wasm.Backend;

WebGPUBackend.VerboseLogging = true; // WebGPU
WebGLBackend.VerboseLogging = true;  // WebGL
WasmBackend.VerboseLogging = true;   // Wasm
```

This outputs compiled shader source, buffer binding details, and dispatch information to the browser console.
After loading a kernel the generated shader source is captured automatically:
```csharp
using SpawnDev.ILGPU.WebGPU;
using SpawnDev.ILGPU.WebGL;

// Available immediately after LoadAutoGroupedStreamKernel / LoadStreamKernel
string? wgsl = WebGPUAccelerator.LastGeneratedWGSL; // WebGPU backend
string? glsl = WebGLAccelerator.LastGeneratedGLSL;  // WebGL backend
```

Both properties are static and updated on every kernel load (not just on dispatch), so they always reflect the most recently compiled shader regardless of whether the kernel has been launched yet.