v1.9.0 #951

shi-eric announced in Announcements
Warp 1.9 ships with a rewritten marching cubes implementation, compatibility with the CUDA 13 toolkit, and new functions for ahead-of-time module compilation. The programming model has also been enhanced with more flexible indexing for composite types, direct IntEnum support, and the ability to initialize local arrays in kernels.

New Features
Differentiable marching cubes
A fully differentiable wp.MarchingCubes implementation, contributed by @mikacuy and @nmwsharp, has been added. This version is written entirely in Warp, replacing the previous native CUDA C++ implementation and enabling it to run on both CPU and GPU devices. The implementation also addresses a long-standing off-by-one bug (#324). For more details, see the updated documentation.
Functions for module compilation and loading

We have added wp.compile_aot_module() and wp.load_aot_module() for more flexible ahead-of-time (AOT) compilation. These functions include a strip_hash=True argument, which removes the unique hashes from compiled module and function names. This change makes it possible to distribute pre-compiled modules without shipping the original Python source code.
See the documentation on ahead-of-time compilation workflows for more details. In future releases, we plan to continue to expand Warp's support for ahead-of-time workflows.
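A sketch of the intended workflow; the module and directory arguments shown here are assumptions for illustration, and the exact signatures are given in the AOT documentation:

```python
import warp as wp

import my_kernels  # hypothetical module containing @wp.kernel definitions

# On the build machine: compile the module ahead of time. strip_hash=True
# removes the unique hashes from the compiled module and function names so
# the binaries can be shipped without the original Python source.
# (The directory keywords below are assumptions, not the documented API.)
wp.compile_aot_module(my_kernels, output_dir="aot_modules", strip_hash=True)

# On the target machine: load the pre-compiled module instead of JIT-compiling.
wp.load_aot_module("my_kernels", input_dir="aot_modules", strip_hash=True)
```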
CUDA 13 Support
CUDA Toolkit 13.0 was released in early August.
- PyPI Distribution: Warp wheels on PyPI and NVIDIA PyPI will continue to be built with CUDA 12.8 to provide a transition period for users upgrading their CUDA drivers.
- CUDA 13.0 Compatibility: Users requiring Warp compiled against CUDA 13.x have two options: download wheels built against CUDA 13 from the GitHub Releases page, or build Warp from source with a CUDA 13.x toolkit.
- Driver Compatibility: CUDA 12.8 Warp wheels can run on systems with CUDA 13.x drivers thanks to CUDA's backward compatibility.
Performance Improvements
Graph-capturable linear solvers
The iterative linear solvers in warp.optim.linear (CG, BiCGSTAB, GMRES) are now fully compatible with CUDA graph capture. This adds support for device-side convergence checking via wp.capture_while(), enabling full CUDA graph capture when check_every=0. Users can now choose between traditional host-side convergence checks or fully graph-capturable device-side termination.
Automatic tiling for sparse linear algebra

warp.sparse now supports arbitrary-sized blocks and can leverage tile-based computations for certain matrix types. The system automatically chooses between tiled and non-tiled execution using heuristics based on matrix characteristics (block sizes, sparsity patterns, and workload dimensions). Note that the heuristic for choosing between tiled and non-tiled variants is still being refined, and that it can be manually overridden by providing the tile_size parameter to bsr_mm or bsr_mv.
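For example, a sketch of overriding the heuristic; the matrix sizes and tile_size value here are arbitrary:

```python
import warp as wp
from warp.sparse import bsr_zeros, bsr_mv

wp.init()

# a block-sparse matrix with non-power-of-two 5x5 blocks, now supported
A = bsr_zeros(128, 128, block_type=wp.mat(shape=(5, 5), dtype=float))
# ... fill A, e.g. with warp.sparse.bsr_set_from_triplets() ...

x = wp.zeros(128 * 5, dtype=float)
y = wp.zeros(128 * 5, dtype=float)

bsr_mv(A, x, y)                # let the heuristic pick the execution path
bsr_mv(A, x, y, tile_size=64)  # or force a particular tiled variant
```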
Automatic tiling for finite element quadrature

warp.fem.integrate now leverages tile-based computations for quadrature point accumulation, with automatic tile size selection based on workload characteristics. The system automatically chooses between tiled and non-tiled execution to optimize performance based on the integration problem size and complexity.

Programming Model Updates
Slice and negative indexing improvements for composite types
We have enhanced the support for slice operations and negative indexing across all composite types (vectors, matrices, quaternions, and transforms).
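For instance, a small kernel sketch, assuming a slice such as p[:3] of a wp.vec4 yields a wp.vec3 under the new rules:

```python
import warp as wp

@wp.kernel
def dehomogenize(points: wp.array(dtype=wp.vec4), out: wp.array(dtype=wp.vec3)):
    tid = wp.tid()
    p = points[tid]
    w = p[-1]              # negative indexing: the last component
    out[tid] = p[:3] / w   # slicing: the first three components as a vec3
```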
Support for IntEnum and IntFlag inside kernels

It is now possible to directly reference IntEnum and IntFlag values inside Warp functions and kernels. Previously, workarounds involving wp.static() were required.
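A minimal example:

```python
from enum import IntEnum

import warp as wp

class Material(IntEnum):
    ROCK = 0
    METAL = 1

@wp.kernel
def count_metal(materials: wp.array(dtype=wp.int32), counter: wp.array(dtype=wp.int32)):
    tid = wp.tid()
    # enum members can be referenced directly, no wp.static() wrapper needed
    if materials[tid] == Material.METAL:
        wp.atomic_add(counter, 0, 1)
```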
Improved support for wp.array() views inside kernels

This enhancement allows kernels to create array views by accessing the ptr attribute of an array. Additionally, these in-kernel views now support dynamic shapes and struct types.
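A rough sketch of the idea; the exact in-kernel wp.array() arguments are an assumption here, so see the documentation for the supported form:

```python
import warp as wp

@wp.kernel
def flat_sum(a: wp.array2d(dtype=float), out: wp.array(dtype=float)):
    # build a flat 1D view over the same storage via the ptr attribute;
    # the shape is a runtime value, which in-kernel views now allow
    flat = wp.array(ptr=a.ptr, shape=a.shape[0] * a.shape[1], dtype=float)
    s = float(0.0)
    for i in range(a.shape[0] * a.shape[1]):
        s += flat[i]
    out[0] = s
```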
Support for initializing fixed-size arrays inside kernels
It is now possible to allocate local arrays of a fixed size in kernels using wp.zeros(). The resulting arrays are allocated in registers, providing fast access and avoiding global memory overhead. Previously, developers needed to create vectors to achieve a similar capability, e.g. v = wp.vector(length=8, dtype=float), but this came with various limitations.
Indexed tile operations

Warp now provides three new indexed tile operations that enable more flexible memory access patterns beyond simple contiguous tile operations. These functions allow you to load, store, and perform atomic operations on tiles using custom index mappings along specified axes.

- wp.tile_load_indexed() - Load tiles with custom index mapping along a specified axis
- wp.tile_store_indexed() - Store tiles with custom index mapping along a specified axis
- wp.tile_atomic_add_indexed() - Perform atomic additions with custom index mapping along a specified axis
Fixed nested matrix component support

Warp now properly supports writing to individual matrix elements stored within struct fields. Previously, operations like struct.matrix[1, 2] = value would result in a compile-time error.
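For example:

```python
import warp as wp

@wp.struct
class Body:
    inertia: wp.mat33

@wp.kernel
def zero_coupling(bodies: wp.array(dtype=Body)):
    tid = wp.tid()
    b = bodies[tid]
    b.inertia[1, 2] = 0.0  # writing a matrix element inside a struct field
    bodies[tid] = b
```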
Announcements

Known limitations
Early testing on NVIDIA Jetson Thor indicates that launching CPU kernels may sometimes result in segmentation faults. GPU kernel launches are unaffected. We believe this can be resolved by building Warp from source against LLVM/Clang version 18 or newer.
Upcoming removals
The following features have been deprecated in prior releases and will be removed in v1.10 (early November):
- warp.sim - Use the Newton engine.
- wp.matrix() from column vectors - Use wp.matrix_from_rows() or wp.matrix_from_cols() instead.
- wp.select() - Use wp.where() instead (note: different argument order; see the migration sketch after this list).
- wp.matrix(pos, quat, scale) - Use wp.transform_compose() instead.
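Because the two functions order their branches differently, swapping the function name without reordering the arguments silently flips the result. A migration sketch:

```python
import warp as wp

@wp.kernel
def relu(x: wp.array(dtype=float)):
    tid = wp.tid()
    v = x[tid]
    # deprecated: wp.select(cond, value_if_false, value_if_true)
    # v = wp.select(v > 0.0, 0.0, v)
    # replacement: wp.where(cond, value_if_true, value_if_false)
    x[tid] = wp.where(v > 0.0, v, 0.0)
```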
Platform support

Acknowledgments
We thank the following contributors for their valuable contributions to this release:

- warp.jax_experimental.ffi.jax_callable() with a function annotated with the -> None return type (#893).
- (#888).
- The strip_hash=True option for the new ahead-of-time compilation functions (#661).
Full Changelog
For a curated list of all changes in this release, please see the v1.9.0 section in CHANGELOG.md.
This discussion was created from the release v1.9.0.