Releases: NVIDIA/warp
v1.2.0
[1.2.0] - 2024-06-06
- Add a not-a-number floating-point constant that can be used as `wp.NAN` or `wp.nan`.
- Add `wp.isnan()`, `wp.isinf()`, and `wp.isfinite()` for scalars, vectors, matrices, etc.
- Improve kernel cache reuse by hashing just the local module constants. Previously, a module's hash was affected by all `wp.constant()` variables declared in a Warp program.
- Revised module compilation process to allow multiple processes to use the same kernel cache directory. Cached kernels will now be stored in a hash-specific subdirectory.
- Add runtime checks for `wp.MarchingCubes` on field dimensions and size
- Fix memory leak in `wp.Mesh` BVH (GH-225)
- Use C++17 when building the Warp library and user kernels
- Increase PTX target architecture up to `sm_75` (from `sm_70`), enabling Turing ISA features
- Extended NanoVDB support (see `warp.Volume`):
  - Add support for data-agnostic index grids, allocation at voxel granularity
  - New `wp.volume_lookup_index()`, `wp.volume_sample_index()` and generic `wp.volume_sample()`/`wp.volume_lookup()`/`wp.volume_store()` kernel-level functions
  - Zero-copy aliasing of in-memory grids, support for multi-grid buffers
  - Grid introspection and blind data access capabilities
  - `warp.fem` can now work directly on NanoVDB grids using `warp.fem.Nanogrid`
  - Fixed `wp.volume_sample_v()` and `wp.volume_store_*()` adjoints
  - Prevent `wp.volume_store()` from overwriting grid background values
- Improve validation of user-provided fields and values in `warp.fem`
- Support headless rendering of `wp.render.OpenGLRenderer` via `pyglet.options["headless"] = True`
- `wp.render.RegisteredGLBuffer` can fall back to CPU-bound copying if CUDA/OpenGL interop is not available
- Clarify terms for external contributions; see CONTRIBUTING.md for details
- Improve performance of `wp.sparse.bsr_mm()` by ~5x on benchmark problems
- Fix for XPBD incorrectly indexing into joint actuation `joint_act` arrays
- Fix for mass matrix gradient computation in `wp.sim.FeatherstoneIntegrator()`
- Fix for handling of `--msvc_path` in build scripts
- Fix for `wp.copy()` params to record dest and src offset parameters on `wp.Tape()`
- Fix for `wp.randn()` to ensure return values are finite
- Fix for slicing of arrays with gradients in kernels
- Fix for function overload caching, ensure module is rebuilt if any function overloads are modified
- Fix for handling of `bool` types in generic kernels
- Publish CUDA 12.5 binaries for Hopper support, see https://github.com/nvidia/warp?tab=readme-ov-file#installing for details
[1.1.1] - 2024-05-24
- `wp.init()` is no longer required to be called explicitly and will be performed on first call to the API
- Speed up `omni.warp.core`'s startup time
v1.1.0
[1.1.0] - 2024-05-09
- Support returning a value from `@wp.func_native` CUDA functions using type hints
- Improved differentiability of the `wp.sim.FeatherstoneIntegrator`
- Fix gradient propagation for rigid body contacts in `wp.sim.collide()`
- Added support for event-based timing, see `wp.ScopedTimer()`
- Added Tape visualization and debugging functions, see `wp.Tape.visualize()`
- Support constructing Warp arrays from objects that define the `__cuda_array_interface__` attribute
- Support copying a struct to another device, use `struct.to(device)` to migrate struct arrays
- Allow rigid shapes to not have any collisions with other shapes in `wp.sim.Model`
- Change default test behavior to test redundant GPUs (up to 2x)
- Test each example in an individual subprocess
- Polish and optimize various examples and tests
- Allow non-contiguous point arrays to be passed to `wp.HashGrid.build()`
- Upgrade LLVM to 18.1.3 for from-source builds and Linux x86-64 builds
- Build DLL source code as C++17 and require GCC 9.4 as a minimum
- Array clone, assign, and copy are now differentiable
- Use `Ruff` for formatting and linting
- Various documentation improvements (infinity, math constants, etc.)
- Improve URDF importer, handle joint armature
- Allow builtins.bool to be used in Warp data structures
- Use external gradient arrays in backward passes when passed to `wp.launch()`
- Add Conjugate Residual linear solver, see `wp.optim.linear.cr()`
- Fix propagation of gradients on aliased copy of variables in kernels
- Facilitate debugging and speed up `import warp` by no longer raising any exceptions during import
- Improve support for nested vec/mat assignments in structs
- Recommend Python 3.9 or higher, which is required for JAX and soon PyTorch.
- Support gradient propagation for indexing sliced multi-dimensional arrays, i.e. `a[i][j]` vs. `a[i, j]`
- Provide an informative message if setting DLL C-types failed, instructing to try rebuilding the library
[1.0.3] - 2024-04-17
- Add a `support_level` entry to the configuration file of the extensions
v1.0.2
v1.0.1
[1.0.1] - 2024-03-15
- Document Device `total_memory` and `free_memory`
- Documentation for allocators, streams, peer access, and generics
- Changed example output directory to current working directory
- Added `python -m warp.examples.browse` for browsing the examples folder
- Print where the USD stage file is being saved
- Added `examples/optim/example_walker.py` sample
- Make the drone example not specific to USD
- Reduce the time taken to run some examples
- Optimise rendering points with a single colour
- Clarify an error message around needing USD
- Raise exception when module is unloaded during graph capture
- Added `wp.synchronize_event()` for blocking the host thread until a recorded event completes
- Flush C print buffers when ending `stdout` capture
- Remove more unneeded CUTLASS files
- Allow setting mempool release threshold as a fractional value
v1.0.0
[1.0.0] - 2024-03-07
- Add `FeatherstoneIntegrator` which provides more stable simulation of articulated rigid body dynamics in generalized coordinates (`State.joint_q` and `State.joint_qd`)
- Introduce `warp.sim.Control` struct to store control inputs for simulations (optional, by default the `Model` control inputs are used as before); integrators now have a different simulation signature: `integrator.simulate(model: Model, state_in: State, state_out: State, dt: float, control: Control)`
- `joint_act` can now behave in 3 modes: with `joint_axis_mode` set to `JOINT_MODE_FORCE` it behaves as a force/torque, with `JOINT_MODE_VELOCITY` it behaves as a velocity target, and with `JOINT_MODE_POSITION` it behaves as a position target; `joint_target` has been removed
- Add adhesive contact to Euler integrators via `Model.shape_materials.ka` which controls the contact distance at which the adhesive force is applied
- Improve handling of visual/collision shapes in URDF importer so visual shapes are not involved in contact dynamics
- Experimental JAX kernel callback support
- Improve module load exception message
- Add `wp.ScopedCapture`
- Removing `enable_backward` warning for callables
- Copy docstrings and annotations from wrapped kernels, functions, structs
v0.15.1
[0.15.1] - 2024-03-05
- Add examples assets to the wheel packages
- Fix broken image link in documentation
- Fix codegen for custom grad functions calling their respective forward functions
- Fix custom grad function handling for functions that have no outputs
- Fix issues when `wp.config.quiet = True`
v0.15.0
[0.15.0] - 2024-03-04
- Add thumbnails to examples gallery
- Apply colored lighting to examples
- Moved `examples` directory under `warp/`
- Add example usage to `python -m warp.tests --help`
- Adding `torch.autograd.function` example + docs
- Add error-checking to array shapes during creation
- Adding `example_graph_capture`
- Add a Diffsim Example of a Drone
- Fix `verify_fp` causing compiler errors and support CPU kernels
- Fix to enable `matmul` to be called in CUDA graph capture
- Enable mempools by default
- Update `wp.launch` to support tuple args
- Fix BiCGSTAB and GMRES producing NaNs when converging early
- Fix warning about backward codegen being disabled in `test_fem`
- Fix `assert_np_equal` when NaNs and tolerance are involved
- Improve error message to discern between CUDA being disabled or not supported
- Support cross-module functions with user-defined gradients
- Suppress superfluous CUDA error when ending capture after errors
- Make output during initialization atomic
- Add `warp.config.max_unroll`, fix custom gradient unrolling
- Support native replay snippets using `@wp.func_native(snippet, replay_snippet=replay_snippet)`
- Look for the CUDA Toolkit in default locations if the `CUDA_PATH` environment variable or `--cuda_path` build option are not used
- Added `wp.ones()` to efficiently create one-initialized arrays
- Rename `wp.config.graph_capture_module_load_default` to `wp.config.enable_graph_capture_module_load_by_default`
[0.14.0] - 2024-02-19
- Add support for CUDA pooled (stream-ordered) allocators
- Support memory allocation during graph capture
- Support copying non-contiguous CUDA arrays during graph capture
- Improved memory allocation/deallocation performance with pooled allocators
- Use `wp.config.enable_mempools_at_init` to enable pooled allocators during Warp initialization (if supported)
- `wp.is_mempool_supported()` - check if a device supports pooled allocators
- `wp.is_mempool_enabled()`, `wp.set_mempool_enabled()` - enable or disable pooled allocators per device
- `wp.set_mempool_release_threshold()`, `wp.get_mempool_release_threshold()` - configure memory pool release threshold
- Add support for direct memory access between devices
- Improved peer-to-peer memory transfer performance if access is enabled
- Caveat: enabling peer access may impact memory allocation/deallocation performance and increase memory consumption
- `wp.is_peer_access_supported()` - check if the memory of a device can be accessed by a peer device
- `wp.is_peer_access_enabled()`, `wp.set_peer_access_enabled()` - manage peer access for memory allocated using default CUDA allocators
- `wp.is_mempool_access_supported()` - check if the memory pool of a device can be accessed by a peer device
- `wp.is_mempool_access_enabled()`, `wp.set_mempool_access_enabled()` - manage access for memory allocated using pooled CUDA allocators
- Refined stream synchronization semantics
- `wp.ScopedStream` can synchronize with the previous stream on entry and/or exit (only sync on entry by default)
- Functions taking an optional stream argument do no implicit synchronization for max performance (e.g., `wp.copy()`, `wp.launch()`, `wp.capture_launch()`)
- Support for passing a custom `deleter` argument when constructing arrays
- Deprecation of `owner` argument - use `deleter` to transfer ownership
- Optimizations for various core API functions (e.g., `wp.zeros()`, `wp.full()`, and more)
- Fix `wp.matmul()` to always use the correct CUDA context
- Fix memory leak in BSR transpose
- Fix stream synchronization issues when copying non-contiguous arrays
[0.13.1] - 2024-02-22
- Ensure that the results from the `Noise Deform` node are deterministic across different Kit sessions
v0.13.0
[0.13.0] - 2024-02-16
- Update the license to NVIDIA Software License, allowing commercial use (see `LICENSE.md`)
- Add `CONTRIBUTING.md` guidelines (for NVIDIA employees)
- Hash CUDA `snippet` and `adj_snippet` strings to fix caching
- Fix `build_docs.py` on Windows
- Add missing `.py` extension to `warp/tests/walkthrough_debug`
- Allow `wp.bool` usage in vector and matrix types
[0.12.0] - 2024-02-05
- Add a warning when the `enable_backward` setting is set to `False` upon calling `wp.Tape.backward()`
- Fix kernels not being recompiled as expected when defined using a closure
- Change the kernel cache appauthor subdirectory to just "NVIDIA"
- Ensure that gradients attached to PyTorch tensors have compatible strides when calling `wp.from_torch()`
- Add a `Noise Deform` node for OmniGraph that deforms points using Perlin/curl noise
v0.11.0
[0.11.0] - 2024-01-23
- Re-release 1.0.0-beta.7 as a non-pre-release 0.11.0 version so it gets selected by `pip install warp-lang`.
- Introducing a new versioning and release process, detailed in `PACKAGING.md` and resembling that of Python itself:
  - The 0.11 release(s) can be found on the `release-0.11` branch.
  - Point releases (if any) go on the same minor release branch and only contain bug fixes, not new features.
  - The `public` branch, previously used to merge releases into and corresponding with the GitHub `main` branch, is retired.
[1.0.0-beta.7] - 2024-01-23
- Ensure captures are always enclosed in `try`/`finally`
- Only include .py files from the warp subdirectory into wheel packages
- Fix an extension's sample node failing at parsing some version numbers
- Allow examples to run without USD when possible
- Add a setting to disable the main Warp menu in Kit
- Add iterative linear solvers, see `wp.optim.linear.cg`, `wp.optim.linear.bicgstab`, `wp.optim.linear.gmres`, and `wp.optim.linear.LinearOperator`
- Improve error messages around global variables
- Improve error messages around mat/vec assignments
- Support conversion of scalars to native/ctypes, e.g.: `float(wp.float32(1.23))` or `ctypes.c_float(wp.float32(1.23))`
- Add a constant for infinity, see `wp.inf`
- Add a FAQ entry about array assignments
- Add a mass spring cage diff simulation example, see `examples/example_diffsim_mass_spring_cage.py`
- Add `-s`, `--suite` option for only running tests belonging to the given suites
- Fix common spelling mistakes
- Fix indentation of generated code
- Show deprecation warnings only once
- Improve `wp.render.OpenGLRenderer`
- Create the extension's symlink to the core library at runtime
- Fix some built-ins failing to compile the backward pass when nested inside if/else blocks
- Update examples with the new variants of the mesh query built-ins
- Fix type members that weren't zero-initialized
- Fix missing adjoint function for `wp.mesh_query_ray()`
v1.0.0-beta.6
[1.0.0-beta.6] - 2024-01-10
- Do not create CPU copy of grad array when calling `array.numpy()`
- Fix `assert_np_equal()` bug
- Support Linux AArch64 platforms, including Jetson/Tegra devices
- Add parallel testing runner (invoke with `python -m warp.tests`, use `warp/tests/unittest_serial.py` for serial testing)
- Fix support for function calls in `range()`
- `matmul` adjoints now accumulate
- Expand available operators (e.g. vector @ matrix, scalar as dividend) and improve support for calling native built-ins
- Fix multi-gpu synchronization issue in `sparse.py`
- Add depth rendering to `OpenGLRenderer`, document `warp.render`
- Make `atomic_min`, `atomic_max` differentiable
- Fix error reporting using the exact source segment
- Add user-friendly mesh query overloads, returning a struct instead of overwriting parameters
- Address multiple differentiability issues
- Fix backpropagation for returning array element references
- Support passing the return value to adjoints
- Add point basis space and explicit point-based quadrature for `warp.fem`
- Support overriding the LLVM project source directory path using `build_lib.py --build_llvm --llvm_source_path=`
- Fix the error message for accessing non-existing attributes
- Flatten faces array for Mesh constructor in URDF parser