Releases: NVIDIA/warp

v1.2.0

07 Jun 03:53

[1.2.0] - 2024-06-06

  • Add a not-a-number floating-point constant that can be used as wp.NAN or wp.nan.
  • Add wp.isnan(), wp.isinf(), and wp.isfinite() for scalars, vectors, matrices, etc.
  • Improve kernel cache reuse by hashing just the local module constants. Previously, a
    module's hash was affected by all wp.constant() variables declared in a Warp program.
  • Revised module compilation process to allow multiple processes to use the same kernel cache directory.
    Cached kernels will now be stored in a hash-specific subdirectory.
  • Add runtime checks for wp.MarchingCubes on field dimensions and size
  • Fix memory leak in wp.Mesh BVH (GH-225)
  • Use C++17 when building the Warp library and user kernels
  • Increase PTX target architecture up to sm_75 (from sm_70), enabling Turing ISA features
  • Extended NanoVDB support (see warp.Volume):
    • Add support for data-agnostic index grids, allocation at voxel granularity
    • New wp.volume_lookup_index(), wp.volume_sample_index() and generic wp.volume_sample()/wp.volume_lookup()/wp.volume_store() kernel-level functions
    • Zero-copy aliasing of in-memory grids, support for multi-grid buffers
    • Grid introspection and blind data access capabilities
    • warp.fem can now work directly on NanoVDB grids using warp.fem.Nanogrid
    • Fixed wp.volume_sample_v() and wp.volume_store_*() adjoints
    • Prevent wp.volume_store() from overwriting grid background values
  • Improve validation of user-provided fields and values in warp.fem
  • Support headless rendering of wp.render.OpenGLRenderer via pyglet.options["headless"] = True
  • wp.render.RegisteredGLBuffer can fall back to CPU-bound copying if CUDA/OpenGL interop is not available
  • Clarify terms for external contributions, please see CONTRIBUTING.md for details
  • Improve performance of wp.sparse.bsr_mm() by ~5x on benchmark problems
  • Fix for XPBD incorrectly indexing into the joint actuation joint_act arrays
  • Fix for mass matrix gradients computation in wp.sim.FeatherstoneIntegrator()
  • Fix for handling of --msvc_path in build scripts
  • Fix for wp.copy() params to record dest and src offset parameters on wp.Tape()
  • Fix for wp.randn() to ensure return values are finite
  • Fix for slicing of arrays with gradients in kernels
  • Fix for function overload caching, ensure module is rebuilt if any function overloads are modified
  • Fix for handling of bool types in generic kernels
  • Publish CUDA 12.5 binaries for Hopper support, see https://github.com/nvidia/warp?tab=readme-ov-file#installing for details

[1.1.1] - 2024-05-24

  • wp.init() is no longer required to be called explicitly and will be performed on first call to the API
  • Speed up omni.warp.core's startup time

v1.1.0

08 May 15:54

[1.1.0] - 2024-05-09

  • Support returning a value from @wp.func_native CUDA functions using type hints
  • Improved differentiability of the wp.sim.FeatherstoneIntegrator
  • Fix gradient propagation for rigid body contacts in wp.sim.collide()
  • Added support for event-based timing, see wp.ScopedTimer()
  • Added Tape visualization and debugging functions, see wp.Tape.visualize()
  • Support constructing Warp arrays from objects that define the __cuda_array_interface__ attribute
  • Support copying a struct to another device, use struct.to(device) to migrate struct arrays
  • Allow rigid shapes to not have any collisions with other shapes in wp.sim.Model
  • Change default test behavior to test redundant GPUs (up to 2x)
  • Test each example in an individual subprocess
  • Polish and optimize various examples and tests
  • Allow non-contiguous point arrays to be passed to wp.HashGrid.build()
  • Upgrade LLVM to 18.1.3 for from-source builds and Linux x86-64 builds
  • Build DLL source code as C++17 and require GCC 9.4 as a minimum
  • Array clone, assign, and copy are now differentiable
  • Use Ruff for formatting and linting
  • Various documentation improvements (infinity, math constants, etc.)
  • Improve URDF importer, handle joint armature
  • Allow builtins.bool to be used in Warp data structures
  • Use external gradient arrays in backward passes when passed to wp.launch()
  • Add Conjugate Residual linear solver, see wp.optim.linear.cr()
  • Fix propagation of gradients on aliased copy of variables in kernels
  • Facilitate debugging and speed up import warp by no longer raising any exceptions during import
  • Improve support for nested vec/mat assignments in structs
  • Recommend Python 3.9 or higher, which is required for JAX and soon PyTorch.
  • Support gradient propagation for indexing sliced multi-dimensional arrays, i.e. a[i][j] vs. a[i, j]
  • Provide an informative message if setting DLL C-types failed, instructing to try rebuilding the library

[1.0.3] - 2024-04-17

  • Add a support_level entry to the configuration file of the extensions

v1.0.2

22 Mar 20:42

[1.0.2] - 2024-03-22

  • Make examples runnable from any location
  • Fix the examples not running directly from their Python file
  • Add the example gallery to the documentation
  • Update README.md examples USD location
  • Update example_graph_capture.py description

v1.0.1

15 Mar 17:32

[1.0.1] - 2024-03-15

  • Document Device total_memory and free_memory
  • Documentation for allocators, streams, peer access, and generics
  • Changed example output directory to current working directory
  • Added python -m warp.examples.browse for browsing the examples folder
  • Print where the USD stage file is being saved
  • Added examples/optim/example_walker.py sample
  • Make the drone example not specific to USD
  • Reduce the time taken to run some examples
  • Optimise rendering points with a single colour
  • Clarify an error message around needing USD
  • Raise exception when module is unloaded during graph capture
  • Added wp.synchronize_event() for blocking the host thread until a recorded event completes
  • Flush C print buffers when ending stdout capture
  • Remove more unneeded CUTLASS files
  • Allow setting mempool release threshold as a fractional value

v1.0.0

08 Mar 01:58

[1.0.0] - 2024-03-07

  • Add FeatherstoneIntegrator which provides more stable simulation of articulated rigid body dynamics in generalized coordinates (State.joint_q and State.joint_qd)
  • Introduce warp.sim.Control struct to store control inputs for simulations (optional, by default the Model control inputs are used as before); integrators now have a different simulation signature: integrator.simulate(model: Model, state_in: State, state_out: State, dt: float, control: Control)
  • joint_act can now behave in 3 modes: with joint_axis_mode set to JOINT_MODE_FORCE it behaves as a force/torque, with JOINT_MODE_VELOCITY it behaves as a velocity target, and with JOINT_MODE_POSITION it behaves as a position target; joint_target has been removed
  • Add adhesive contact to Euler integrators via Model.shape_materials.ka which controls the contact distance at which the adhesive force is applied
  • Improve handling of visual/collision shapes in URDF importer so visual shapes are not involved in contact dynamics
  • Experimental JAX kernel callback support
  • Improve module load exception message
  • Add wp.ScopedCapture
  • Remove enable_backward warning for callables
  • Copy docstrings and annotations from wrapped kernels, functions, structs

v0.15.1

06 Mar 02:49

[0.15.1] - 2024-03-05

  • Add examples assets to the wheel packages
  • Fix broken image link in documentation
  • Fix codegen for custom grad functions calling their respective forward functions
  • Fix custom grad function handling for functions that have no outputs
  • Fix issues when wp.config.quiet = True

v0.15.0

05 Mar 05:06

[0.15.0] - 2024-03-04

  • Add thumbnails to examples gallery
  • Apply colored lighting to examples
  • Moved examples directory under warp/
  • Add example usage to python -m warp.tests --help
  • Add torch.autograd.function example and docs
  • Add error-checking to array shapes during creation
  • Add example_graph_capture
  • Add a Diffsim Example of a Drone
  • Fix verify_fp causing compiler errors and support CPU kernels
  • Fix to enable matmul to be called in CUDA graph capture
  • Enable mempools by default
  • Update wp.launch to support tuple args
  • Fix BiCGSTAB and GMRES producing NaNs when converging early
  • Fix warning about backward codegen being disabled in test_fem
  • Fix assert_np_equal when NaNs and tolerance are involved
  • Improve error message to discern between CUDA being disabled or not supported
  • Support cross-module functions with user-defined gradients
  • Suppress superfluous CUDA error when ending capture after errors
  • Make output during initialization atomic
  • Add warp.config.max_unroll, fix custom gradient unrolling
  • Support native replay snippets using @wp.func_native(snippet, replay_snippet=replay_snippet)
  • Look for the CUDA Toolkit in default locations if the CUDA_PATH environment variable or --cuda_path build option are not used
  • Added wp.ones() to efficiently create one-initialized arrays
  • Rename wp.config.graph_capture_module_load_default to wp.config.enable_graph_capture_module_load_by_default

[0.14.0] - 2024-02-19

  • Add support for CUDA pooled (stream-ordered) allocators
    • Support memory allocation during graph capture
    • Support copying non-contiguous CUDA arrays during graph capture
    • Improved memory allocation/deallocation performance with pooled allocators
    • Use wp.config.enable_mempools_at_init to enable pooled allocators during Warp initialization (if supported)
    • wp.is_mempool_supported() - check if a device supports pooled allocators
    • wp.is_mempool_enabled(), wp.set_mempool_enabled() - enable or disable pooled allocators per device
    • wp.set_mempool_release_threshold(), wp.get_mempool_release_threshold() - configure memory pool release threshold
  • Add support for direct memory access between devices
    • Improved peer-to-peer memory transfer performance if access is enabled
    • Caveat: enabling peer access may impact memory allocation/deallocation performance and increase memory consumption
    • wp.is_peer_access_supported() - check if the memory of a device can be accessed by a peer device
    • wp.is_peer_access_enabled(), wp.set_peer_access_enabled() - manage peer access for memory allocated using default CUDA allocators
    • wp.is_mempool_access_supported() - check if the memory pool of a device can be accessed by a peer device
    • wp.is_mempool_access_enabled(), wp.set_mempool_access_enabled() - manage access for memory allocated using pooled CUDA allocators
  • Refined stream synchronization semantics
    • wp.ScopedStream can synchronize with the previous stream on entry and/or exit (only sync on entry by default)
    • Functions taking an optional stream argument do no implicit synchronization for max performance (e.g., wp.copy(), wp.launch(), wp.capture_launch())
  • Support for passing a custom deleter argument when constructing arrays
    • Deprecation of owner argument - use deleter to transfer ownership
  • Optimizations for various core API functions (e.g., wp.zeros(), wp.full(), and more)
  • Fix wp.matmul() to always use the correct CUDA context
  • Fix memory leak in BSR transpose
  • Fix stream synchronization issues when copying non-contiguous arrays

[0.13.1] - 2024-02-22

  • Ensure that the results from the Noise Deform are deterministic across different Kit sessions

v0.13.0

16 Feb 23:42

[0.13.0] - 2024-02-16

  • Update the license to NVIDIA Software License, allowing commercial use (see LICENSE.md)
  • Add CONTRIBUTING.md guidelines (for NVIDIA employees)
  • Hash CUDA snippet and adj_snippet strings to fix caching
  • Fix build_docs.py on Windows
  • Add missing .py extension to warp/tests/walkthrough_debug
  • Allow wp.bool usage in vector and matrix types

[0.12.0] - 2024-02-05

  • Add a warning when the enable_backward setting is set to False upon calling wp.Tape.backward()
  • Fix kernels not being recompiled as expected when defined using a closure
  • Change the kernel cache appauthor subdirectory to just "NVIDIA"
  • Ensure that gradients attached to PyTorch tensors have compatible strides when calling wp.from_torch()
  • Add a Noise Deform node for OmniGraph that deforms points using Perlin/curl noise

v0.11.0

23 Jan 21:39

[0.11.0] - 2024-01-23

  • Re-release 1.0.0-beta.7 as a non-pre-release 0.11.0 version so it gets selected by pip install warp-lang.
  • Introducing a new versioning and release process, detailed in PACKAGING.md and resembling that of Python itself:
    • The 0.11 release(s) can be found on the release-0.11 branch.
    • Point releases (if any) go on the same minor release branch and only contain bug fixes, not new features.
    • The public branch, previously used to merge releases into and corresponding with the GitHub main branch, is retired.

[1.0.0-beta.7] - 2024-01-23

  • Ensure captures are always enclosed in try/finally
  • Only include .py files from the warp subdirectory into wheel packages
  • Fix an extension's sample node failing at parsing some version numbers
  • Allow examples to run without USD when possible
  • Add a setting to disable the main Warp menu in Kit
  • Add iterative linear solvers, see wp.optim.linear.cg, wp.optim.linear.bicgstab, wp.optim.linear.gmres, and wp.optim.linear.LinearOperator
  • Improve error messages around global variables
  • Improve error messages around mat/vec assignments
  • Support conversion of scalars to native/ctypes, e.g.: float(wp.float32(1.23)) or ctypes.c_float(wp.float32(1.23))
  • Add a constant for infinity, see wp.inf
  • Add a FAQ entry about array assignments
  • Add a mass spring cage diff simulation example, see examples/example_diffsim_mass_spring_cage.py
  • Add -s, --suite option for only running tests belonging to the given suites
  • Fix common spelling mistakes
  • Fix indentation of generated code
  • Show deprecation warnings only once
  • Improve wp.render.OpenGLRenderer
  • Create the extension's symlink to the core library at runtime
  • Fix some built-ins failing to compile the backward pass when nested inside if/else blocks
  • Update examples with the new variants of the mesh query built-ins
  • Fix type members that weren't zero-initialized
  • Fix missing adjoint function for wp.mesh_query_ray()

v1.0.0-beta.6

10 Jan 21:44
Pre-release

[1.0.0-beta.6] - 2024-01-10

  • Do not create CPU copy of grad array when calling array.numpy()
  • Fix assert_np_equal() bug
  • Support Linux AArch64 platforms, including Jetson/Tegra devices
  • Add parallel testing runner (invoke with python -m warp.tests, use warp/tests/unittest_serial.py for serial testing)
  • Fix support for function calls in range()
  • matmul adjoints now accumulate
  • Expand available operators (e.g. vector @ matrix, scalar as dividend) and improve support for calling native built-ins
  • Fix multi-gpu synchronization issue in sparse.py
  • Add depth rendering to OpenGLRenderer, document warp.render
  • Make atomic_min, atomic_max differentiable
  • Fix error reporting using the exact source segment
  • Add user-friendly mesh query overloads, returning a struct instead of overwriting parameters
  • Address multiple differentiability issues
  • Fix backpropagation for returning array element references
  • Support passing the return value to adjoints
  • Add point basis space and explicit point-based quadrature for warp.fem
  • Support overriding the LLVM project source directory path using build_lib.py --build_llvm --llvm_source_path=
  • Fix the error message for accessing non-existing attributes
  • Flatten faces array for Mesh constructor in URDF parser