Add graph-capturable rebuildable NanoVDB volumes#1567
Conversation
Keep volume rebuilds capturable by moving rebuild counts and capacity checks onto CUDA while preserving legacy NanoVDB ordering for valued and index grids. Add parity coverage for valued and index setup paths, plus a benchmark for legacy allocation, rebuildable setup, direct rebuild, and graph replay. Signed-off-by: Gilles Daviet <gdaviet@nvidia.com>
Use padded BSR for captured APIC Use padded BSR assembly for the captured APIC velocity projector so capacity-sized rebuild grids do not compact over oversized topology. Signed-off-by: Gilles Daviet <gdaviet@nvidia.com>
📝 WalkthroughWait, I need to make sure every range appears exactly once. Let me fix Hmm, I have WalkthroughAdds rebuildable NanoVDB volumes (tiles and active voxels) with capacity/status controls, point masks, CPU and CUDA implementations, and CUDA-graph-capturable rebuilds. Extends ChangesRebuildable NanoVDB Volume Feature
Sequence Diagram(s)sequenceDiagram
participant example_apic_fluid
participant Nanogrid_rebuild
participant Volume_rebuild
participant wp_volume_rebuild_from_tiles_device
participant solve_incompressibility_cr
participant wp_capture_launch
rect rgba(100, 150, 255, 0.5)
note over example_apic_fluid: Initialization (once)
example_apic_fluid->>Nanogrid_rebuild: allocate rebuildable Volume + Nanogrid
example_apic_fluid->>wp_capture_launch: wp.ScopedCapture → capture simulate(capturable=True)
end
rect rgba(100, 220, 150, 0.5)
note over example_apic_fluid: Per-step (graph replay)
example_apic_fluid->>wp_capture_launch: capture_launch(graph)
wp_capture_launch->>Nanogrid_rebuild: rebuild(particle_q, point_mask=pic_mask)
Nanogrid_rebuild->>Volume_rebuild: Volume.rebuild(points, point_mask, status)
Volume_rebuild->>wp_volume_rebuild_from_tiles_device: native CUDA rebuild
wp_volume_rebuild_from_tiles_device-->>Volume_rebuild: updated NanoVDB grid
Volume_rebuild-->>Nanogrid_rebuild: status
Nanogrid_rebuild->>Nanogrid_rebuild: rebuild_topology_from_cells()
Nanogrid_rebuild-->>example_apic_fluid: updated FEM grid
example_apic_fluid->>solve_incompressibility_cr: divergence_mat, projector, capturable=True
solve_incompressibility_cr-->>example_apic_fluid: pressure/velocity solution
example_apic_fluid->>example_apic_fluid: advect particles in-place
end
Estimated code review effort🎯 5 (Critical) | ⏱️ ~150 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Greptile SummaryThis PR introduces rebuildable NanoVDB volumes that can have their topology recomputed in place — including inside CUDA graph captures — by pre-allocating a fixed-capacity buffer at construction time. It also adds a CPU NanoVDB build path, a
Confidence Score: 3/5Safe to merge for new rebuildable-volume code paths, but the behavioral change to get_voxel_count() / get_tiles() / get_voxels() on rebuildable volumes (capacity returned instead of active count, tail entries uninitialized) could silently break existing downstream code that passes a rebuildable volume to those methods without expecting over-sized output. The volume builder rewrite is large (~6 000 net lines changed) and replaces a third-party PointsToGrid implementation with a bespoke GPU/CPU pipeline. The core logic is carefully designed and well-tested. The main concern is that get_voxel_count() and its dependents now return the pre-allocated capacity — not the live active count — for rebuildable volumes, but the existing method docstrings were not updated to signal this change. Any caller that creates a rebuildable volume and feeds the get_voxels() result to another algorithm will silently receive an over-sized array with uninitialized data in the tail. warp/_src/types.py — get_voxel_count(), get_tile_count(), get_tiles(), and get_voxels() all need docstring updates for the rebuildable-capacity semantics. warp/native/volume_builder.cu and volume_builder_cpu.cpp are large new files that could benefit from a second native-layer review pass. Important Files Changed
Sequence Diagram%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant User as Python caller
participant Vol as wp.Volume
participant Desc as VolumeDesc (C++)
participant Builder as volume_builder.cu/.cpp
participant NVDb as NanoVDB buffer (device)
User->>Vol: "allocate_by_voxels(points, rebuildable=True, max_active_voxels=N)"
Vol->>Builder: allocate_rebuildable_grid_from_active_voxels(N, status)
Builder->>NVDb: allocate fixed-capacity buffer
Builder->>NVDb: initial build (sort keys, write topology)
Builder-->>Vol: grid ptr + size
Vol->>Desc: volume_mark_rebuildable(id, capacities)
Desc-->>Vol: "VolumeDesc.rebuildable=true, capacities stored"
Note over User,NVDb: Later - may be inside CUDA graph capture
User->>Vol: "volume.rebuild(new_points, status=status_arr)"
Vol->>Builder: rebuild_grid_from_active_voxels_host/device(id, new_points)
Builder->>NVDb: zero existing topology in-place
Builder->>NVDb: scatter new keys into reserved capacity
Builder->>NVDb: update tree/root/bbox metadata
Builder-->>Vol: status flags written to device array
Vol-->>User: status array
Note over User,NVDb: Query actual (live) counts
User->>Vol: volume.get_active_stats()
Vol->>NVDb: D2H copy of grid+tree header
NVDb-->>Vol: live node_count_leaf / voxel_count
Vol-->>User: ActiveStats(voxel_count, leaf_node_count, ...)
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant User as Python caller
participant Vol as wp.Volume
participant Desc as VolumeDesc (C++)
participant Builder as volume_builder.cu/.cpp
participant NVDb as NanoVDB buffer (device)
User->>Vol: "allocate_by_voxels(points, rebuildable=True, max_active_voxels=N)"
Vol->>Builder: allocate_rebuildable_grid_from_active_voxels(N, status)
Builder->>NVDb: allocate fixed-capacity buffer
Builder->>NVDb: initial build (sort keys, write topology)
Builder-->>Vol: grid ptr + size
Vol->>Desc: volume_mark_rebuildable(id, capacities)
Desc-->>Vol: "VolumeDesc.rebuildable=true, capacities stored"
Note over User,NVDb: Later - may be inside CUDA graph capture
User->>Vol: "volume.rebuild(new_points, status=status_arr)"
Vol->>Builder: rebuild_grid_from_active_voxels_host/device(id, new_points)
Builder->>NVDb: zero existing topology in-place
Builder->>NVDb: scatter new keys into reserved capacity
Builder->>NVDb: update tree/root/bbox metadata
Builder-->>Vol: status flags written to device array
Vol-->>User: status array
Note over User,NVDb: Query actual (live) counts
User->>Vol: volume.get_active_stats()
Vol->>NVDb: D2H copy of grid+tree header
NVDb-->>Vol: live node_count_leaf / voxel_count
Vol-->>User: ActiveStats(voxel_count, leaf_node_count, ...)
|
| None, self._cell_grid_info.translation, self._cell_grid_info.transform_matrix | ||
| ) | ||
| packed_points = _pack_environment_points( | ||
| points, | ||
| point_envs, | ||
| point_mask, | ||
| self._env_offsets, | ||
| inverse_transform, | ||
| translation_vec, | ||
| temporary_store=None, | ||
| device=self._cell_grid.device, | ||
| ) | ||
| rebuild_points = packed_points | ||
|
|
||
| status = self._cell_grid.rebuild(rebuild_points, status=status, point_mask=point_mask) | ||
| if point_envs is not None: | ||
| _fill_cell_env_from_points(self._cell_grid, rebuild_points, point_envs, point_mask, self._cell_env) | ||
| if packed_points is not None: | ||
| packed_points.release() | ||
|
|
||
| self.rebuild_topology_from_cells() |
There was a problem hiding this comment.
point_mask applied twice in multi-environment rebuild path
When point_envs is not None, _pack_environment_points is called with point_mask, which writes wp.vec3i(0) into packed_points at every masked position. The resulting packed_points is then passed to self._cell_grid.rebuild(..., point_mask=point_mask), so the same mask is checked a second time in the C++ rebuild — masked entries are skipped, and the placeholder (0,0,0) points are never inserted into the grid. This is correct behavior, but it makes the code harder to reason about: a reader must track that packed_points already has sentinel zeros for masked entries and that the C++ builder additionally ignores them. A comment explaining this intent would reduce the risk of a future maintainer removing one of the two masking layers thinking it is redundant.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
warp/_src/types.py (1)
6608-6611:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate the
allocate()device-support docstring.
allocate_by_tiles()now supports CPU and CUDA andallocate()delegates to it, so this CUDA-only sentence is stale for the new CPU build path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@warp/_src/types.py` around lines 6608 - 6611, Update the docstring for the allocate() method to reflect its current device support. The statement "This function is only supported for CUDA devices" is outdated since allocate() now delegates to allocate_by_tiles() which supports both CPU and CUDA. Modify the docstring to indicate that the function supports both CPU and CUDA devices, removing or updating the CUDA-only restriction statement.
🧹 Nitpick comments (3)
warp/examples/fem/example_apic_fluid.py (3)
140-154: ⚡ Quick winDocument the new
capturableflag in Google style.The signature now exposes
capturable, but the docstring only describes the equations. Add anArgs:section so callers know when to use the graph-safe solve path.As per coding guidelines, "Follow Google-style docstrings" and "Use double backticks for code elements and parameter cross-references."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@warp/examples/fem/example_apic_fluid.py` around lines 140 - 154, The docstring for the `solve_incompressibility` function is missing documentation for the new `capturable` parameter. Add a Google-style `Args:` section to the docstring that documents all parameters, with particular emphasis on explaining the `capturable` flag and when callers should set it to enable the graph-safe solve path. Use double backticks around code elements and parameter names to follow the established documentation guidelines.Source: Coding guidelines
259-263: ⚡ Quick winRoute the new warning through Warp’s warning helper.
This new warning currently goes through
warnings.warn(), which writes to stderr instead of Warp’s configured warning route.♻️ Proposed fix
if use_cuda_graph and self._device.is_cuda and not self._use_cuda_graph: - warnings.warn( + from warp._src.utils import warn # noqa: PLC0415 + + warn( "CUDA graph capture for example_apic_fluid requires a CUDA memory pool and conditional graphs.", stacklevel=2, )As per coding guidelines, "
warp/**/*.py: Usewarp._src.utils.warn()instead ofwarnings.warn()—it routes warnings to stdout."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@warp/examples/fem/example_apic_fluid.py` around lines 259 - 263, The warnings.warn() call in the CUDA graph capture check is not following Warp's coding guidelines. Replace the warnings.warn() call with warp._src.utils.warn(), making sure to pass the warning message appropriately. Check the signature of warp._src.utils.warn() to ensure all parameters are compatible, as it may not accept the stacklevel parameter that warnings.warn() uses.Source: Coding guidelines
200-221: ⚡ Quick winAdd class-level docs for the new constructor controls.
Example.__init__()now exposes graph and grid-capacity knobs, butExamplehas no class docstring documenting them. Put the Google-styleArgs:documentation on the class, not on__init__().As per coding guidelines, "Document
__init__parameters in the class docstring, not the__init__method."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@warp/examples/fem/example_apic_fluid.py` around lines 200 - 221, Add a class-level docstring to the Example class that documents all the constructor parameters exposed by the __init__ method. The docstring should use Google-style formatting with an Args: section that documents the parameters: quiet, stage_path, voxel_size, opengl, use_cuda_graph, grid_capacity_ratio, grid_leaf_capacity_ratio, grid_internal_capacity_ratio, max_active_voxels, max_leaf_nodes, max_lower_nodes, and max_upper_nodes. Place this docstring at the beginning of the Example class definition to follow the coding guideline of documenting __init__ parameters in the class docstring rather than on the __init__ method itself.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@warp/_src/fem/geometry/adaptive_nanogrid.py`:
- Around line 810-825: The kernel function _fill_environment_cell_level_flat
writes to cell_level array without validating that cell_index is not -1, which
could occur if the volume lookup fails. Add a defensive guard check before the
assignment to cell_level to verify that cell_index is not equal to -1 before
performing the write operation, matching the pattern used in the parallel
_fill_cell_env_from_ijk_points_masked kernel to prevent potential out-of-bounds
writes.
In `@warp/_src/types.py`:
- Line 6709: The _supported_allocation_types tuple has been expanded to include
uint32, int64, double, and Vec3d, but the docstring for the allocate_by_tiles()
method has not been updated to reflect these new supported types. Locate the
allocate_by_tiles() method docstring and update it to document all supported
allocation types including the newly added ones (uint32, int64, double, and
Vec3d) in addition to the previously documented types (int32, float, Vec3f,
Vec4f). Ensure any other related documentation around lines 6767-6770 is also
updated consistently to reflect the complete list of supported types.
- Around line 6922-6930: The assignment of `_rebuild_bg_value = bg_value` at the
point where RebuildInfo is created (and also at the second location noted in the
comment) directly retains a reference to the caller's mutable object. Instead of
keeping this reference, extract and store the C payload/representation of
bg_value at assignment time so that mutations to the caller's original object do
not affect future rebuilds. Then update the `_rebuild_tiles()` method to
reconstruct and pass the C value from the stored payload rather than
re-normalizing the potentially mutated Python bg_value object.
- Around line 7267-7272: The validation in the _volume_rebuild_status_array
function is too permissive. The is_array() check accepts non-regular array
wrappers like indexedarray and fabric annotations, but the function immediately
accesses status.ptr which requires a regular contiguous array. Add additional
validation after the is_array() check to ensure status is a regular contiguous
Warp array by checking that it is not an indexed array, fabric annotation, or
non-contiguous layout. Raise a RuntimeError with a clear message if these
conditions are not met, preventing AttributeError or invalid pointer issues
downstream.
In `@warp/examples/fem/example_apic_fluid.py`:
- Around line 548-554: The validation check for the three ratio parameters
(active_ratio, leaf_ratio, and internal_ratio) currently only checks if they are
positive (greater than 0.0), but should reject values below 1.0 to prevent
capacity allocations smaller than the initial grid topology counts. Update the
conditional check from ratio <= 0.0 to ratio < 1.0 and modify the error message
to indicate that the ratios must be at least 1.0 or greater.
In `@warp/native/volume_builder_cpu.cpp`:
- Around line 681-691: The issue is that when capacity_status indicates overflow
(not WP_VOLUME_REBUILD_SUCCESS), the code switches layout_capacities to
exact_capacities but calculates out_grid_size using std::max of
rebuildable_grid_size for both capacities, which can allocate a grid larger than
the requested capacity and violates the fixed-capacity contract. Either return
failure/null when capacity_status is not WP_VOLUME_REBUILD_SUCCESS, or ensure
that out_grid_size is calculated using only the rebuildable_grid_size that
corresponds to the layout_capacities actually being used (the conditional value
selected for layout_capacities), not the maximum of both.
In `@warp/native/volume_builder.cu`:
- Around line 292-321: The rebuild_finalize_counts() function removes invalid
keys by decrementing counts when the last element is REBUILD_INVALID_KEY, but
when all counts become zero (all data filtered out), the status is left as
success while the CPU path treats this as invalid input. Add a check after the
four invalid key removal statements (that decrement leaf_count, lower_count,
upper_count, and voxel_count) to detect when all counts are zero and call
rebuild_set_status to set an appropriate error status, ensuring the GPU and CPU
paths handle all-masked rebuilds consistently.
- Around line 956-958: Move capacity validation to occur before device memory
allocation in the allocate_rebuildable_grid_from_tiles and
allocate_rebuildable_grid_from_active_voxels functions rather than in the
rebuild functions. Add explicit validation that capacities.voxel_count is not
zero before allowing the active voxel grid path to reach the
rebuild_set_active_voxels kernel launch at line 957, since rebuild_num_blocks
with zero voxel_count results in zero blocks being launched. Return nullptr and
set the result size to zero with WP_VOLUME_REBUILD_INVALID_INPUT error code when
any required capacity value (including voxel_count for active voxel grids) is
invalid. Apply this same validation pattern to the other affected code paths
mentioned at lines 1000-1007, 1161-1167, and 1207-1213.
---
Outside diff comments:
In `@warp/_src/types.py`:
- Around line 6608-6611: Update the docstring for the allocate() method to
reflect its current device support. The statement "This function is only
supported for CUDA devices" is outdated since allocate() now delegates to
allocate_by_tiles() which supports both CPU and CUDA. Modify the docstring to
indicate that the function supports both CPU and CUDA devices, removing or
updating the CUDA-only restriction statement.
---
Nitpick comments:
In `@warp/examples/fem/example_apic_fluid.py`:
- Around line 140-154: The docstring for the `solve_incompressibility` function
is missing documentation for the new `capturable` parameter. Add a Google-style
`Args:` section to the docstring that documents all parameters, with particular
emphasis on explaining the `capturable` flag and when callers should set it to
enable the graph-safe solve path. Use double backticks around code elements and
parameter names to follow the established documentation guidelines.
- Around line 259-263: The warnings.warn() call in the CUDA graph capture check
is not following Warp's coding guidelines. Replace the warnings.warn() call with
warp._src.utils.warn(), making sure to pass the warning message appropriately.
Check the signature of warp._src.utils.warn() to ensure all parameters are
compatible, as it may not accept the stacklevel parameter that warnings.warn()
uses.
- Around line 200-221: Add a class-level docstring to the Example class that
documents all the constructor parameters exposed by the __init__ method. The
docstring should use Google-style formatting with an Args: section that
documents the parameters: quiet, stage_path, voxel_size, opengl, use_cuda_graph,
grid_capacity_ratio, grid_leaf_capacity_ratio, grid_internal_capacity_ratio,
max_active_voxels, max_leaf_nodes, max_lower_nodes, and max_upper_nodes. Place
this docstring at the beginning of the Example class definition to follow the
coding guideline of documenting __init__ parameters in the class docstring
rather than on the __init__ method itself.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Enterprise
Run ID: 3e2eacbe-8cd7-460e-81f3-c4fc66fad1a5
⛔ Files ignored due to path filters (15)
warp/native/nanovdb/GridHandle.his excluded by!warp/native/nanovdb/**warp/native/nanovdb/HostBuffer.his excluded by!warp/native/nanovdb/**warp/native/nanovdb/NodeManager.his excluded by!warp/native/nanovdb/**warp/native/nanovdb/cuda/DeviceBuffer.his excluded by!warp/native/nanovdb/**warp/native/nanovdb/cuda/DeviceResource.his excluded by!warp/native/nanovdb/**warp/native/nanovdb/cuda/GridHandle.cuhis excluded by!warp/native/nanovdb/**warp/native/nanovdb/cuda/NodeManager.cuhis excluded by!warp/native/nanovdb/**warp/native/nanovdb/cuda/TempPool.his excluded by!warp/native/nanovdb/**warp/native/nanovdb/tools/GridChecksum.his excluded by!warp/native/nanovdb/**warp/native/nanovdb/tools/cuda/GridChecksum.cuhis excluded by!warp/native/nanovdb/**warp/native/nanovdb/tools/cuda/PointsToGrid.cuhis excluded by!warp/native/nanovdb/**warp/native/nanovdb/util/ForEach.his excluded by!warp/native/nanovdb/**warp/native/nanovdb/util/Range.his excluded by!warp/native/nanovdb/**warp/native/nanovdb/util/cuda/Timer.his excluded by!warp/native/nanovdb/**warp/native/nanovdb/util/cuda/Util.his excluded by!warp/native/nanovdb/**
📒 Files selected for processing (24)
CHANGELOG.mdbuild_lib.pydocs/language_reference/builtins.rstwarp/__init__.pyiwarp/_src/builtins.pywarp/_src/context.pywarp/_src/fem/geometry/adaptive_nanogrid.pywarp/_src/fem/geometry/nanogrid.pywarp/_src/types.pywarp/examples/fem/example_apic_fluid.pywarp/examples/fem/example_apic_fluid_multi_env.pywarp/native/exports.hwarp/native/volume.cppwarp/native/volume.cuwarp/native/volume.hwarp/native/volume_builder.cuwarp/native/volume_builder.hwarp/native/volume_builder_cpu.cppwarp/native/volume_impl.hwarp/native/warp.hwarp/tests/fem/test_fem_examples.pywarp/tests/fem/test_fem_geometry.pywarp/tests/fem/test_fem_multi_env.pywarp/tests/geometry/test_volume_write.py
| @wp.kernel | ||
| def _fill_environment_cell_level( | ||
| def _fill_environment_cell_level_flat( | ||
| cell_grid: wp.uint64, | ||
| packed_offset: int, | ||
| point_mask: wp.array(dtype=wp.int32), | ||
| packed_ijk: wp.array(dtype=wp.vec3i), | ||
| packed_level: wp.array(dtype=wp.uint8), | ||
| cell_level: wp.array(dtype=wp.uint8), | ||
| ): | ||
| cell = wp.tid() | ||
| packed_index = packed_offset + cell | ||
| ijk = packed_ijk[packed_index] | ||
| point = wp.tid() | ||
| if point_mask: | ||
| if point_mask[point] == 0: | ||
| return | ||
|
|
||
| ijk = packed_ijk[point] | ||
| cell_index = wp.volume_lookup_index(cell_grid, ijk[0], ijk[1], ijk[2]) | ||
| cell_level[cell_index] = packed_level[packed_index] | ||
| cell_level[cell_index] = packed_level[point] |
There was a problem hiding this comment.
Missing guard for invalid cell lookup.
The kernel writes to cell_level[cell_index] without checking if cell_index == -1. While this should not happen for valid unmasked points (since the volume is built from the same packed coordinates), the parallel _fill_cell_env_from_ijk_points_masked kernel in nanogrid.py defensively checks if cell_index != -1 before writing. Adding the same guard here would maintain consistency and prevent out-of-bounds writes if the invariant is ever violated.
🛡️ Proposed fix to add guard
`@wp.kernel`
def _fill_environment_cell_level_flat(
cell_grid: wp.uint64,
point_mask: wp.array(dtype=wp.int32),
packed_ijk: wp.array(dtype=wp.vec3i),
packed_level: wp.array(dtype=wp.uint8),
cell_level: wp.array(dtype=wp.uint8),
):
point = wp.tid()
if point_mask:
if point_mask[point] == 0:
return
ijk = packed_ijk[point]
cell_index = wp.volume_lookup_index(cell_grid, ijk[0], ijk[1], ijk[2])
- cell_level[cell_index] = packed_level[point]
+ if cell_index != -1:
+ cell_level[cell_index] = packed_level[point]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @wp.kernel | |
| def _fill_environment_cell_level( | |
| def _fill_environment_cell_level_flat( | |
| cell_grid: wp.uint64, | |
| packed_offset: int, | |
| point_mask: wp.array(dtype=wp.int32), | |
| packed_ijk: wp.array(dtype=wp.vec3i), | |
| packed_level: wp.array(dtype=wp.uint8), | |
| cell_level: wp.array(dtype=wp.uint8), | |
| ): | |
| cell = wp.tid() | |
| packed_index = packed_offset + cell | |
| ijk = packed_ijk[packed_index] | |
| point = wp.tid() | |
| if point_mask: | |
| if point_mask[point] == 0: | |
| return | |
| ijk = packed_ijk[point] | |
| cell_index = wp.volume_lookup_index(cell_grid, ijk[0], ijk[1], ijk[2]) | |
| cell_level[cell_index] = packed_level[packed_index] | |
| cell_level[cell_index] = packed_level[point] | |
| `@wp.kernel` | |
| def _fill_environment_cell_level_flat( | |
| cell_grid: wp.uint64, | |
| point_mask: wp.array(dtype=wp.int32), | |
| packed_ijk: wp.array(dtype=wp.vec3i), | |
| packed_level: wp.array(dtype=wp.uint8), | |
| cell_level: wp.array(dtype=wp.uint8), | |
| ): | |
| point = wp.tid() | |
| if point_mask: | |
| if point_mask[point] == 0: | |
| return | |
| ijk = packed_ijk[point] | |
| cell_index = wp.volume_lookup_index(cell_grid, ijk[0], ijk[1], ijk[2]) | |
| if cell_index != -1: | |
| cell_level[cell_index] = packed_level[point] |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@warp/_src/fem/geometry/adaptive_nanogrid.py` around lines 810 - 825, The
kernel function _fill_environment_cell_level_flat writes to cell_level array
without validating that cell_index is not -1, which could occur if the volume
lookup fails. Add a defensive guard check before the assignment to cell_level to
verify that cell_index is not equal to -1 before performing the write operation,
matching the pattern used in the parallel _fill_cell_env_from_ijk_points_masked
kernel to prevent potential out-of-bounds writes.
| # nanovdb types for which we instantiate the grid builder | ||
| # Should be in sync with WP_VOLUME_BUILDER_INSTANTIATE_TYPES in volume_builder.h | ||
| _supported_allocation_types = ("int32", "float", "Vec3f", "Vec4f") | ||
| _supported_allocation_types = ("int32", "uint32", "int64", "float", "double", "Vec3f", "Vec3d", "Vec4f") |
There was a problem hiding this comment.
Keep the supported grid type docs in sync with the expanded allocation list.
_supported_allocation_types now includes uint32, int64, double, and Vec3d, but the public allocate_by_tiles() docstring still advertises only int, float, vec3f, and vec4f.
Also applies to: 6767-6770
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@warp/_src/types.py` at line 6709, The _supported_allocation_types tuple has
been expanded to include uint32, int64, double, and Vec3d, but the docstring for
the allocate_by_tiles() method has not been updated to reflect these new
supported types. Locate the allocate_by_tiles() method docstring and update it
to document all supported allocation types including the newly added ones
(uint32, int64, double, and Vec3d) in addition to the previously documented
types (int32, float, Vec3f, Vec4f). Ensure any other related documentation
around lines 6767-6770 is also updated consistently to reflect the complete list
of supported types.
| if rebuildable: | ||
| volume._rebuild_info = Volume.RebuildInfo( | ||
| "tiles", | ||
| max_tiles_c * Volume._NANOVDB_LEAF_TABLE_COUNT, | ||
| max_tiles_c, | ||
| max_lower_nodes_c, | ||
| max_upper_nodes_c, | ||
| ) | ||
| volume._rebuild_bg_value = bg_value |
There was a problem hiding this comment.
Copy the rebuild background payload instead of retaining the caller object.
For vector/list background values, _rebuild_bg_value = bg_value keeps a mutable caller-owned object. If that object is later mutated, _rebuild_tiles() recomputes the C payload from the changed value, so rebuilds no longer preserve the original background value/type promised by the API.
Suggested direction
- volume._rebuild_bg_value = bg_value
+ # Store an immutable/owned normalized payload for future rebuilds.
+ volume._rebuild_bg_value = _volume_rebuild_bg_value_copy(bg_value)Then have _rebuild_tiles() reconstruct/pass the C value from that owned payload rather than re-normalizing the original Python object.
Also applies to: 7162-7164
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@warp/_src/types.py` around lines 6922 - 6930, The assignment of
`_rebuild_bg_value = bg_value` at the point where RebuildInfo is created (and
also at the second location noted in the comment) directly retains a reference
to the caller's mutable object. Instead of keeping this reference, extract and
store the C payload/representation of bg_value at assignment time so that
mutations to the caller's original object do not affect future rebuilds. Then
update the `_rebuild_tiles()` method to reconstruct and pass the C value from
the stored payload rather than re-normalizing the potentially mutated Python
bg_value object.
| def _volume_rebuild_status_array(status: array, device) -> array: | ||
| if not is_array(status) or status.dtype != uint32 or status.size < 1: | ||
| raise RuntimeError("status must be a Warp array with dtype uint32 and at least one element") | ||
| if status.device != device: | ||
| raise RuntimeError(f"status must be on device {device}") | ||
| return status |
There was a problem hiding this comment.
Require status to be a regular contiguous wp.array.
is_array() accepts non-regular array wrappers, but callers immediately use status.ptr. Tighten this helper to reject indexedarray/fabric annotations/non-contiguous inputs with the documented RuntimeError instead of leaking an AttributeError or passing an invalid native pointer.
Proposed validation tightening
def _volume_rebuild_status_array(status: array, device) -> array:
- if not is_array(status) or status.dtype != uint32 or status.size < 1:
- raise RuntimeError("status must be a Warp array with dtype uint32 and at least one element")
+ if (
+ type(status) is not array
+ or status.dtype != uint32
+ or status.ndim != 1
+ or status.shape[0] < 1
+ or not status.is_contiguous
+ ):
+ raise RuntimeError("status must be a contiguous 1D Warp array with dtype uint32 and at least one element")
if status.device != device:
raise RuntimeError(f"status must be on device {device}")
return status🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@warp/_src/types.py` around lines 7267 - 7272, The validation in the
_volume_rebuild_status_array function is too permissive. The is_array() check
accepts non-regular array wrappers like indexedarray and fabric annotations, but
the function immediately accesses status.ptr which requires a regular contiguous
array. Add additional validation after the is_array() check to ensure status is
a regular contiguous Warp array by checking that it is not an indexed array,
fabric annotation, or non-contiguous layout. Raise a RuntimeError with a clear
message if these conditions are not met, preventing AttributeError or invalid
pointer issues downstream.
| for name, ratio in ( | ||
| ("grid_capacity_ratio", active_ratio), | ||
| ("grid_leaf_capacity_ratio", leaf_ratio), | ||
| ("grid_internal_capacity_ratio", internal_ratio), | ||
| ): | ||
| if ratio <= 0.0: | ||
| raise ValueError(f"{name} must be positive") |
There was a problem hiding this comment.
Reject capacity ratios below the initial grid size.
These ratios are used to size the rebuildable volume allocated from the initial particle grid. Values between 0.0 and 1.0 pass validation but can produce capacities smaller than the exact initial topology counts, causing immediate allocation/rebuild overflow.
🐛 Proposed fix
for name, ratio in (
("grid_capacity_ratio", active_ratio),
("grid_leaf_capacity_ratio", leaf_ratio),
("grid_internal_capacity_ratio", internal_ratio),
):
- if ratio <= 0.0:
- raise ValueError(f"{name} must be positive")
+ if ratio < 1.0:
+ raise ValueError(f"{name} must be at least 1.0")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@warp/examples/fem/example_apic_fluid.py` around lines 548 - 554, The
validation check for the three ratio parameters (active_ratio, leaf_ratio, and
internal_ratio) currently only checks if they are positive (greater than 0.0),
but should reject values below 1.0 to prevent capacity allocations smaller than
the initial grid topology counts. Update the conditional check from ratio <= 0.0
to ratio < 1.0 and modify the error message to indicate that the ratios must be
at least 1.0 or greater.
| const uint32_t capacity_status = rebuild_capacity_status(scratch, capacities); | ||
| const VolumeRebuildCapacities exact_capacities = rebuild_exact_capacities(scratch); | ||
| const VolumeRebuildCapacities& layout_capacities | ||
| = capacity_status == WP_VOLUME_REBUILD_SUCCESS ? capacities : exact_capacities; | ||
|
|
||
| out_grid_size | ||
| = std::max(rebuildable_grid_size<BuildT>(capacities), rebuildable_grid_size<BuildT>(exact_capacities)); | ||
| out_grid = static_cast<nanovdb::Grid<nanovdb::NanoTree<BuildT>>*>( | ||
| wp_alloc_host(out_grid_size, "(native:volume_builder_cpu)") | ||
| ); | ||
| rebuild_populate_grid(out_grid, out_grid_size, scratch, layout_capacities, params); |
There was a problem hiding this comment.
Don't grow rebuildable CPU grids past requested capacities.
On capacity overflow, this switches to exact_capacities and allocates max(requested, exact), so CPU allocation can return a grid layout larger than the caller’s declared rebuild capacity while reporting capacity exceeded. That breaks the fixed-capacity contract and can diverge from CUDA/future in-place rebuild metadata. Return failure/null or populate only within the requested layout instead of silently changing the layout.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@warp/native/volume_builder_cpu.cpp` around lines 681 - 691, The issue is that
when capacity_status indicates overflow (not WP_VOLUME_REBUILD_SUCCESS), the
code switches layout_capacities to exact_capacities but calculates out_grid_size
using std::max of rebuildable_grid_size for both capacities, which can allocate
a grid larger than the requested capacity and violates the fixed-capacity
contract. Either return failure/null when capacity_status is not
WP_VOLUME_REBUILD_SUCCESS, or ensure that out_grid_size is calculated using only
the rebuildable_grid_size that corresponds to the layout_capacities actually
being used (the conditional value selected for layout_capacities), not the
maximum of both.
| if (leaf_count > 0 && leaf_keys && leaf_keys[leaf_count - 1] == REBUILD_INVALID_KEY) | ||
| --leaf_count; | ||
| if (lower_count > 0 && lower_keys && lower_keys[lower_count - 1] == REBUILD_INVALID_KEY) | ||
| --lower_count; | ||
| if (upper_count > 0 && upper_keys && upper_keys[upper_count - 1] == REBUILD_INVALID_KEY) | ||
| --upper_count; | ||
| if (voxel_count > 0 && voxel_keys && voxel_keys[voxel_count - 1] == REBUILD_INVALID_KEY) | ||
| --voxel_count; | ||
|
|
||
| if (enforce_capacities && leaf_count > capacities.leaf_count) { | ||
| rebuild_set_status(status, WP_VOLUME_REBUILD_LEAF_CAPACITY_EXCEEDED); | ||
| leaf_count = capacities.leaf_count; | ||
| } | ||
| if (enforce_capacities && lower_count > capacities.lower_count) { | ||
| rebuild_set_status(status, WP_VOLUME_REBUILD_LOWER_CAPACITY_EXCEEDED); | ||
| lower_count = capacities.lower_count; | ||
| } | ||
| if (enforce_capacities && upper_count > capacities.upper_count) { | ||
| rebuild_set_status(status, WP_VOLUME_REBUILD_UPPER_CAPACITY_EXCEEDED); | ||
| upper_count = capacities.upper_count; | ||
| } | ||
| if (enforce_capacities && uint64_t(voxel_count) > capacities.voxel_count) { | ||
| rebuild_set_status(status, WP_VOLUME_REBUILD_VOXEL_CAPACITY_EXCEEDED); | ||
| voxel_count = uint32_t(capacities.voxel_count); | ||
| } | ||
|
|
||
| __shared__ BBoxProxy bbox_mem; | ||
| counts[REBUILD_COUNT_LEAF] = leaf_count; | ||
| counts[REBUILD_COUNT_LOWER] = lower_count; | ||
| counts[REBUILD_COUNT_UPPER] = upper_count; | ||
| counts[REBUILD_COUNT_VOXEL] = active_voxel_grid ? voxel_count : leaf_count * PNANOVDB_LEAF_TABLE_COUNT; |
There was a problem hiding this comment.
Report all-masked rebuilds consistently.
When point_mask filters out every point, the sorted runs contain only REBUILD_INVALID_KEY; rebuild_finalize_counts() decrements all counts to zero but leaves status as success, and the rebuild then populates an empty/invalid-bbox grid. The CPU path treats the same no-valid-point case as invalid input, so masked rebuilds diverge by device. Set invalid status here, or explicitly define empty masked rebuilds as success on both paths.
Also applies to: 1010-1017
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@warp/native/volume_builder.cu` around lines 292 - 321, The
rebuild_finalize_counts() function removes invalid keys by decrementing counts
when the last element is REBUILD_INVALID_KEY, but when all counts become zero
(all data filtered out), the status is left as success while the CPU path treats
this as invalid input. Add a check after the four invalid key removal statements
(that decrement leaf_count, lower_count, upper_count, and voxel_count) to detect
when all counts are zero and call rebuild_set_status to set an appropriate error
status, ensuring the GPU and CPU paths handle all-masked rebuilds consistently.
| if (scratch.active_voxel_grid && data.grid_type == PNANOVDB_GRID_TYPE_ONINDEX) { | ||
| rebuild_set_active_voxels<<<rebuild_num_blocks(capacities.voxel_count), REBUILD_NUM_THREADS, 0, stream>>>(data); | ||
| rebuild_exclusive_sum_u32_to_u64(leaf_active_counts, leaf_active_prefix, int(capacities.leaf_count), stream); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's understand the file structure and locate the key functions
wc -l warp/native/volume_builder.cuRepository: NVIDIA/warp
Length of output: 90
🏁 Script executed:
# Examine the lines mentioned in the review comment
sed -n '950,970p' warp/native/volume_builder.cuRepository: NVIDIA/warp
Length of output: 1354
🏁 Script executed:
# Check the allocation wrappers mentioned
grep -n "allocate_rebuildable_grid_from_active_voxels\|allocate_rebuildable_grid_from_tiles" warp/native/volume_builder.cu | head -20Repository: NVIDIA/warp
Length of output: 342
🏁 Script executed:
# Find rebuild_grid_from_points_impl function
grep -n "rebuild_grid_from_points_impl" warp/native/volume_builder.cuRepository: NVIDIA/warp
Length of output: 271
🏁 Script executed:
# Check rebuild_num_blocks implementation
grep -n "rebuild_num_blocks" warp/native/volume_builder.cu | head -10Repository: NVIDIA/warp
Length of output: 1169
🏁 Script executed:
# Check rebuild_grid_from_points_impl implementation for capacity validation
sed -n '987,1050p' warp/native/volume_builder.cuRepository: NVIDIA/warp
Length of output: 2256
🏁 Script executed:
# Check allocate_rebuildable_grid_from_active_voxels
sed -n '1195,1215p' warp/native/volume_builder.cuRepository: NVIDIA/warp
Length of output: 864
🏁 Script executed:
# Check allocate_rebuildable_grid_from_tiles
sed -n '1149,1192p' warp/native/volume_builder.cuRepository: NVIDIA/warp
Length of output: 1591
🏁 Script executed:
# Check the mentioned line ranges: 1000-1007, 1161-1167, 1207-1213
echo "=== Lines 1000-1007 ==="
sed -n '1000,1007p' warp/native/volume_builder.cu
echo ""
echo "=== Lines 1161-1167 ==="
sed -n '1161,1167p' warp/native/volume_builder.cu
echo ""
echo "=== Lines 1207-1213 ==="
sed -n '1207,1213p' warp/native/volume_builder.cuRepository: NVIDIA/warp
Length of output: 1295
Validate capacities before allocating or launching CUDA kernels.
Both allocation wrappers (allocate_rebuildable_grid_from_tiles and allocate_rebuildable_grid_from_active_voxels) allocate device memory before calling rebuild functions that validate capacities. For active voxel grids, rebuild_grid_from_points_impl() validates leaf_count, lower_count, and upper_count but does not check capacities.voxel_count. This allows zero voxel capacity to reach line 957, where rebuild_set_active_voxels<<<rebuild_num_blocks(capacities.voxel_count)... launches with zero blocks. Move validation to occur before memory allocation and prevent voxel_count == 0 from reaching kernel launches on the active voxel path. Return nullptr/zero size with WP_VOLUME_REBUILD_INVALID_INPUT for invalid capacities.
Also applies to: 1000-1007, 1161-1167, 1207-1213
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@warp/native/volume_builder.cu` around lines 956 - 958, Move capacity
validation to occur before device memory allocation in the
allocate_rebuildable_grid_from_tiles and
allocate_rebuildable_grid_from_active_voxels functions rather than in the
rebuild functions. Add explicit validation that capacities.voxel_count is not
zero before allowing the active voxel grid path to reach the
rebuild_set_active_voxels kernel launch at line 957, since rebuild_num_blocks
with zero voxel_count results in zero blocks being launched. Return nullptr and
set the result size to zero with WP_VOLUME_REBUILD_INVALID_INPUT error code when
any required capacity value (including voxel_count for active voxel grids) is
invalid. Apply this same validation pattern to the other affected code paths
mentioned at lines 1000-1007, 1161-1167, and 1207-1213.
|
/ok to test 37f9ce4 |
Description
Add support for rebuildable NanoVDB volumes whose topology and values can be
recomputed in place, including inside CUDA graph capture. Previously, rebuilding
a
wp.Volumefrom a new set of points required allocating a fresh grid eachtime, which is incompatible with graph capture and wasteful when the grid
capacity is known. This PR introduces capacity/status controls, point masks for
ignoring invalid points during a rebuild, and a CPU build path so the same API
works across devices.
warp.fem.Nanogridgains topology-refresh support forrebuildable volumes, and
example_apic_fluidis updated to exercise agraph-captured grid rebuild.
This is being opened in draft to validate that CI workflows pass.
Changes
wp.Volumeto be rebuilt in place from new points.point_maskarray for volume builds so graph-captured rebuilds canignore invalid points.
warp.fem.Nanogridtopology-refresh support for rebuildable volumes.instantiation across value types.
example_apic_fluidto exercise captured grid rebuilds.Checklist
Unreleasedsection.Validation summary
Draft PR to confirm CI passes. Locally, the change is covered by expanded tests
in
warp/tests/geometry/test_volume_write.py(rebuild, capacity/status, pointmask, CPU/CUDA parity),
warp/tests/fem/test_fem_geometry.pyandwarp/tests/fem/test_fem_multi_env.py(Nanogrid topology refresh), andwarp/tests/fem/test_fem_examples.py(graph-captured APIC fluid example).New feature / enhancement
Summary by CodeRabbit
Release Notes
New Features
volume_voxel_count()API to query active voxel counts in volumesImprovements