Skip to content

Add graph-capturable rebuildable NanoVDB volumes#1567

Draft
shi-eric wants to merge 19 commits into
NVIDIA:mainfrom
shi-eric:gdaviet/nanovdb-graph-capture
Draft

Add graph-capturable rebuildable NanoVDB volumes#1567
shi-eric wants to merge 19 commits into
NVIDIA:mainfrom
shi-eric:gdaviet/nanovdb-graph-capture

Conversation

@shi-eric

@shi-eric shi-eric commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Description

Add support for rebuildable NanoVDB volumes whose topology and values can be
recomputed in place, including inside CUDA graph capture. Previously, rebuilding
a wp.Volume from a new set of points required allocating a fresh grid each
time, which is incompatible with graph capture and wasteful when the grid
capacity is known. This PR introduces capacity/status controls, point masks for
ignoring invalid points during a rebuild, and a CPU build path so the same API
works across devices. warp.fem.Nanogrid gains topology-refresh support for
rebuildable volumes, and example_apic_fluid is updated to exercise a
graph-captured grid rebuild.

This is being opened in draft to validate that CI workflows pass.

Changes

  • Add rebuildable NanoVDB volumes with capacity and status controls, allowing a
    wp.Volume to be rebuilt in place from new points.
  • Add a point_mask array for volume builds so graph-captured rebuilds can
    ignore invalid points.
  • Add a CPU NanoVDB build path for parity with the CUDA builder.
  • Make grid rebuilds CUDA graph-capturable.
  • Add warp.fem.Nanogrid topology-refresh support for rebuildable volumes.
  • Replace the previous points-to-grid builder and share topology-kernel
    instantiation across value types.
  • Remove no-longer-used build-time NanoVDB utility headers.
  • Update example_apic_fluid to exercise captured grid rebuilds.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • CHANGELOG.md is updated for any user-facing changes under the Unreleased section.

Validation summary

Draft PR to confirm CI passes. Locally, the change is covered by expanded tests
in warp/tests/geometry/test_volume_write.py (rebuild, capacity/status, point
mask, CPU/CUDA parity), warp/tests/fem/test_fem_geometry.py and
warp/tests/fem/test_fem_multi_env.py (Nanogrid topology refresh), and
warp/tests/fem/test_fem_examples.py (graph-captured APIC fluid example).

New feature / enhancement

import warp as wp

# Allocate a rebuildable volume with reserved capacity, then rebuild it in place
# from new points inside a CUDA graph capture.
volume = wp.Volume.allocate_by_voxels(
    voxel_points=points,
    voxel_size=0.1,
    rebuildable=True,
)

with wp.ScopedCapture() as capture:
    volume.rebuild(points=new_points)
wp.capture_launch(capture.graph)

Summary by CodeRabbit

Release Notes

  • New Features

    • Rebuildable NanoVDB volumes with capacity and status controls for efficient iterative grid updates
    • New volume_voxel_count() API to query active voxel counts in volumes
    • Point mask filtering for volume allocation and rebuild operations
    • CUDA graph-capturable grid rebuild support for optimized simulation workflows
    • Grid topology refresh capability for adaptive multi-environment grids
  • Improvements

    • Enhanced APIC fluid simulation example with CUDA graph capture and grid rebuild support
    • Extended Volume and Nanogrid APIs for persistent rebuild operations

gdaviet added 19 commits June 16, 2026 13:49
Keep volume rebuilds capturable by moving rebuild counts and capacity checks onto CUDA while preserving legacy NanoVDB ordering for valued and index grids.

Add parity coverage for valued and index setup paths, plus a benchmark for legacy allocation, rebuildable setup, direct rebuild, and graph replay.

Signed-off-by: Gilles Daviet <gdaviet@nvidia.com>
Use padded BSR for captured APIC

Use padded BSR assembly for the captured APIC velocity projector so capacity-sized rebuild grids do not compact over oversized topology.

Signed-off-by: Gilles Daviet <gdaviet@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 18, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Wait, I need to make sure every range appears exactly once. Let me fix range_30d8591fed22 — I placed it in both layer_voxel_count_builtin and layer_tests_docs. Let me reassign it: it belongs only in layer_voxel_count_builtin. Also range_8b9e3d238904 appeared in both layer_python_volume and layer_voxel_count_builtin — it's about allocate_by_tiles() docstring, belongs in layer_python_volume only.

Hmm, I have range_30d8591fed22 in both layer_python_volume and layer_tests. Let me do a final clean pass ensuring every range appears exactly once.

Walkthrough

Adds rebuildable NanoVDB volumes (tiles and active voxels) with capacity/status controls, point masks, CPU and CUDA implementations, and CUDA-graph-capturable rebuilds. Extends Volume, Nanogrid, and AdaptiveNanogrid Python APIs. Migrates multi-environment APIs to flat points/point_envs inputs. Adds volume_voxel_count builtin and updates the APIC fluid example with CUDA graph capture.

Changes

Rebuildable NanoVDB Volume Feature

Layer / File(s) Summary
Native volume builder: rebuild types, CUDA pipeline, CPU implementation
build_lib.py, warp/native/volume_builder.h, warp/native/volume_builder.cu, warp/native/volume_builder_cpu.cpp
Replaces build_grid_from_points with a new allocation/rebuild pipeline. volume_builder.h adds VolumeRebuildCapacities, VolumeRebuildStatus, RebuildGridDataBase, internal CUDA-callable key/hierarchy helpers, expanded type instantiations, and new allocation/rebuild function declarations. volume_builder.cu implements GPU key emission, radix sort, RLE, NanoVDB node initialization, bbox propagation, and all allocation/rebuild entry points. volume_builder_cpu.cpp (new file, wired into the build) mirrors the full pipeline on CPU.
Native C API: warp.h, volume.h/.cpp/.cu, volume_impl.h, exports.h
warp/native/warp.h, warp/native/volume.h, warp/native/volume.cpp, warp/native/volume.cu, warp/native/volume_impl.h, warp/native/exports.h
Adds host/device build and rebuild C entrypoints in warp.h. Extends VolumeDesc with rebuild fields, adds wp_volume_get_active_stats and clamped tile/voxel coordinate output in volume.cpp. Introduces volume_voxel_count CUDA-callable function and MAX_INDEXABLE_VOXEL_COUNT in volume.h, effective_voxel_count in volume_impl.h, the C export wrapper in exports.h, and actual-count bounds checks in CUDA leaf/voxel coordinate kernels in volume.cu.
Python Volume rebuild API, ctypes bindings, and volume_voxel_count builtin
warp/_src/types.py, warp/_src/context.py, warp/_src/builtins.py, warp/__init__.pyi, docs/language_reference/builtins.rst
Extends Volume with RebuildInfo/ActiveStats NamedTuples, rebuild status constants, is_rebuildable, get_rebuild_info(), get_active_stats(), and updated allocate_by_tiles()/allocate_by_voxels()/rebuild() signatures. Adds ctypes bindings for all new C functions. Registers volume_voxel_count as a non-differentiable kernel builtin with type stub and docs entry.
Nanogrid FEM: rebuildable init, rebuild(), topology refresh, environment helpers
warp/_src/fem/geometry/nanogrid.py
Extends Nanogrid.__init__ with rebuildable mode (capacity-sized cell_ijk, rebuildable node grid). Adds Nanogrid.rebuild(), rebuild_topology_from_cells(), and _refresh_rebuildable_topology. Adds environment normalization/packing/bounds kernels for vec3i/vec3f inputs with optional masking, deprecated legacy-sequence path with log_warning, cell_env population kernels, and rebuildable node-candidate helpers.
AdaptiveNanogrid flat-input API refactor
warp/_src/fem/geometry/adaptive_nanogrid.py
Replaces per-environment cell_ijks: Sequence[wp.array] with flat points, cell_levels, point_envs, env_count, and optional point_mask. Rewrites _make_environment_adaptive_cell_grid with dtype-dependent bounds accumulation, flat env offset computation, voxel packing, and flat-mapping cell-level/cell-env fill kernels.
APIC fluid example: CUDA graph capture and multi-env simplification
warp/examples/fem/example_apic_fluid.py, warp/examples/fem/example_apic_fluid_multi_env.py
Adds CUDA graph capture to example_apic_fluid.py: capturable flag for solve_incompressibility (using cr solver), grid capacity controls in Example.__init__, wp.ScopedCapture during initialization, grid rebuild and in-place particle advection in simulate(capturable), capacity estimation helper, and CLI extensions. Removes particle_voxels kernel from the multi-env example and calls from_environment_voxels with flat particle arrays.
Tests and CHANGELOG
CHANGELOG.md, warp/tests/fem/test_fem_examples.py, warp/tests/fem/test_fem_geometry.py, warp/tests/fem/test_fem_multi_env.py, warp/tests/geometry/test_volume_write.py
Adds test_nanogrid_rebuild and test_nanogrid_rebuild_capture tests. Extends multi-env tests with packing helpers, point_mask, world-space nanogrid, and rebuildable nanogrid cases. Substantially expands test_volume_write.py with CPU/CUDA tests for additional dtypes, point mask, voxel count, rebuildable setup matching, rebuild capture, and capacity overflow status. Updates test_fem_examples.py to use mempool devices. Adds CHANGELOG entries.

Sequence Diagram(s)

sequenceDiagram
  participant example_apic_fluid
  participant Nanogrid_rebuild
  participant Volume_rebuild
  participant wp_volume_rebuild_from_tiles_device
  participant solve_incompressibility_cr
  participant wp_capture_launch

  rect rgba(100, 150, 255, 0.5)
    note over example_apic_fluid: Initialization (once)
    example_apic_fluid->>Nanogrid_rebuild: allocate rebuildable Volume + Nanogrid
    example_apic_fluid->>wp_capture_launch: wp.ScopedCapture → capture simulate(capturable=True)
  end

  rect rgba(100, 220, 150, 0.5)
    note over example_apic_fluid: Per-step (graph replay)
    example_apic_fluid->>wp_capture_launch: capture_launch(graph)
    wp_capture_launch->>Nanogrid_rebuild: rebuild(particle_q, point_mask=pic_mask)
    Nanogrid_rebuild->>Volume_rebuild: Volume.rebuild(points, point_mask, status)
    Volume_rebuild->>wp_volume_rebuild_from_tiles_device: native CUDA rebuild
    wp_volume_rebuild_from_tiles_device-->>Volume_rebuild: updated NanoVDB grid
    Volume_rebuild-->>Nanogrid_rebuild: status
    Nanogrid_rebuild->>Nanogrid_rebuild: rebuild_topology_from_cells()
    Nanogrid_rebuild-->>example_apic_fluid: updated FEM grid
    example_apic_fluid->>solve_incompressibility_cr: divergence_mat, projector, capturable=True
    solve_incompressibility_cr-->>example_apic_fluid: pressure/velocity solution
    example_apic_fluid->>example_apic_fluid: advect particles in-place
  end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~150 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding rebuildable NanoVDB volumes that are compatible with CUDA graph capture, which is the central feature of this comprehensive PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps

greptile-apps Bot commented Jun 18, 2026

Copy link
Copy Markdown

Greptile Summary

This PR introduces rebuildable NanoVDB volumes that can have their topology recomputed in place — including inside CUDA graph captures — by pre-allocating a fixed-capacity buffer at construction time. It also adds a CPU NanoVDB build path, a point_mask argument to skip invalid points during builds, and topology-refresh support in warp.fem.Nanogrid, replacing the previous PointsToGrid dependency with a custom GPU/CPU rebuild pipeline.

  • Core volume changes: wp.Volume.allocate_by_tiles and allocate_by_voxels gain rebuildable, max_* capacity, status, and point_mask arguments; rebuild(), get_active_stats(), and get_rebuild_info() are added to the Python Volume class; wp_volume_get_tile_and_voxel_count now returns the reserved capacity (not the live count) for rebuildable volumes, which intentionally makes get_voxel_count() and get_tiles()/get_voxels() return capacity-sized data for use as upper-bound buffers during graph capture.
  • FEM Nanogrid: Nanogrid.rebuild() and rebuild_topology_from_cells() allow in-place topology refresh; the flat-point environment-voxel API replaces the legacy per-environment array sequence, with a deprecation warning for the old form.
  • Native layer: ~1 450 lines of legacy nanovdb/tools/cuda/PointsToGrid.cuh and associated headers are removed; a new volume_builder_cpu.cpp and rewritten volume_builder.cu provide CPU/CUDA build and in-place rebuild, sharing the same RebuildGridDataBase template helpers in volume_builder.h.

Confidence Score: 3/5

Safe to merge for new rebuildable-volume code paths, but the behavioral change to get_voxel_count() / get_tiles() / get_voxels() on rebuildable volumes (capacity returned instead of active count, tail entries uninitialized) could silently break existing downstream code that passes a rebuildable volume to those methods without expecting over-sized output.

The volume builder rewrite is large (~6 000 net lines changed) and replaces a third-party PointsToGrid implementation with a bespoke GPU/CPU pipeline. The core logic is carefully designed and well-tested. The main concern is that get_voxel_count() and its dependents now return the pre-allocated capacity — not the live active count — for rebuildable volumes, but the existing method docstrings were not updated to signal this change. Any caller that creates a rebuildable volume and feeds the get_voxels() result to another algorithm will silently receive an over-sized array with uninitialized data in the tail.

warp/_src/types.py — get_voxel_count(), get_tile_count(), get_tiles(), and get_voxels() all need docstring updates for the rebuildable-capacity semantics. warp/native/volume_builder.cu and volume_builder_cpu.cpp are large new files that could benefit from a second native-layer review pass.

Important Files Changed

Filename Overview
warp/_src/types.py Adds RebuildInfo, ActiveStats, rebuild(), get_active_stats(), get_rebuild_info(), and is_rebuildable to the Volume class; get_voxel_count()/get_tile_count()/get_tiles()/get_voxels() now silently return/use capacity for rebuildable volumes without docstring updates.
warp/native/volume.cpp Adds CPU build/rebuild paths (wp_volume_from_tiles_host, wp_volume_rebuild_from_tiles_host, etc.), wp_volume_get_active_stats, and the rebuildable/capacities fields in VolumeDesc; wp_volume_get_tile_and_voxel_count now returns capacity instead of live counts for rebuildable volumes.
warp/native/volume_builder.h Introduces VolumeRebuildCapacities, VolumeRebuildStatus, and all the allocate_rebuildable_grid_* / rebuild_grid_* template declarations; the RebuildGridDataBase helper block shared between CUDA and CPU implementations looks well-structured.
warp/native/volume_builder.cu Replaces the NanoVDB PointsToGrid dependency with a new custom GPU rebuild pipeline (key generation, radix sort, topology pass, value/bbox finalization); the capacity-guard pattern using atomicOr on a device-side status word makes the kernel graph-capturable.
warp/native/volume_builder_cpu.cpp New file implementing the CPU counterpart of the NanoVDB grid builders; mirrors the CUDA logic using a sequential single-pass sort/build, with the same capacity-overflow status flags.
warp/_src/fem/geometry/nanogrid.py Adds rebuildable support to Nanogrid: rebuild(), rebuild_topology_from_cells(), _refresh_rebuildable_topology(), and a flat-array environment-voxel API replacing the legacy per-env sequence; double-masking in the multi-env rebuild path is correct but undocumented.
warp/_src/context.py Adds ctypes argtypes/restypes for all new host and device volume builder functions; parameter counts and types match the C++ signatures.
warp/native/warp.h Exports the new host and device volume builder/rebuilder entry points and wp_volume_get_active_stats; declarations match the implementations.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant User as Python caller
    participant Vol as wp.Volume
    participant Desc as VolumeDesc (C++)
    participant Builder as volume_builder.cu/.cpp
    participant NVDb as NanoVDB buffer (device)

    User->>Vol: "allocate_by_voxels(points, rebuildable=True, max_active_voxels=N)"
    Vol->>Builder: allocate_rebuildable_grid_from_active_voxels(N, status)
    Builder->>NVDb: allocate fixed-capacity buffer
    Builder->>NVDb: initial build (sort keys, write topology)
    Builder-->>Vol: grid ptr + size
    Vol->>Desc: volume_mark_rebuildable(id, capacities)
    Desc-->>Vol: "VolumeDesc.rebuildable=true, capacities stored"

    Note over User,NVDb: Later - may be inside CUDA graph capture

    User->>Vol: "volume.rebuild(new_points, status=status_arr)"
    Vol->>Builder: rebuild_grid_from_active_voxels_host/device(id, new_points)
    Builder->>NVDb: zero existing topology in-place
    Builder->>NVDb: scatter new keys into reserved capacity
    Builder->>NVDb: update tree/root/bbox metadata
    Builder-->>Vol: status flags written to device array
    Vol-->>User: status array

    Note over User,NVDb: Query actual (live) counts

    User->>Vol: volume.get_active_stats()
    Vol->>NVDb: D2H copy of grid+tree header
    NVDb-->>Vol: live node_count_leaf / voxel_count
    Vol-->>User: ActiveStats(voxel_count, leaf_node_count, ...)
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant User as Python caller
    participant Vol as wp.Volume
    participant Desc as VolumeDesc (C++)
    participant Builder as volume_builder.cu/.cpp
    participant NVDb as NanoVDB buffer (device)

    User->>Vol: "allocate_by_voxels(points, rebuildable=True, max_active_voxels=N)"
    Vol->>Builder: allocate_rebuildable_grid_from_active_voxels(N, status)
    Builder->>NVDb: allocate fixed-capacity buffer
    Builder->>NVDb: initial build (sort keys, write topology)
    Builder-->>Vol: grid ptr + size
    Vol->>Desc: volume_mark_rebuildable(id, capacities)
    Desc-->>Vol: "VolumeDesc.rebuildable=true, capacities stored"

    Note over User,NVDb: Later - may be inside CUDA graph capture

    User->>Vol: "volume.rebuild(new_points, status=status_arr)"
    Vol->>Builder: rebuild_grid_from_active_voxels_host/device(id, new_points)
    Builder->>NVDb: zero existing topology in-place
    Builder->>NVDb: scatter new keys into reserved capacity
    Builder->>NVDb: update tree/root/bbox metadata
    Builder-->>Vol: status flags written to device array
    Vol-->>User: status array

    Note over User,NVDb: Query actual (live) counts

    User->>Vol: volume.get_active_stats()
    Vol->>NVDb: D2H copy of grid+tree header
    NVDb-->>Vol: live node_count_leaf / voxel_count
    Vol-->>User: ActiveStats(voxel_count, leaf_node_count, ...)
Loading

Comments Outside Diff (1)

  1. warp/_src/types.py, line 5970-5982 (link)

    P1 get_voxel_count() silently returns capacity for rebuildable volumes

    wp_volume_get_tile_and_voxel_count now returns volume->capacities.leaf_count / voxel_count (the pre-allocated capacity) instead of the live grid counts when volume->rebuildable is true. Consequently, get_voxel_count(), get_tile_count(), get_tiles(), and get_voxels() all use this capacity value on rebuildable volumes — the docstrings were not updated to reflect this change.

    A caller who creates a rebuildable volume with max_tiles=1000 and calls volume.get_voxels() receives an array of 1000 * 512 rows, but only the first actual_voxel_count entries are written; the rest are left uninitialized (warp.empty). Any downstream code that iterates the full returned array (e.g., to populate a simulation grid) will process garbage coordinates for the tail. The new get_active_stats().voxel_count and get_active_stats().leaf_node_count are the correct queries for actual counts, but neither the modified methods nor their docstrings mention this requirement.

Reviews (1): Last reviewed commit: "Add deprecation entry for nanogrid env v..." | Re-trigger Greptile

Comment on lines +762 to +782
None, self._cell_grid_info.translation, self._cell_grid_info.transform_matrix
)
packed_points = _pack_environment_points(
points,
point_envs,
point_mask,
self._env_offsets,
inverse_transform,
translation_vec,
temporary_store=None,
device=self._cell_grid.device,
)
rebuild_points = packed_points

status = self._cell_grid.rebuild(rebuild_points, status=status, point_mask=point_mask)
if point_envs is not None:
_fill_cell_env_from_points(self._cell_grid, rebuild_points, point_envs, point_mask, self._cell_env)
if packed_points is not None:
packed_points.release()

self.rebuild_topology_from_cells()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 point_mask applied twice in multi-environment rebuild path

When point_envs is not None, _pack_environment_points is called with point_mask, which writes wp.vec3i(0) into packed_points at every masked position. The resulting packed_points is then passed to self._cell_grid.rebuild(..., point_mask=point_mask), so the same mask is checked a second time in the C++ rebuild — masked entries are skipped, and the placeholder (0,0,0) points are never inserted into the grid. This is correct behavior, but it makes the code harder to reason about: a reader must track that packed_points already has sentinel zeros for masked entries and that the C++ builder additionally ignores them. A comment explaining this intent would reduce the risk of a future maintainer removing one of the two masking layers thinking it is redundant.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
warp/_src/types.py (1)

6608-6611: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the allocate() device-support docstring.

allocate_by_tiles() now supports CPU and CUDA and allocate() delegates to it, so this CUDA-only sentence is stale for the new CPU build path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@warp/_src/types.py` around lines 6608 - 6611, Update the docstring for the
allocate() method to reflect its current device support. The statement "This
function is only supported for CUDA devices" is outdated since allocate() now
delegates to allocate_by_tiles() which supports both CPU and CUDA. Modify the
docstring to indicate that the function supports both CPU and CUDA devices,
removing or updating the CUDA-only restriction statement.
🧹 Nitpick comments (3)
warp/examples/fem/example_apic_fluid.py (3)

140-154: ⚡ Quick win

Document the new capturable flag in Google style.

The signature now exposes capturable, but the docstring only describes the equations. Add an Args: section so callers know when to use the graph-safe solve path.

As per coding guidelines, "Follow Google-style docstrings" and "Use double backticks for code elements and parameter cross-references."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@warp/examples/fem/example_apic_fluid.py` around lines 140 - 154, The
docstring for the `solve_incompressibility` function is missing documentation
for the new `capturable` parameter. Add a Google-style `Args:` section to the
docstring that documents all parameters, with particular emphasis on explaining
the `capturable` flag and when callers should set it to enable the graph-safe
solve path. Use double backticks around code elements and parameter names to
follow the established documentation guidelines.

Source: Coding guidelines


259-263: ⚡ Quick win

Route the new warning through Warp’s warning helper.

This new warning currently goes through warnings.warn(), which writes to stderr instead of Warp’s configured warning route.

♻️ Proposed fix
         if use_cuda_graph and self._device.is_cuda and not self._use_cuda_graph:
-            warnings.warn(
+            from warp._src.utils import warn  # noqa: PLC0415
+
+            warn(
                 "CUDA graph capture for example_apic_fluid requires a CUDA memory pool and conditional graphs.",
                 stacklevel=2,
             )

As per coding guidelines, "warp/**/*.py: Use warp._src.utils.warn() instead of warnings.warn()—it routes warnings to stdout."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@warp/examples/fem/example_apic_fluid.py` around lines 259 - 263, The
warnings.warn() call in the CUDA graph capture check is not following Warp's
coding guidelines. Replace the warnings.warn() call with warp._src.utils.warn(),
making sure to pass the warning message appropriately. Check the signature of
warp._src.utils.warn() to ensure all parameters are compatible, as it may not
accept the stacklevel parameter that warnings.warn() uses.

Source: Coding guidelines


200-221: ⚡ Quick win

Add class-level docs for the new constructor controls.

Example.__init__() now exposes graph and grid-capacity knobs, but Example has no class docstring documenting them. Put the Google-style Args: documentation on the class, not on __init__().

As per coding guidelines, "Document __init__ parameters in the class docstring, not the __init__ method."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@warp/examples/fem/example_apic_fluid.py` around lines 200 - 221, Add a
class-level docstring to the Example class that documents all the constructor
parameters exposed by the __init__ method. The docstring should use Google-style
formatting with an Args: section that documents the parameters: quiet,
stage_path, voxel_size, opengl, use_cuda_graph, grid_capacity_ratio,
grid_leaf_capacity_ratio, grid_internal_capacity_ratio, max_active_voxels,
max_leaf_nodes, max_lower_nodes, and max_upper_nodes. Place this docstring at
the beginning of the Example class definition to follow the coding guideline of
documenting __init__ parameters in the class docstring rather than on the
__init__ method itself.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@warp/_src/fem/geometry/adaptive_nanogrid.py`:
- Around line 810-825: The kernel function _fill_environment_cell_level_flat
writes to cell_level array without validating that cell_index is not -1, which
could occur if the volume lookup fails. Add a defensive guard check before the
assignment to cell_level to verify that cell_index is not equal to -1 before
performing the write operation, matching the pattern used in the parallel
_fill_cell_env_from_ijk_points_masked kernel to prevent potential out-of-bounds
writes.

In `@warp/_src/types.py`:
- Line 6709: The _supported_allocation_types tuple has been expanded to include
uint32, int64, double, and Vec3d, but the docstring for the allocate_by_tiles()
method has not been updated to reflect these new supported types. Locate the
allocate_by_tiles() method docstring and update it to document all supported
allocation types including the newly added ones (uint32, int64, double, and
Vec3d) in addition to the previously documented types (int32, float, Vec3f,
Vec4f). Ensure any other related documentation around lines 6767-6770 is also
updated consistently to reflect the complete list of supported types.
- Around line 6922-6930: The assignment of `_rebuild_bg_value = bg_value` at the
point where RebuildInfo is created (and also at the second location noted in the
comment) directly retains a reference to the caller's mutable object. Instead of
keeping this reference, extract and store the C payload/representation of
bg_value at assignment time so that mutations to the caller's original object do
not affect future rebuilds. Then update the `_rebuild_tiles()` method to
reconstruct and pass the C value from the stored payload rather than
re-normalizing the potentially mutated Python bg_value object.
- Around line 7267-7272: The validation in the _volume_rebuild_status_array
function is too permissive. The is_array() check accepts non-regular array
wrappers like indexedarray and fabric annotations, but the function immediately
accesses status.ptr which requires a regular contiguous array. Add additional
validation after the is_array() check to ensure status is a regular contiguous
Warp array by checking that it is not an indexed array, fabric annotation, or
non-contiguous layout. Raise a RuntimeError with a clear message if these
conditions are not met, preventing AttributeError or invalid pointer issues
downstream.

In `@warp/examples/fem/example_apic_fluid.py`:
- Around line 548-554: The validation check for the three ratio parameters
(active_ratio, leaf_ratio, and internal_ratio) currently only checks if they are
positive (greater than 0.0), but should reject values below 1.0 to prevent
capacity allocations smaller than the initial grid topology counts. Update the
conditional check from ratio <= 0.0 to ratio < 1.0 and modify the error message
to indicate that the ratios must be at least 1.0 or greater.

In `@warp/native/volume_builder_cpu.cpp`:
- Around line 681-691: The issue is that when capacity_status indicates overflow
(not WP_VOLUME_REBUILD_SUCCESS), the code switches layout_capacities to
exact_capacities but calculates out_grid_size using std::max of
rebuildable_grid_size for both capacities, which can allocate a grid larger than
the requested capacity and violates the fixed-capacity contract. Either return
failure/null when capacity_status is not WP_VOLUME_REBUILD_SUCCESS, or ensure
that out_grid_size is calculated using only the rebuildable_grid_size that
corresponds to the layout_capacities actually being used (the conditional value
selected for layout_capacities), not the maximum of both.

In `@warp/native/volume_builder.cu`:
- Around line 292-321: The rebuild_finalize_counts() function removes invalid
keys by decrementing counts when the last element is REBUILD_INVALID_KEY, but
when all counts become zero (all data filtered out), the status is left as
success while the CPU path treats this as invalid input. Add a check after the
four invalid key removal statements (that decrement leaf_count, lower_count,
upper_count, and voxel_count) to detect when all counts are zero and call
rebuild_set_status to set an appropriate error status, ensuring the GPU and CPU
paths handle all-masked rebuilds consistently.
- Around line 956-958: Move capacity validation to occur before device memory
allocation in the allocate_rebuildable_grid_from_tiles and
allocate_rebuildable_grid_from_active_voxels functions rather than in the
rebuild functions. Add explicit validation that capacities.voxel_count is not
zero before allowing the active voxel grid path to reach the
rebuild_set_active_voxels kernel launch at line 957, since rebuild_num_blocks
with zero voxel_count results in zero blocks being launched. Return nullptr and
set the result size to zero with WP_VOLUME_REBUILD_INVALID_INPUT error code when
any required capacity value (including voxel_count for active voxel grids) is
invalid. Apply this same validation pattern to the other affected code paths
mentioned at lines 1000-1007, 1161-1167, and 1207-1213.

---

Outside diff comments:
In `@warp/_src/types.py`:
- Around line 6608-6611: Update the docstring for the allocate() method to
reflect its current device support. The statement "This function is only
supported for CUDA devices" is outdated since allocate() now delegates to
allocate_by_tiles() which supports both CPU and CUDA. Modify the docstring to
indicate that the function supports both CPU and CUDA devices, removing or
updating the CUDA-only restriction statement.

---

Nitpick comments:
In `@warp/examples/fem/example_apic_fluid.py`:
- Around line 140-154: The docstring for the `solve_incompressibility` function
is missing documentation for the new `capturable` parameter. Add a Google-style
`Args:` section to the docstring that documents all parameters, with particular
emphasis on explaining the `capturable` flag and when callers should set it to
enable the graph-safe solve path. Use double backticks around code elements and
parameter names to follow the established documentation guidelines.
- Around line 259-263: The warnings.warn() call in the CUDA graph capture check
is not following Warp's coding guidelines. Replace the warnings.warn() call with
warp._src.utils.warn(), making sure to pass the warning message appropriately.
Check the signature of warp._src.utils.warn() to ensure all parameters are
compatible, as it may not accept the stacklevel parameter that warnings.warn()
uses.
- Around line 200-221: Add a class-level docstring to the Example class that
documents all the constructor parameters exposed by the __init__ method. The
docstring should use Google-style formatting with an Args: section that
documents the parameters: quiet, stage_path, voxel_size, opengl, use_cuda_graph,
grid_capacity_ratio, grid_leaf_capacity_ratio, grid_internal_capacity_ratio,
max_active_voxels, max_leaf_nodes, max_lower_nodes, and max_upper_nodes. Place
this docstring at the beginning of the Example class definition to follow the
coding guideline of documenting __init__ parameters in the class docstring
rather than on the __init__ method itself.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Enterprise

Run ID: 3e2eacbe-8cd7-460e-81f3-c4fc66fad1a5

📥 Commits

Reviewing files that changed from the base of the PR and between 48e2635 and 37f9ce4.

⛔ Files ignored due to path filters (15)
  • warp/native/nanovdb/GridHandle.h is excluded by !warp/native/nanovdb/**
  • warp/native/nanovdb/HostBuffer.h is excluded by !warp/native/nanovdb/**
  • warp/native/nanovdb/NodeManager.h is excluded by !warp/native/nanovdb/**
  • warp/native/nanovdb/cuda/DeviceBuffer.h is excluded by !warp/native/nanovdb/**
  • warp/native/nanovdb/cuda/DeviceResource.h is excluded by !warp/native/nanovdb/**
  • warp/native/nanovdb/cuda/GridHandle.cuh is excluded by !warp/native/nanovdb/**
  • warp/native/nanovdb/cuda/NodeManager.cuh is excluded by !warp/native/nanovdb/**
  • warp/native/nanovdb/cuda/TempPool.h is excluded by !warp/native/nanovdb/**
  • warp/native/nanovdb/tools/GridChecksum.h is excluded by !warp/native/nanovdb/**
  • warp/native/nanovdb/tools/cuda/GridChecksum.cuh is excluded by !warp/native/nanovdb/**
  • warp/native/nanovdb/tools/cuda/PointsToGrid.cuh is excluded by !warp/native/nanovdb/**
  • warp/native/nanovdb/util/ForEach.h is excluded by !warp/native/nanovdb/**
  • warp/native/nanovdb/util/Range.h is excluded by !warp/native/nanovdb/**
  • warp/native/nanovdb/util/cuda/Timer.h is excluded by !warp/native/nanovdb/**
  • warp/native/nanovdb/util/cuda/Util.h is excluded by !warp/native/nanovdb/**
📒 Files selected for processing (24)
  • CHANGELOG.md
  • build_lib.py
  • docs/language_reference/builtins.rst
  • warp/__init__.pyi
  • warp/_src/builtins.py
  • warp/_src/context.py
  • warp/_src/fem/geometry/adaptive_nanogrid.py
  • warp/_src/fem/geometry/nanogrid.py
  • warp/_src/types.py
  • warp/examples/fem/example_apic_fluid.py
  • warp/examples/fem/example_apic_fluid_multi_env.py
  • warp/native/exports.h
  • warp/native/volume.cpp
  • warp/native/volume.cu
  • warp/native/volume.h
  • warp/native/volume_builder.cu
  • warp/native/volume_builder.h
  • warp/native/volume_builder_cpu.cpp
  • warp/native/volume_impl.h
  • warp/native/warp.h
  • warp/tests/fem/test_fem_examples.py
  • warp/tests/fem/test_fem_geometry.py
  • warp/tests/fem/test_fem_multi_env.py
  • warp/tests/geometry/test_volume_write.py

Comment on lines 810 to +825
@wp.kernel
def _fill_environment_cell_level(
def _fill_environment_cell_level_flat(
cell_grid: wp.uint64,
packed_offset: int,
point_mask: wp.array(dtype=wp.int32),
packed_ijk: wp.array(dtype=wp.vec3i),
packed_level: wp.array(dtype=wp.uint8),
cell_level: wp.array(dtype=wp.uint8),
):
cell = wp.tid()
packed_index = packed_offset + cell
ijk = packed_ijk[packed_index]
point = wp.tid()
if point_mask:
if point_mask[point] == 0:
return

ijk = packed_ijk[point]
cell_index = wp.volume_lookup_index(cell_grid, ijk[0], ijk[1], ijk[2])
cell_level[cell_index] = packed_level[packed_index]
cell_level[cell_index] = packed_level[point]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing guard for invalid cell lookup.

The kernel writes to cell_level[cell_index] without checking if cell_index == -1. While this should not happen for valid unmasked points (since the volume is built from the same packed coordinates), the parallel _fill_cell_env_from_ijk_points_masked kernel in nanogrid.py defensively checks if cell_index != -1 before writing. Adding the same guard here would maintain consistency and prevent out-of-bounds writes if the invariant is ever violated.

🛡️ Proposed fix to add guard
 `@wp.kernel`
 def _fill_environment_cell_level_flat(
     cell_grid: wp.uint64,
     point_mask: wp.array(dtype=wp.int32),
     packed_ijk: wp.array(dtype=wp.vec3i),
     packed_level: wp.array(dtype=wp.uint8),
     cell_level: wp.array(dtype=wp.uint8),
 ):
     point = wp.tid()
     if point_mask:
         if point_mask[point] == 0:
             return

     ijk = packed_ijk[point]
     cell_index = wp.volume_lookup_index(cell_grid, ijk[0], ijk[1], ijk[2])
-    cell_level[cell_index] = packed_level[point]
+    if cell_index != -1:
+        cell_level[cell_index] = packed_level[point]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@wp.kernel
def _fill_environment_cell_level(
def _fill_environment_cell_level_flat(
cell_grid: wp.uint64,
packed_offset: int,
point_mask: wp.array(dtype=wp.int32),
packed_ijk: wp.array(dtype=wp.vec3i),
packed_level: wp.array(dtype=wp.uint8),
cell_level: wp.array(dtype=wp.uint8),
):
cell = wp.tid()
packed_index = packed_offset + cell
ijk = packed_ijk[packed_index]
point = wp.tid()
if point_mask:
if point_mask[point] == 0:
return
ijk = packed_ijk[point]
cell_index = wp.volume_lookup_index(cell_grid, ijk[0], ijk[1], ijk[2])
cell_level[cell_index] = packed_level[packed_index]
cell_level[cell_index] = packed_level[point]
`@wp.kernel`
def _fill_environment_cell_level_flat(
cell_grid: wp.uint64,
point_mask: wp.array(dtype=wp.int32),
packed_ijk: wp.array(dtype=wp.vec3i),
packed_level: wp.array(dtype=wp.uint8),
cell_level: wp.array(dtype=wp.uint8),
):
point = wp.tid()
if point_mask:
if point_mask[point] == 0:
return
ijk = packed_ijk[point]
cell_index = wp.volume_lookup_index(cell_grid, ijk[0], ijk[1], ijk[2])
if cell_index != -1:
cell_level[cell_index] = packed_level[point]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@warp/_src/fem/geometry/adaptive_nanogrid.py` around lines 810 - 825, The
kernel function _fill_environment_cell_level_flat writes to cell_level array
without validating that cell_index is not -1, which could occur if the volume
lookup fails. Add a defensive guard check before the assignment to cell_level to
verify that cell_index is not equal to -1 before performing the write operation,
matching the pattern used in the parallel _fill_cell_env_from_ijk_points_masked
kernel to prevent potential out-of-bounds writes.

Comment thread warp/_src/types.py
# nanovdb types for which we instantiate the grid builder
# Should be in sync with WP_VOLUME_BUILDER_INSTANTIATE_TYPES in volume_builder.h
_supported_allocation_types = ("int32", "float", "Vec3f", "Vec4f")
_supported_allocation_types = ("int32", "uint32", "int64", "float", "double", "Vec3f", "Vec3d", "Vec4f")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep the supported grid type docs in sync with the expanded allocation list.

_supported_allocation_types now includes uint32, int64, double, and Vec3d, but the public allocate_by_tiles() docstring still advertises only int, float, vec3f, and vec4f.

Also applies to: 6767-6770

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@warp/_src/types.py` at line 6709, The _supported_allocation_types tuple has
been expanded to include uint32, int64, double, and Vec3d, but the docstring for
the allocate_by_tiles() method has not been updated to reflect these new
supported types. Locate the allocate_by_tiles() method docstring and update it
to document all supported allocation types including the newly added ones
(uint32, int64, double, and Vec3d) in addition to the previously documented
types (int32, float, Vec3f, Vec4f). Ensure any other related documentation
around lines 6767-6770 is also updated consistently to reflect the complete list
of supported types.

Comment thread warp/_src/types.py
Comment on lines +6922 to +6930
if rebuildable:
volume._rebuild_info = Volume.RebuildInfo(
"tiles",
max_tiles_c * Volume._NANOVDB_LEAF_TABLE_COUNT,
max_tiles_c,
max_lower_nodes_c,
max_upper_nodes_c,
)
volume._rebuild_bg_value = bg_value

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Copy the rebuild background payload instead of retaining the caller object.

For vector/list background values, _rebuild_bg_value = bg_value keeps a mutable caller-owned object. If that object is later mutated, _rebuild_tiles() recomputes the C payload from the changed value, so rebuilds no longer preserve the original background value/type promised by the API.

Suggested direction
-            volume._rebuild_bg_value = bg_value
+            # Store an immutable/owned normalized payload for future rebuilds.
+            volume._rebuild_bg_value = _volume_rebuild_bg_value_copy(bg_value)

Then have _rebuild_tiles() reconstruct/pass the C value from that owned payload rather than re-normalizing the original Python object.

Also applies to: 7162-7164

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@warp/_src/types.py` around lines 6922 - 6930, The assignment of
`_rebuild_bg_value = bg_value` at the point where RebuildInfo is created (and
also at the second location noted in the comment) directly retains a reference
to the caller's mutable object. Instead of keeping this reference, extract and
store the C payload/representation of bg_value at assignment time so that
mutations to the caller's original object do not affect future rebuilds. Then
update the `_rebuild_tiles()` method to reconstruct and pass the C value from
the stored payload rather than re-normalizing the potentially mutated Python
bg_value object.

Comment thread warp/_src/types.py
Comment on lines +7267 to +7272
def _volume_rebuild_status_array(status: array, device) -> array:
if not is_array(status) or status.dtype != uint32 or status.size < 1:
raise RuntimeError("status must be a Warp array with dtype uint32 and at least one element")
if status.device != device:
raise RuntimeError(f"status must be on device {device}")
return status

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Require status to be a regular contiguous wp.array.

is_array() accepts non-regular array wrappers, but callers immediately use status.ptr. Tighten this helper to reject indexedarray/fabric annotations/non-contiguous inputs with the documented RuntimeError instead of leaking an AttributeError or passing an invalid native pointer.

Proposed validation tightening
 def _volume_rebuild_status_array(status: array, device) -> array:
-    if not is_array(status) or status.dtype != uint32 or status.size < 1:
-        raise RuntimeError("status must be a Warp array with dtype uint32 and at least one element")
+    if (
+        type(status) is not array
+        or status.dtype != uint32
+        or status.ndim != 1
+        or status.shape[0] < 1
+        or not status.is_contiguous
+    ):
+        raise RuntimeError("status must be a contiguous 1D Warp array with dtype uint32 and at least one element")
     if status.device != device:
         raise RuntimeError(f"status must be on device {device}")
     return status
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@warp/_src/types.py` around lines 7267 - 7272, The validation in the
_volume_rebuild_status_array function is too permissive. The is_array() check
accepts non-regular array wrappers like indexedarray and fabric annotations, but
the function immediately accesses status.ptr which requires a regular contiguous
array. Add additional validation after the is_array() check to ensure status is
a regular contiguous Warp array by checking that it is not an indexed array,
fabric annotation, or non-contiguous layout. Raise a RuntimeError with a clear
message if these conditions are not met, preventing AttributeError or invalid
pointer issues downstream.

Comment on lines +548 to +554
for name, ratio in (
("grid_capacity_ratio", active_ratio),
("grid_leaf_capacity_ratio", leaf_ratio),
("grid_internal_capacity_ratio", internal_ratio),
):
if ratio <= 0.0:
raise ValueError(f"{name} must be positive")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject capacity ratios below the initial grid size.

These ratios are used to size the rebuildable volume allocated from the initial particle grid. Values between 0.0 and 1.0 pass validation but can produce capacities smaller than the exact initial topology counts, causing immediate allocation/rebuild overflow.

🐛 Proposed fix
         for name, ratio in (
             ("grid_capacity_ratio", active_ratio),
             ("grid_leaf_capacity_ratio", leaf_ratio),
             ("grid_internal_capacity_ratio", internal_ratio),
         ):
-            if ratio <= 0.0:
-                raise ValueError(f"{name} must be positive")
+            if ratio < 1.0:
+                raise ValueError(f"{name} must be at least 1.0")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@warp/examples/fem/example_apic_fluid.py` around lines 548 - 554, The
validation check for the three ratio parameters (active_ratio, leaf_ratio, and
internal_ratio) currently only checks if they are positive (greater than 0.0),
but should reject values below 1.0 to prevent capacity allocations smaller than
the initial grid topology counts. Update the conditional check from ratio <= 0.0
to ratio < 1.0 and modify the error message to indicate that the ratios must be
at least 1.0 or greater.

Comment on lines +681 to +691
const uint32_t capacity_status = rebuild_capacity_status(scratch, capacities);
const VolumeRebuildCapacities exact_capacities = rebuild_exact_capacities(scratch);
const VolumeRebuildCapacities& layout_capacities
= capacity_status == WP_VOLUME_REBUILD_SUCCESS ? capacities : exact_capacities;

out_grid_size
= std::max(rebuildable_grid_size<BuildT>(capacities), rebuildable_grid_size<BuildT>(exact_capacities));
out_grid = static_cast<nanovdb::Grid<nanovdb::NanoTree<BuildT>>*>(
wp_alloc_host(out_grid_size, "(native:volume_builder_cpu)")
);
rebuild_populate_grid(out_grid, out_grid_size, scratch, layout_capacities, params);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't grow rebuildable CPU grids past requested capacities.

On capacity overflow, this switches to exact_capacities and allocates max(requested, exact), so CPU allocation can return a grid layout larger than the caller’s declared rebuild capacity while reporting capacity exceeded. That breaks the fixed-capacity contract and can diverge from CUDA/future in-place rebuild metadata. Return failure/null or populate only within the requested layout instead of silently changing the layout.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@warp/native/volume_builder_cpu.cpp` around lines 681 - 691, The issue is that
when capacity_status indicates overflow (not WP_VOLUME_REBUILD_SUCCESS), the
code switches layout_capacities to exact_capacities but calculates out_grid_size
using std::max of rebuildable_grid_size for both capacities, which can allocate
a grid larger than the requested capacity and violates the fixed-capacity
contract. Either return failure/null when capacity_status is not
WP_VOLUME_REBUILD_SUCCESS, or ensure that out_grid_size is calculated using only
the rebuildable_grid_size that corresponds to the layout_capacities actually
being used (the conditional value selected for layout_capacities), not the
maximum of both.

Comment on lines +292 to +321
if (leaf_count > 0 && leaf_keys && leaf_keys[leaf_count - 1] == REBUILD_INVALID_KEY)
--leaf_count;
if (lower_count > 0 && lower_keys && lower_keys[lower_count - 1] == REBUILD_INVALID_KEY)
--lower_count;
if (upper_count > 0 && upper_keys && upper_keys[upper_count - 1] == REBUILD_INVALID_KEY)
--upper_count;
if (voxel_count > 0 && voxel_keys && voxel_keys[voxel_count - 1] == REBUILD_INVALID_KEY)
--voxel_count;

if (enforce_capacities && leaf_count > capacities.leaf_count) {
rebuild_set_status(status, WP_VOLUME_REBUILD_LEAF_CAPACITY_EXCEEDED);
leaf_count = capacities.leaf_count;
}
if (enforce_capacities && lower_count > capacities.lower_count) {
rebuild_set_status(status, WP_VOLUME_REBUILD_LOWER_CAPACITY_EXCEEDED);
lower_count = capacities.lower_count;
}
if (enforce_capacities && upper_count > capacities.upper_count) {
rebuild_set_status(status, WP_VOLUME_REBUILD_UPPER_CAPACITY_EXCEEDED);
upper_count = capacities.upper_count;
}
if (enforce_capacities && uint64_t(voxel_count) > capacities.voxel_count) {
rebuild_set_status(status, WP_VOLUME_REBUILD_VOXEL_CAPACITY_EXCEEDED);
voxel_count = uint32_t(capacities.voxel_count);
}

__shared__ BBoxProxy bbox_mem;
counts[REBUILD_COUNT_LEAF] = leaf_count;
counts[REBUILD_COUNT_LOWER] = lower_count;
counts[REBUILD_COUNT_UPPER] = upper_count;
counts[REBUILD_COUNT_VOXEL] = active_voxel_grid ? voxel_count : leaf_count * PNANOVDB_LEAF_TABLE_COUNT;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Report all-masked rebuilds consistently.

When point_mask filters out every point, the sorted runs contain only REBUILD_INVALID_KEY; rebuild_finalize_counts() decrements all counts to zero but leaves status as success, and the rebuild then populates an empty/invalid-bbox grid. The CPU path treats the same no-valid-point case as invalid input, so masked rebuilds diverge by device. Set invalid status here, or explicitly define empty masked rebuilds as success on both paths.

Also applies to: 1010-1017

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@warp/native/volume_builder.cu` around lines 292 - 321, The
rebuild_finalize_counts() function removes invalid keys by decrementing counts
when the last element is REBUILD_INVALID_KEY, but when all counts become zero
(all data filtered out), the status is left as success while the CPU path treats
this as invalid input. Add a check after the four invalid key removal statements
(that decrement leaf_count, lower_count, upper_count, and voxel_count) to detect
when all counts are zero and call rebuild_set_status to set an appropriate error
status, ensuring the GPU and CPU paths handle all-masked rebuilds consistently.

Comment on lines +956 to +958
if (scratch.active_voxel_grid && data.grid_type == PNANOVDB_GRID_TYPE_ONINDEX) {
rebuild_set_active_voxels<<<rebuild_num_blocks(capacities.voxel_count), REBUILD_NUM_THREADS, 0, stream>>>(data);
rebuild_exclusive_sum_u32_to_u64(leaf_active_counts, leaf_active_prefix, int(capacities.leaf_count), stream);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's understand the file structure and locate the key functions
wc -l warp/native/volume_builder.cu

Repository: NVIDIA/warp

Length of output: 90


🏁 Script executed:

# Examine the lines mentioned in the review comment
sed -n '950,970p' warp/native/volume_builder.cu

Repository: NVIDIA/warp

Length of output: 1354


🏁 Script executed:

# Check the allocation wrappers mentioned
grep -n "allocate_rebuildable_grid_from_active_voxels\|allocate_rebuildable_grid_from_tiles" warp/native/volume_builder.cu | head -20

Repository: NVIDIA/warp

Length of output: 342


🏁 Script executed:

# Find rebuild_grid_from_points_impl function
grep -n "rebuild_grid_from_points_impl" warp/native/volume_builder.cu

Repository: NVIDIA/warp

Length of output: 271


🏁 Script executed:

# Check rebuild_num_blocks implementation
grep -n "rebuild_num_blocks" warp/native/volume_builder.cu | head -10

Repository: NVIDIA/warp

Length of output: 1169


🏁 Script executed:

# Check rebuild_grid_from_points_impl implementation for capacity validation
sed -n '987,1050p' warp/native/volume_builder.cu

Repository: NVIDIA/warp

Length of output: 2256


🏁 Script executed:

# Check allocate_rebuildable_grid_from_active_voxels
sed -n '1195,1215p' warp/native/volume_builder.cu

Repository: NVIDIA/warp

Length of output: 864


🏁 Script executed:

# Check allocate_rebuildable_grid_from_tiles
sed -n '1149,1192p' warp/native/volume_builder.cu

Repository: NVIDIA/warp

Length of output: 1591


🏁 Script executed:

# Check the mentioned line ranges: 1000-1007, 1161-1167, 1207-1213
echo "=== Lines 1000-1007 ==="
sed -n '1000,1007p' warp/native/volume_builder.cu
echo ""
echo "=== Lines 1161-1167 ==="
sed -n '1161,1167p' warp/native/volume_builder.cu
echo ""
echo "=== Lines 1207-1213 ==="
sed -n '1207,1213p' warp/native/volume_builder.cu

Repository: NVIDIA/warp

Length of output: 1295


Validate capacities before allocating or launching CUDA kernels.

Both allocation wrappers (allocate_rebuildable_grid_from_tiles and allocate_rebuildable_grid_from_active_voxels) allocate device memory before calling rebuild functions that validate capacities. For active voxel grids, rebuild_grid_from_points_impl() validates leaf_count, lower_count, and upper_count but does not check capacities.voxel_count. This allows zero voxel capacity to reach line 957, where rebuild_set_active_voxels<<<rebuild_num_blocks(capacities.voxel_count)... launches with zero blocks. Move validation to occur before memory allocation and prevent voxel_count == 0 from reaching kernel launches on the active voxel path. Return nullptr/zero size with WP_VOLUME_REBUILD_INVALID_INPUT for invalid capacities.

Also applies to: 1000-1007, 1161-1167, 1207-1213

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@warp/native/volume_builder.cu` around lines 956 - 958, Move capacity
validation to occur before device memory allocation in the
allocate_rebuildable_grid_from_tiles and
allocate_rebuildable_grid_from_active_voxels functions rather than in the
rebuild functions. Add explicit validation that capacities.voxel_count is not
zero before allowing the active voxel grid path to reach the
rebuild_set_active_voxels kernel launch at line 957, since rebuild_num_blocks
with zero voxel_count results in zero blocks being launched. Return nullptr and
set the result size to zero with WP_VOLUME_REBUILD_INVALID_INPUT error code when
any required capacity value (including voxel_count for active voxel grids) is
invalid. Apply this same validation pattern to the other affected code paths
mentioned at lines 1000-1007, 1161-1167, and 1207-1213.

@shi-eric

Copy link
Copy Markdown
Contributor Author

/ok to test 37f9ce4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants