fix(data): add bounds check for out-of-range indices in Batch.get_data() by Ryan-Reese · Pull Request #49 · NVIDIA/nvalchemi-toolkit

Ryan-Reese · 2026-04-03T00:18:29Z

Description

Batch.get_data() adjusts negative indices via idx = num_graphs + idx but does not validate the result. When the adjusted index is still negative (e.g. get_data(-5) on a 3-graph batch), Python's negative indexing silently wraps around on the internal storage tensors. Node-level data (positions, forces) may come from one graph while system-level data (energies, cell) comes from a different graph, returning inconsistent data.

Reproduction:

batch = Batch.from_data_list([argon_data, hydrogen_data, oxygen_data])
d = batch.get_data(-5)  # adjusted idx = -2
# d.atomic_numbers → oxygen (graph 2 atoms)
# d.energies → hydrogen value (graph 1 system data)

Type of Change

Bug fix (non-breaking change that fixes an issue)

Changes Made

Added bounds check after negative index adjustment in Batch.get_data() to raise IndexError for any index outside [0, num_graphs)
Added parametrized regression test test_get_data_out_of_range_raises

Testing

Unit tests pass locally (make pytest)
Linting passes (make lint)
New tests added for new functionality meets coverage expectations?

Checklist

I have read and understand the Contributing Guidelines
I have updated the CHANGELOG.md
I have performed a self-review of my code
I have added docstrings to new functions/classes
I have updated the documentation (if applicable)

Batch.get_data() adjusts negative indices via `idx = num_graphs + idx` but does not validate the result. When the adjusted index is still negative (e.g. get_data(-5) on a 3-graph batch), Python's negative indexing silently wraps around, returning node-level data from one graph mixed with system-level data from another — silent data corruption. Add a bounds check after the adjustment to raise IndexError for any index outside [0, num_graphs).

copy-pr-bot · 2026-04-03T00:18:32Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

greptile-apps · 2026-04-03T00:25:20Z

Greptile Summary

This PR correctly fixes a silent data-corruption bug in Batch.get_data() where an out-of-range negative index, after adjustment, would wrap around Python's internal storage tensors and return node-level data from one graph mixed with system-level data from another. The bounds check added at lines 586–590 is logically sound and the parametrized regression test (test_get_data_out_of_range_raises) provides good coverage across both negative and positive out-of-range cases.

One minor usability concern was identified:

Misleading error message for negative indices — the IndexError message reports the post-adjustment idx (e.g. -3) rather than the original value the caller passed (e.g. -5). This can confuse users debugging the error. Preserving the original index before the adjustment and including it in the message is a straightforward improvement (see inline comment).

Additionally, the PR's commit is missing the required Signed-off-by trailer. Per CONTRIBUTING.md, "any contribution which contains commits that are not Signed-Off will not be accepted." The commit should be amended with git commit --amend -s to add the Signed-off-by line before this can be merged.

Important Files Changed

Filename	Overview
nvalchemi/data/batch.py	Adds a post-adjustment bounds check in `get_data()` to raise `IndexError` for out-of-range indices; fix is correct, but the error message exposes the adjusted index rather than the original caller-provided value.
test/data/test_batch.py	Adds a parametrized regression test covering both negative and positive out-of-range indices on a 2-graph batch; coverage and parametrization look correct.

_{Reviews (1): Last reviewed commit: "fix(data): add bounds check for out-of-r..." | Re-trigger Greptile}

greptile-apps · 2026-04-03T00:25:23Z

nvalchemi/data/batch.py

        if idx < 0:
            idx = self.num_graphs + idx
+        if not (0 <= idx < self.num_graphs):
+            raise IndexError(
+                f"graph index {idx} is out of range for batch with "
+                f"{self.num_graphs} graph(s)"
+            )


Error message reports adjusted index, not the original

When a negative index is adjusted before the bounds check, the error message reports the adjusted (intermediate) index rather than the value the caller actually passed. For example, get_data(-5) on a 2-graph batch adjusts to idx = -3 and then raises:

IndexError: graph index -3 is out of range for batch with 2 graph(s)

The caller passed -5, so seeing -3 in the message is surprising and hard to debug. Preserving the original value before adjustment and referencing it in the message would make the error much clearer:

Suggested change

if idx < 0:

idx = self.num_graphs + idx

if not (0 <= idx < self.num_graphs):

raise IndexError(

f"graph index {idx} is out of range for batch with "

f"{self.num_graphs} graph(s)"

)

original_idx = idx

if idx < 0:

idx = self.num_graphs + idx

if not (0 <= idx < self.num_graphs):

raise IndexError(

f"graph index {original_idx} is out of range for batch with "

f"{self.num_graphs} graph(s)"

)

laserkelvin · 2026-04-03T01:12:56Z

I posted a fix in #50: if that gets merged in before this PR then we'll see if the workflow runs as intended. Otherwise the tests pass like I said, so I would be fine with merging after you address my comment

Ryan-Reese marked this pull request as ready for review April 3, 2026 00:23

greptile-apps bot reviewed Apr 3, 2026

View reviewed changes

laserkelvin added needs-ci Needs to have CI run bug Something isn't working labels Apr 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(data): add bounds check for out-of-range indices in Batch.get_data()#49

fix(data): add bounds check for out-of-range indices in Batch.get_data()#49
Ryan-Reese wants to merge 1 commit intoNVIDIA:mainfrom
Ryan-Reese:fix/get-data-bounds-check

Ryan-Reese commented Apr 3, 2026 •

edited

Loading

Uh oh!

copy-pr-bot bot commented Apr 3, 2026

Uh oh!

greptile-apps bot commented Apr 3, 2026

Uh oh!

greptile-apps bot Apr 3, 2026

Uh oh!

laserkelvin commented Apr 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Ryan-Reese commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of Change

Changes Made

Testing

Checklist

Uh oh!

copy-pr-bot bot commented Apr 3, 2026

Uh oh!

greptile-apps bot commented Apr 3, 2026

Greptile Summary

Important Files Changed

Uh oh!

greptile-apps bot Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

laserkelvin commented Apr 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Ryan-Reese commented Apr 3, 2026 •

edited

Loading