Skip to content

fix(data): add bounds check for out-of-range indices in Batch.get_data()#49

Open
Ryan-Reese wants to merge 1 commit intoNVIDIA:mainfrom
Ryan-Reese:fix/get-data-bounds-check
Open

fix(data): add bounds check for out-of-range indices in Batch.get_data()#49
Ryan-Reese wants to merge 1 commit intoNVIDIA:mainfrom
Ryan-Reese:fix/get-data-bounds-check

Conversation

@Ryan-Reese
Copy link
Copy Markdown
Contributor

@Ryan-Reese Ryan-Reese commented Apr 3, 2026

Description

Batch.get_data() adjusts negative indices via idx = num_graphs + idx but does not validate the result. When the adjusted index is still negative (e.g. get_data(-5) on a 3-graph batch), Python's negative indexing silently wraps around on the internal storage tensors. Node-level data (positions, forces) may come from one graph while system-level data (energies, cell) comes from a different graph, returning inconsistent data.

Reproduction:

batch = Batch.from_data_list([argon_data, hydrogen_data, oxygen_data])
d = batch.get_data(-5)  # adjusted idx = -2
# d.atomic_numbers → oxygen (graph 2 atoms)
# d.energies → hydrogen value (graph 1 system data)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

Changes Made

  • Added bounds check after negative index adjustment in Batch.get_data() to raise IndexError for any index outside [0, num_graphs)
  • Added parametrized regression test test_get_data_out_of_range_raises

Testing

  • Unit tests pass locally (make pytest)
  • Linting passes (make lint)
  • New tests added for new functionality meets coverage expectations?

Checklist

  • I have read and understand the Contributing Guidelines
  • I have updated the CHANGELOG.md
  • I have performed a self-review of my code
  • I have added docstrings to new functions/classes
  • I have updated the documentation (if applicable)

Batch.get_data() adjusts negative indices via `idx = num_graphs + idx`
but does not validate the result. When the adjusted index is still
negative (e.g. get_data(-5) on a 3-graph batch), Python's negative
indexing silently wraps around, returning node-level data from one
graph mixed with system-level data from another — silent data corruption.

Add a bounds check after the adjustment to raise IndexError for any
index outside [0, num_graphs).
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Apr 3, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@Ryan-Reese Ryan-Reese marked this pull request as ready for review April 3, 2026 00:23
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 3, 2026

Greptile Summary

This PR correctly fixes a silent data-corruption bug in Batch.get_data() where an out-of-range negative index, after adjustment, would wrap around Python's internal storage tensors and return node-level data from one graph mixed with system-level data from another. The bounds check added at lines 586–590 is logically sound and the parametrized regression test (test_get_data_out_of_range_raises) provides good coverage across both negative and positive out-of-range cases.

One minor usability concern was identified:

  • Misleading error message for negative indices — the IndexError message reports the post-adjustment idx (e.g. -3) rather than the original value the caller passed (e.g. -5). This can confuse users debugging the error. Preserving the original index before the adjustment and including it in the message is a straightforward improvement (see inline comment).

Additionally, the PR's commit is missing the required Signed-off-by trailer. Per CONTRIBUTING.md, "any contribution which contains commits that are not Signed-Off will not be accepted." The commit should be amended with git commit --amend -s to add the Signed-off-by line before this can be merged.

Important Files Changed

Filename Overview
nvalchemi/data/batch.py Adds a post-adjustment bounds check in get_data() to raise IndexError for out-of-range indices; fix is correct, but the error message exposes the adjusted index rather than the original caller-provided value.
test/data/test_batch.py Adds a parametrized regression test covering both negative and positive out-of-range indices on a 2-graph batch; coverage and parametrization look correct.

Reviews (1): Last reviewed commit: "fix(data): add bounds check for out-of-r..." | Re-trigger Greptile

Comment on lines 584 to +590
if idx < 0:
idx = self.num_graphs + idx
if not (0 <= idx < self.num_graphs):
raise IndexError(
f"graph index {idx} is out of range for batch with "
f"{self.num_graphs} graph(s)"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Error message reports adjusted index, not the original

When a negative index is adjusted before the bounds check, the error message reports the adjusted (intermediate) index rather than the value the caller actually passed. For example, get_data(-5) on a 2-graph batch adjusts to idx = -3 and then raises:

IndexError: graph index -3 is out of range for batch with 2 graph(s)

The caller passed -5, so seeing -3 in the message is surprising and hard to debug. Preserving the original value before adjustment and referencing it in the message would make the error much clearer:

Suggested change
if idx < 0:
idx = self.num_graphs + idx
if not (0 <= idx < self.num_graphs):
raise IndexError(
f"graph index {idx} is out of range for batch with "
f"{self.num_graphs} graph(s)"
)
original_idx = idx
if idx < 0:
idx = self.num_graphs + idx
if not (0 <= idx < self.num_graphs):
raise IndexError(
f"graph index {original_idx} is out of range for batch with "
f"{self.num_graphs} graph(s)"
)

@laserkelvin
Copy link
Copy Markdown
Collaborator

I posted a fix in #50: if that gets merged in before this PR then we'll see if the workflow runs as intended. Otherwise the tests pass like I said, so I would be fine with merging after you address my comment

@laserkelvin laserkelvin added needs-ci Needs to have CI run bug Something isn't working labels Apr 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working needs-ci Needs to have CI run

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants