fix(data): preserve other batch edge_index in Batch.append() #46

Open

Ryan-Reese wants to merge 4 commits into NVIDIA:main from Ryan-Reese:fix/batch-append-mutates-input

Conversation

Ryan-Reese (Contributor) commented Apr 2, 2026

ALCHEMI Toolkit Pull Request

Description

Batch.append() temporarily offsets the other batch's edge_index by the receiver's atom count for correct concatenation, but never restores the original values. This silently corrupts the input batch — any subsequent use of the appended batch produces wrong edge indices.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

Related Issues

Relates to #21 (Data layer gaps and limitations)

Changes Made

  • Save other's original edge_index reference before applying the node offset
  • Wrap concatenation in try/finally to restore other's edge_index even if concatenation raises
  • Reject shared-storage aliasing (batch.append(batch) or shared MultiLevelStorage) which cannot work correctly with in-place concatenation
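
The guard plus save/restore pattern above can be sketched with a toy stand-in (plain Python lists instead of the real nvalchemi storage classes; `ToyBatch` and its attributes are illustrative, not the actual implementation):

```python
# Illustrative toy model of the fix, NOT the real nvalchemi.data.Batch:
# edges are plain [src, dst] pairs and "storage" is the edge list itself.
class ToyBatch:
    def __init__(self, num_atoms, edge_index):
        self.num_atoms = num_atoms
        self.edge_index = edge_index  # list of [src, dst] pairs

    def append(self, other):
        # Guard: in-place concatenation cannot work when both sides
        # share the same underlying storage.
        if other is self or other.edge_index is self.edge_index:
            raise ValueError("shares storage; append a clone instead")
        saved_ei = other.edge_index  # save the original reference
        try:
            # Temporarily offset other's edges by the receiver's atom count.
            offset = self.num_atoms
            other.edge_index = [[s + offset, d + offset]
                                for s, d in saved_ei]
            self.edge_index = self.edge_index + other.edge_index
            self.num_atoms += other.num_atoms
        finally:
            # Restore even if concatenation raised, so the input batch
            # is never left corrupted.
            other.edge_index = saved_ei
```

With `b1 = ToyBatch(3, [[0, 1], [1, 2]])` and `b2 = ToyBatch(4, [[0, 1], [1, 2], [2, 3]])`, `b1.append(b2)` grows `b1` to seven atoms while `b2.edge_index` comes back unchanged.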

Reproduction

from nvalchemi.data import AtomicData, Batch
import torch

d1 = AtomicData(
    atomic_numbers=torch.ones(3, dtype=torch.long),
    positions=torch.randn(3, 3),
    edge_index=torch.tensor([[0, 1], [1, 2]]),
)
d2 = AtomicData(
    atomic_numbers=torch.ones(4, dtype=torch.long),
    positions=torch.randn(4, 3),
    edge_index=torch.tensor([[0, 1], [1, 2], [2, 3]]),
)

b1 = Batch.from_data_list([d1])
b2 = Batch.from_data_list([d2])

b2_ei_before = b2.edge_index.clone()
b1.append(b2)

# BEFORE fix: b2.edge_index is [[3,4],[4,5],[5,6]] — corrupted
# AFTER fix:  b2.edge_index is [[0,1],[1,2],[2,3]] — preserved
assert torch.equal(b2.edge_index, b2_ei_before)

Testing

  • Unit tests pass locally (make pytest)
  • Linting passes (make lint)
  • Verified bug reproduction before/after fix on GPU node
  • All 68 existing test/data/test_batch.py tests pass with no regressions

Checklist

  • I have performed a self-review of my code
  • I have updated the documentation (if applicable)

Additional Notes

Known limitation (pre-existing): append() is not transactional with respect to self — if group.concatenate() raises mid-loop, self may be left partially updated. This is a pre-existing architectural issue unrelated to this fix and would require cloning all storage groups before mutation. The try/finally added here only guarantees other is restored.
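
The group-cloning approach mentioned above could be sketched roughly like this (a hypothetical helper, not part of this PR; the dict-of-lists shape and names are illustrative):

```python
import copy

def transactional_mutate(groups, mutate):
    """Hypothetical sketch: snapshot every mutable group before a
    multi-group mutation and roll all of them back if it raises.
    `groups` is an illustrative dict of name -> list of rows."""
    snapshot = {name: copy.deepcopy(data) for name, data in groups.items()}
    try:
        mutate(groups)
    except Exception:
        # Roll back in place so existing references stay valid.
        for name, data in snapshot.items():
            groups[name][:] = data
        raise
```

Deep-copying every group before each append is exactly the cost alluded to above, which is why this PR leaves the transactional behaviour of self out of scope.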

Batch.append() temporarily offsets the other batch's edge_index by the
receiver's atom count for correct concatenation, but never restores the
original values. This silently corrupts the input batch — any subsequent
use of the appended batch (e.g. in inflight batching or multi-stage
pipelines) produces wrong edge indices.

Save and restore the original edge_index reference around the
concatenation in a try/finally block to keep the input batch intact even
if concatenation raises. Also reject shared-storage aliasing
(batch.append(batch) or shared MultiLevelStorage) which cannot work
correctly with in-place concatenation.
copy-pr-bot bot commented Apr 2, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Ryan-Reese marked this pull request as ready for review April 2, 2026 17:32

greptile-apps bot commented Apr 2, 2026

Greptile Summary

This PR fixes a silent mutation bug in Batch.append() where other's edge_index was permanently offset by the receiver's node count after concatenation. The fix saves the original tensor reference before the offset, wraps the concatenation in try/finally to guarantee restoration, and adds an early guard to reject self-aliasing. The three new tests cover all the new code paths and the previous reviewer comments have been fully addressed.

Important Files Changed

  • nvalchemi/data/batch.py: Adds the self-aliasing guard and the try/finally that restores other's edge_index after append; logic is correct and the finally block safely uses the cached local other_edges.
  • test/data/test_batch.py: Adds three new tests covering edge_index preservation, the self-append guard, and the shared-storage guard; all tests correctly exercise the new behaviour.

Reviews (4). Last reviewed commit: "ci: retrigger checks"

Comment on lines +989 to +992
finally:
    # Restore other's edge_index to avoid mutating the input batch.
    if saved_ei is not None:
        other._edges_group._data["edge_index"] = saved_ei

P2 other._edges_group re-evaluated in finally instead of using the cached local

The finally block calls other._edges_group (a property that does self._storage.groups.get("edges")) instead of reusing the already-resolved other_edges local variable. Under normal operation both refer to the same object, but using the local is slightly cheaper and makes the symmetry with the setup code explicit:

Suggested change:

    finally:
        # Restore other's edge_index to avoid mutating the input batch.
        if saved_ei is not None:
-           other._edges_group._data["edge_index"] = saved_ei
+           other_edges._data["edge_index"] = saved_ei

Comment on lines +964 to +969
if other is self or other._storage is self._storage:
    raise ValueError(
        "Cannot append a Batch that shares storage with the "
        "receiver (would corrupt both). Use "
        "batch.append(batch.clone()) instead."
    )

P2 Missing test coverage for the new guard and the core fix

Neither the ValueError for self-aliasing nor the edge_index preservation behavior described in the PR have a corresponding test. The existing test_append only checks graph/node counts on b1 and would not catch a regression. Consider adding two assertions:

# test self-aliasing guard
with pytest.raises(ValueError, match="shares storage"):
    b1.append(b1)

# test other's edge_index is not mutated
b2_ei_before = b2.edge_index.clone()
b1.append(b2)
assert torch.equal(b2.edge_index, b2_ei_before)

Collaborator

@Ryan-Reese please consider this suggestion.

Contributor Author

included

Add three tests to TestBatchMutation covering the new append() safety
behaviour:
- test_append_preserves_other_edge_index: verifies input batch's
  edge_index is not mutated after append
- test_append_self_raises: verifies self-append raises ValueError
- test_append_shared_storage_raises: verifies shared-storage append
  raises ValueError

Use cached other_edges variable in finally block per review feedback.
@Ryan-Reese

/ok to test efa2639

@laserkelvin

Coverage is failing - I'll check out the branch locally and confirm it's okay

@Ryan-Reese

@laserkelvin thanks :)

@laserkelvin left a comment

I've confirmed locally that coverage is fine and tests pass

            group.concatenate(other_group)
        else:
            group.extend_for_appended_graphs(n_other)
    try:

My concerns with this bit are twofold:

  1. There's no exception being handled explicitly
  2. There's overhead associated with try/except/finally; it's not huge but it can be non-negligible

If possible, I would rewrite it without the try block, unless there is something you're guarding against
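
For comparison, a no-try variant along these lines might look like the following sketch (plain-list stand-ins, not the actual nvalchemi code; note that if the concatenation step raised, `other_edges` would be left offset, which is exactly what the try/finally guards against):

```python
def append_no_try(self_edges, other_edges, offset):
    """Sketch of append without try/finally: offset in place,
    concatenate, then restore only on the happy path."""
    saved = list(other_edges)  # keep the original rows
    other_edges[:] = [[s + offset, d + offset] for s, d in other_edges]
    self_edges.extend(other_edges)  # concatenation step
    other_edges[:] = saved          # restore; skipped if extend raised
    return self_edges
```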
