inplace update done via aliased outputs should have more strict checks #4036

jjsjann123 · 2025-03-07T01:15:15Z

# torch version: 2.7.0a0+git1335882
# cuda version: 12.8
# nvfuser version: 0.2.25+git6b0b17e
import torch
from nvfuser import FusionDefinition, DataType

def nvfuser_fusion_id0(fd : FusionDefinition) -> None :
    T0 = fd.define_tensor(shape=[4, 2], contiguity=[False, True], dtype=DataType.Float, is_cpu=False, stride_order=[0, 1])
    S1 = fd.define_scalar(0.00000, dtype=DataType.Double)
    T2 = fd.ops.gt(T0, S1)
    S3 = fd.define_scalar(0.00000, dtype=DataType.Double)
    T4 = fd.ops.where(T2, T0, S3)
    T5 = fd.ops.cast(T4, dtype=DataType.Float)
    T6 = fd.ops.set(T5)
    T7 = fd.ops.permute(T6, [1, 0])  # T7 has logical shape (2, 4)
    fd.add_output(T7, T0)  # this alias shouldn't be allowed: T7 is (2, 4) but T0 is (4, 2)
    fd.add_output(T7)

with FusionDefinition() as fd:
    nvfuser_fusion_id0(fd)

inputs = [
    torch.randn(10, dtype=torch.float32, device='cuda:0').as_strided((4, 2), (1, 5)),
]
o = fd.execute(inputs)

Thanks to @csarofeen for the chat.
Our code base handles aliases in a restricted manner: it assumes that the alias source and target have identical logical domains (shapes). The code example above should error out, because the output shape (2, 4) does not match the input shape (4, 2).
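To make the requested check concrete, here is a minimal sketch in plain Python. It is not nvFuser code: `permuted_shape` and `check_alias_shapes` are hypothetical helper names used only for illustration, and the real check would presumably compare logical domains inside the fusion rather than Python tuples.

```python
from typing import Sequence, Tuple


def permuted_shape(shape: Sequence[int], dims: Sequence[int]) -> Tuple[int, ...]:
    """Return the logical shape after permuting `shape` by `dims`."""
    return tuple(shape[d] for d in dims)


def check_alias_shapes(output_shape: Sequence[int], input_shape: Sequence[int]) -> None:
    """Hypothetical strict check: an aliased output must match its input's shape.

    Raises ValueError when the logical shapes differ.
    """
    if tuple(output_shape) != tuple(input_shape):
        raise ValueError(
            f"aliased output shape {tuple(output_shape)} does not match "
            f"input shape {tuple(input_shape)}"
        )


# Shapes from the repro above: T0 is (4, 2) and T7 = permute(T6, [1, 0]).
t0_shape = (4, 2)
t7_shape = permuted_shape(t0_shape, [1, 0])

try:
    check_alias_shapes(t7_shape, t0_shape)
except ValueError as err:
    print(f"rejected alias: {err}")
```

With a check of this kind, `fd.add_output(T7, T0)` in the repro would be rejected up front rather than executing with mismatched shapes.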


jjsjann123 · 2025-03-07T01:18:46Z

linking comment: #4028 (review)
