Incorrect optimization pattern in BroadcastElimination #440

Open
navahgar opened this issue Dec 11, 2024 · 1 comment
Labels
mlir-tensorrt Pull request for the mlir-tensorrt project

Comments

@navahgar

The BroadcastElimination pass has a pattern to push down Broadcast ops (code). That in turn exchanges Collapse and Broadcast ops, which seems to be incorrect.

Specifically, when the Collapse op collapses more than one dim, the code does not seem to handle it correctly: only the first collapsed dim is considered, as shown here, yet the full result of the Collapse op is used as the result type for the new Broadcast op here.

For example:

module {
  func.func @foobar() -> tensor<96x512x10x10xf32> {
    %cst_f32 = tensorrt.constant dense<1.000000e+00> : tensor<1x96x1x512x1x1xf32>
    %0 = tensorrt.broadcast %cst_f32 broadcast_dims<0, 1, 2, 3, 4, 5> : tensor<1x96x1x512x1x1xf32> to tensor<1x96x1x512x10x10xf32>
    %1 = tensorrt.collapse_rank %0 : tensor<1x96x1x512x10x10xf32> to tensor<96x512x10x10xf32>
    return %1 : tensor<96x512x10x10xf32>
  }
}

Running --tensorrt-broadcast-elimination on the above code results in the following error:

error: 'tensorrt.broadcast' op expected input shape dimension 2 (512) to be broadcastable to result type dimension 2 (10)
    %379 = tensorrt.collapse_rank %378 : tensor<1x96x1x512x10x10xf32> to tensor<96x512x10x10xf32>
           ^
/home/raghavan.raman/scratch/gen2/traffic_lights/test.01.bcast_bug_repro:5:12: note: see current operation: %1 = "tensorrt.broadcast"(%0) <{broadcast_dims = array<i64: 0, 1, 2, 3>}> : (tensor<96x1x512x1xf32>) -> tensor<96x512x10x10xf32>
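
For reference, a correct exchange on this example would presumably have to collapse the broadcast operand over both of the collapsed unit dims (0 and 2), not just the first one, before rebuilding the broadcast. Roughly (hand-written sketch, not actual pass output):

module {
  func.func @foobar() -> tensor<96x512x10x10xf32> {
    %cst_f32 = tensorrt.constant dense<1.000000e+00> : tensor<1x96x1x512x1x1xf32>
    %0 = tensorrt.collapse_rank %cst_f32 : tensor<1x96x1x512x1x1xf32> to tensor<96x512x1x1xf32>
    %1 = tensorrt.broadcast %0 broadcast_dims<0, 1, 2, 3> : tensor<96x512x1x1xf32> to tensor<96x512x10x10xf32>
    return %1 : tensor<96x512x10x10xf32>
  }
}

Here the collapsed operand tensor<96x512x1x1xf32> is broadcast along dims 2 and 3, which corresponds to the original broadcast over unit dims 4 and 5.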
@navahgar
Author

It is not clear if exchangeCollapseRankAndBroadcast is meant to handle all kinds of Collapse ops. If it is only meant to handle the case where a single dim is collapsed, we could return early when more than one dim is collapsed (see the sketch below), which fixes the problem I pointed out above.
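
Concretely, a minimal sketch of such an early return, assuming the pattern can just compare the ranks of the collapse op's input and result types (hypothetical helper, not the actual mlir-tensorrt code):

// Bail out of the Collapse/Broadcast exchange when collapse_rank removes
// more than one dimension; the pattern only handles the single-dim case.
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/Support/LogicalResult.h"

static mlir::LogicalResult
matchSingleDimCollapse(mlir::RankedTensorType collapseInput,
                       mlir::RankedTensorType collapseResult) {
  if (collapseInput.getRank() - collapseResult.getRank() > 1)
    return mlir::failure(); // leave the IR unchanged
  return mlir::success();
}

Returning failure here leaves the IR unchanged for multi-dim collapses instead of producing an invalid broadcast.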

@pranavm-nvidia pranavm-nvidia added the mlir-tensorrt Pull request for the mlir-tensorrt project label Dec 18, 2024