Improve simplify_algebra #3633

Open
aarushjain29 opened this issue Nov 19, 2024 · 0 comments

Problem Description

Once the mul_add_kernel and the two add_kernel instances are horizontally fused with mlir_dot_add and mlir_slice_mul_reshape_transpose, we can make this case more generic.
For example, suppose there is a transpose followed by an add in place of the mul. In that case we can swap the transpose and the add for slice section 1, horizontally fuse the adds across the slices, and then transpose the whole result.

@15 = gpu::code_object[code_object=7632,symbol_name=mlir_dot_add,global=133632,local=256,](@13,@12,@5,@14) -> half_type, {24, 77, 2304}, {177408, 2304, 1}: 0.0934304ms, 2%
@16 = hip::hip_copy_literal[id=main:@literal:78] -> half_type, {768}, {1}: 0.00109522ms, 1%
@17 = hip::hip_copy_literal[id=main:@literal:59] -> half_type, {768}, {1}: 0.00108192ms, 1%
@18 = slice[axes={2},starts={768},ends={1536}](@15) -> half_type, {24, 77, 768}, {177408, 2304, 1}: 0.00165542ms, 1%
@19 = multibroadcast[out_lens={24, 77, 768},out_dyn_dims={}](@17) -> half_type, {24, 77, 768}, {0, 0, 1}: 0.00094074ms, 1%
@20 = load[offset=18184320,end=21022848](@1) -> half_type, {24, 77, 768}, {59136, 768, 1}: 0.00076536ms, 1%
**@21 = gpu::code_object[code_object=5128,symbol_name=add_kernel,global=354816,local=1024,](@19,@18,@20) -> half_type, {24, 77, 768}, {59136, 768, 1}: 0.0211362ms, 1%**
@22 = load[offset=11354112,end=14192640](@1) -> half_type, {24, 77, 768}, {59136, 768, 1}: 0.00099472ms, 1%
@23 = multibroadcast[out_lens={24, 77, 768},out_dyn_dims={}](@16) -> half_type, {24, 77, 768}, {0, 0, 1}: 0.00182424ms, 1%
@24 = slice[axes={2},starts={0},ends={768}](@15) -> half_type, {24, 77, 768}, {177408, 2304, 1}: 0.00103286ms, 1%
**@25 = gpu::code_object[code_object=5136,symbol_name=mul_add_kernel,global=354816,local=1024,](@24,@23,@22) -> half_type, {24, 77, 768}, {59136, 768, 1}: 0.0413997ms, 1%**
@26 = load[offset=14769216,end=18184320](@1) -> half_type, {24, 12, 77, 77}, {71148, 5929, 77, 1}: 0.00105ms, 1%
@27 = gpu::code_object[code_object=6736,symbol_name=mlir_reshape_transpose_reshape_transpose_dot,global=73728,local=256,](@25,@21,@26) -> half_type, {24, 12, 77, 77}, {71148, 5929, 77, 1}: 0.0248955ms, 1%
...
@32 = load[offset=14769216,end=17607744](@1) -> half_type, {24, 77, 768}, {59136, 768, 1}
@33 = multibroadcast[out_lens={24, 77, 768},out_dyn_dims={}](@31) -> half_type, {24, 77, 768}, {0, 0, 1}
@34 = slice[axes={2},starts={1536},ends={2304}](@14) -> half_type, {24, 77, 768}, {177408, 2304, 1}
**@35 = gpu::code_object[code_object=5128,symbol_name=add_kernel,global=354816,local=1024,](@33,@34,@32) -> half_type, {24, 77, 768}, {59136, 768, 1}**
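
The reordering described above (swap the transpose and the add, fuse the adds across the slices, then transpose once) can be checked numerically. The following is a minimal NumPy sketch, not MIGraphX code: the tiny shapes, the permutation `perm`, and the function names `unfused` and `fused` are all hypothetical, and it only models a plain broadcast add on each slice of one shared tensor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small stand-ins for the {24, 77, 2304} dot output and the {768} bias literals.
batch, seq, hidden = 2, 3, 4
x = rng.standard_normal((batch, seq, 3 * hidden))
biases = [rng.standard_normal(hidden) for _ in range(3)]

perm = (0, 2, 1)  # hypothetical transpose applied to each slice


def unfused(x, biases):
    """Slice, transpose, then add: one separate add per slice."""
    outs = []
    for i, b in enumerate(biases):
        s = x[:, :, i * hidden:(i + 1) * hidden]
        t = s.transpose(perm)  # shape (batch, hidden, seq)
        # The bias now broadcasts along the middle (transposed) axis.
        outs.append(t + b[None, :, None])
    return outs


def fused(x, biases):
    """One add over the unsliced tensor, one transpose, then slice."""
    fused_bias = np.concatenate(biases)   # shape (3 * hidden,)
    y = (x + fused_bias).transpose(perm)  # single add, single transpose
    # The slice axis moves from 2 to 1 because of the transpose.
    return [y[:, i * hidden:(i + 1) * hidden, :] for i in range(3)]


for a, b in zip(unfused(x, biases), fused(x, biases)):
    assert np.allclose(a, b)
```

The point of the sketch is that an elementwise add commutes with a transpose as long as the broadcast axis is permuted along with the data, so the three per-slice adds can be replaced by one add on the unsliced tensor.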