Implement LPPool #3595

Closed
wants to merge 44 commits into from
hieule88 (Collaborator) commented Mar 11, 2025

  • Added LPPool 1D and 2D forward and backward.

  • Added driver test and gtest for LPPool.

  • New API is guarded by MIOPEN_BETA_API macro.

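  • Average MIOpen_over_rocm ratio (rocm_kernel_avg divided by the MIOpen time, so values above 1 favor MIOpen) over all cases: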

LPPool1D

| Type | Forward | Backward |
|---|---|---|
| float16 | 2.35 | 4.12 |
| float32 | 2.37 | 3.51 |
| bfloat16 | 2.40 | 4.17 |

LPPool1D FP16

| op_name | dtype | input_size | p | kernel_size | stride | contiguous | direction | rocm_kernel_avg | MIOpen | MIOpen_over_rocm |
|---|---|---|---|---|---|---|---|---|---|---|
| LPPool1d | float16 | [16 672 32] | 2 | 2 | 1 | contiguous | fwd | 58000 | 28675 | 2.022667829 |
| LPPool1d | float16 | [16 672 32] | 2 | 2 | 1 | contiguous | bwd | 101184 | 36888 | 2.743005856 |
| LPPool1d | float16 | [16 672 32] | 2 | 2 | 1 | noncontiguous | fwd | 69536 | 29955 | 2.32134869 |
| LPPool1d | float16 | [16 672 32] | 2 | 2 | 1 | noncontiguous | bwd | 139455 | 38150 | 3.655439056 |
| LPPool1d | float16 | [16 960 32] | 2 | 2 | 1 | contiguous | fwd | 62048 | 37812 | 1.640960542 |
| LPPool1d | float16 | [16 960 32] | 2 | 2 | 1 | contiguous | bwd | 108479 | 51519 | 2.105611522 |
| LPPool1d | float16 | [16 960 32] | 2 | 2 | 1 | noncontiguous | fwd | 75680 | 39217 | 1.929775353 |
| LPPool1d | float16 | [16 960 32] | 2 | 2 | 1 | noncontiguous | bwd | 155887 | 52621 | 2.962448452 |
| LPPool1d | float16 | [3 2048 64] | 2 | 2 | 1 | contiguous | fwd | 58944 | 32177 | 1.831867483 |
| LPPool1d | float16 | [3 2048 64] | 2 | 2 | 1 | contiguous | bwd | 102911 | 41528 | 2.478111154 |
| LPPool1d | float16 | [3 2048 64] | 2 | 2 | 1 | noncontiguous | fwd | 73840 | 33475 | 2.205825243 |
| LPPool1d | float16 | [3 2048 64] | 2 | 2 | 1 | noncontiguous | bwd | 151647 | 43696 | 3.470500732 |
| LPPool1d | float16 | [64 2208 7] | 2 | 2 | 1 | contiguous | fwd | 75408 | 62665 | 1.203351153 |
| LPPool1d | float16 | [64 2208 7] | 2 | 2 | 1 | contiguous | bwd | 130208 | 95785 | 1.359377773 |
| LPPool1d | float16 | [64 2208 7] | 2 | 2 | 1 | noncontiguous | fwd | 108944 | 66683 | 1.633759729 |
| LPPool1d | float16 | [64 2208 7] | 2 | 2 | 1 | noncontiguous | bwd | 222895 | 99625 | 2.237340025 |
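
The MIOpen_over_rocm column appears to be rocm_kernel_avg divided by the MIOpen time, which can be checked against the rows above. The following is a minimal sketch of that check using two rows copied from the LPPool1D FP16 table; the `speedup` helper and the row labels are hypothetical names introduced here, and the time unit is not stated in this PR.

```python
# Two rows copied from the LPPool1D FP16 table: (rocm_kernel_avg, MIOpen).
# The time unit is not stated in the PR; only the ratio is used here.
rows = {
    "[16 672 32] contiguous fwd": (58000, 28675),
    "[16 672 32] contiguous bwd": (101184, 36888),
}


def speedup(rocm_kernel_avg, miopen):
    """Ratio assumed to match the MIOpen_over_rocm column (hypothetical helper)."""
    return rocm_kernel_avg / miopen


ratios = {name: speedup(*times) for name, times in rows.items()}

for name, ratio in ratios.items():
    # Values above 1 mean the MIOpen time is smaller than the ROCm kernel time.
    print(f"{name}: {ratio:.3f}x")
```

Both computed ratios agree with the table values to the printed precision, which supports reading the column as a speedup of MIOpen relative to the ROCm kernel.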

LPPool1D FP32

| op_name | dtype | input_size | p | kernel_size | stride | contiguous | direction | rocm_kernel_avg | MIOpen | MIOpen_over_rocm |
|---|---|---|---|---|---|---|---|---|---|---|
| LPPool1d | float32 | [16 672 32] | 2 | 2 | 1 | contiguous | fwd | 59728 | 28870 | 2.068860409 |
| LPPool1d | float32 | [16 672 32] | 2 | 2 | 1 | contiguous | bwd | 103023 | 37386 | 2.755657198 |
| LPPool1d | float32 | [16 672 32] | 2 | 2 | 1 | noncontiguous | fwd | 74128 | 32195 | 2.302469328 |
| LPPool1d | float32 | [16 672 32] | 2 | 2 | 1 | noncontiguous | bwd | 146720 | 41013 | 3.577402287 |
| LPPool1d | float32 | [16 960 32] | 2 | 2 | 1 | contiguous | fwd | 66448 | 37653 | 1.764746501 |
| LPPool1d | float32 | [16 960 32] | 2 | 2 | 1 | contiguous | bwd | 115967 | 50826 | 2.281647188 |
| LPPool1d | float32 | [16 960 32] | 2 | 2 | 1 | noncontiguous | fwd | 84736 | 42595 | 1.989341472 |
| LPPool1d | float32 | [16 960 32] | 2 | 2 | 1 | noncontiguous | bwd | 170623 | 54328 | 3.140608894 |
| LPPool1d | float32 | [3 2048 64] | 2 | 2 | 1 | contiguous | fwd | 61136 | 32266 | 1.894749892 |
| LPPool1d | float32 | [3 2048 64] | 2 | 2 | 1 | contiguous | bwd | 106160 | 41368 | 2.566234771 |
| LPPool1d | float32 | [3 2048 64] | 2 | 2 | 1 | noncontiguous | fwd | 77904 | 33617 | 2.317398935 |
| LPPool1d | float32 | [3 2048 64] | 2 | 2 | 1 | noncontiguous | bwd | 157215 | 44000 | 3.573068182 |
| LPPool1d | float32 | [64 2208 7] | 2 | 2 | 1 | contiguous | fwd | 87072 | 62666 | 1.38946159 |
| LPPool1d | float32 | [64 2208 7] | 2 | 2 | 1 | contiguous | bwd | 163584 | 95608 | 1.710986528 |
| LPPool1d | float32 | [64 2208 7] | 2 | 2 | 1 | noncontiguous | fwd | 119376 | 69137 | 1.726658663 |
| LPPool1d | float32 | [64 2208 7] | 2 | 2 | 1 | noncontiguous | bwd | 264847 | 100800 | 2.627450397 |

LPPool1D BFP16

| op_name | dtype | input_size | p | kernel_size | stride | contiguous | direction | rocm_kernel_avg | MIOpen | MIOpen_over_rocm |
|---|---|---|---|---|---|---|---|---|---|---|
| LPPool1d | bfloat16 | [16 672 32] | 2 | 2 | 1 | contiguous | fwd | 60016 | 28871 | 2.078764158 |
| LPPool1d | bfloat16 | [16 672 32] | 2 | 2 | 1 | contiguous | bwd | 108479 | 36924 | 2.937899469 |
| LPPool1d | bfloat16 | [16 672 32] | 2 | 2 | 1 | noncontiguous | fwd | 71936 | 29937 | 2.402912784 |
| LPPool1d | bfloat16 | [16 672 32] | 2 | 2 | 1 | noncontiguous | bwd | 144095 | 38524 | 3.740395598 |
| LPPool1d | bfloat16 | [16 960 32] | 2 | 2 | 1 | contiguous | fwd | 64448 | 37902 | 1.700385204 |
| LPPool1d | bfloat16 | [16 960 32] | 2 | 2 | 1 | contiguous | bwd | 114207 | 51022 | 2.238387362 |
| LPPool1d | bfloat16 | [16 960 32] | 2 | 2 | 1 | noncontiguous | fwd | 77840 | 39217 | 1.984853507 |
| LPPool1d | bfloat16 | [16 960 32] | 2 | 2 | 1 | noncontiguous | bwd | 159198 | 52853 | 3.012090137 |
| LPPool1d | bfloat16 | [3 2048 64] | 2 | 2 | 1 | contiguous | fwd | 60816 | 32302 | 1.882731719 |
| LPPool1d | bfloat16 | [3 2048 64] | 2 | 2 | 1 | contiguous | bwd | 107967 | 41671 | 2.590938542 |
| LPPool1d | bfloat16 | [3 2048 64] | 2 | 2 | 1 | noncontiguous | fwd | 75615 | 33635 | 2.248104653 |
| LPPool1d | bfloat16 | [3 2048 64] | 2 | 2 | 1 | noncontiguous | bwd | 157103 | 43591 | 3.604023766 |
| LPPool1d | bfloat16 | [64 2208 7] | 2 | 2 | 1 | contiguous | fwd | 77359 | 62826 | 1.231321427 |
| LPPool1d | bfloat16 | [64 2208 7] | 2 | 2 | 1 | contiguous | bwd | 138367 | 96213 | 1.438132061 |
| LPPool1d | bfloat16 | [64 2208 7] | 2 | 2 | 1 | noncontiguous | fwd | 109663 | 66755 | 1.642768332 |

LPPool2D

| Type | Forward | Backward |
|---|---|---|
| float16 | 1.25 | 1.47 |
| float32 | 1.36 | 1.68 |
| bfloat16 | 1.35 | 1.56 |

LPPool2D FP16

| op_name | dtype | input_size | p | kernel_size | stride | contiguous | direction | rocm_kernel_avg | MIOpen | MIOpen_over_rocm |
|---|---|---|---|---|---|---|---|---|---|---|
| LPPool2d | float16 | [256 256 6 6] | 2 | [1 1] | [1 1] | noncontiguous | fwd | 210288 | 136549 | 1.540018601 |
| LPPool2d | float16 | [256 256 6 6] | 2 | [1 1] | [1 1] | noncontiguous | bwd | 456046 | 164619 | 2.770312054 |
| LPPool2d | float16 | [16 72 64 64] | 2 | [3 3] | [1 1] | noncontiguous | fwd | 482078 | 422766 | 1.140295104 |
| LPPool2d | float16 | [16 72 64 64] | 2 | [3 3] | [1 1] | noncontiguous | bwd | 2646486 | 598016 | 4.425443466 |
| LPPool2d | float16 | [16 120 64 64] | 2 | [2 2] | [1 1] | noncontiguous | fwd | 880077 | 578603 | 1.521037741 |
| LPPool2d | float16 | [16 120 64 64] | 2 | [2 2] | [1 1] | noncontiguous | bwd | 3723649 | 1044300 | 3.565688978 |
| LPPool2d | float16 | [16 480 32 32] | 2 | [4 4] | [1 1] | noncontiguous | fwd | 840396 | 730067 | 1.151121746 |
| LPPool2d | float16 | [16 480 32 32] | 2 | [4 4] | [1 1] | noncontiguous | bwd | 2259911 | 1167730 | 1.935302681 |

LPPool2D FP32

| op_name | dtype | input_size | p | kernel_size | stride | contiguous | direction | rocm_kernel_avg | MIOpen | MIOpen_over_rocm |
|---|---|---|---|---|---|---|---|---|---|---|
| LPPool2d | float32 | [256 256 6 6] | 2 | [1 1] | [1 1] | noncontiguous | fwd | 249567 | 136692 | 1.825761566 |
| LPPool2d | float32 | [256 256 6 6] | 2 | [1 1] | [1 1] | noncontiguous | bwd | 627934 | 168639 | 3.723539632 |
| LPPool2d | float32 | [16 72 64 64] | 2 | [3 3] | [1 1] | noncontiguous | fwd | 806060 | 646911 | 1.246013748 |
| LPPool2d | float32 | [16 72 64 64] | 2 | [3 3] | [1 1] | noncontiguous | bwd | 1934392 | 1134820 | 1.704580462 |
| LPPool2d | float32 | [16 120 64 64] | 2 | [2 2] | [1 1] | noncontiguous | fwd | 1337402 | 891175 | 1.500717592 |
| LPPool2d | float32 | [16 120 64 64] | 2 | [2 2] | [1 1] | noncontiguous | bwd | 3362195 | 1787430 | 1.881021914 |
| LPPool2d | float32 | [16 480 32 32] | 2 | [4 4] | [1 1] | noncontiguous | fwd | 1313339 | 1198500 | 1.09581894 |
| LPPool2d | float32 | [16 480 32 32] | 2 | [4 4] | [1 1] | noncontiguous | bwd | 3545698 | 3306250 | 1.072422836 |

LPPool2D BFP16

| op_name | dtype | input_size | p | kernel_size | stride | contiguous | direction | rocm_kernel_avg | MIOpen | MIOpen_over_rocm |
|---|---|---|---|---|---|---|---|---|---|---|
| LPPool2d | bfloat16 | [256 256 6 6] | 2 | [1 1] | [1 1] | noncontiguous | fwd | 221390 | 137280 | 1.612689394 |
| LPPool2d | bfloat16 | [256 256 6 6] | 2 | [1 1] | [1 1] | noncontiguous | bwd | 490428 | 166275 | 2.949499323 |
| LPPool2d | bfloat16 | [16 72 64 64] | 2 | [3 3] | [1 1] | noncontiguous | fwd | 502860 | 424123 | 1.185646617 |
| LPPool2d | bfloat16 | [16 72 64 64] | 2 | [3 3] | [1 1] | noncontiguous | bwd | 2771849 | 599962 | 4.620040936 |
| LPPool2d | bfloat16 | [16 120 64 64] | 2 | [2 2] | [1 1] | noncontiguous | fwd | 915832 | 580460 | 1.577769355 |
| LPPool2d | bfloat16 | [16 120 64 64] | 2 | [2 2] | [1 1] | noncontiguous | bwd | 3039143 | 1033949 | 2.939354842 |
| LPPool2d | bfloat16 | [16 480 32 32] | 2 | [4 4] | [1 1] | noncontiguous | fwd | 871049 | 731553 | 1.190684749 |
| LPPool2d | bfloat16 | [16 480 32 32] | 2 | [4 4] | [1 1] | noncontiguous | bwd | 3307300 | 1170540 | 2.825448084 |
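
For readers unfamiliar with the operator, here is a rough pure-Python sketch of what an LPPool1d forward pass computes over a single channel. It assumes the usual power-average definition (the p-th root of the sum of p-th powers in each window, as in PyTorch's `torch.nn.LPPool1d`), no padding, and floor-mode output length. It is not the kernel added in this PR, and `lp_pool1d` is a hypothetical name used only for illustration.

```python
def lp_pool1d(x, p, kernel_size, stride):
    """Reference LP pooling over one 1D sequence (illustrative only).

    Assumptions: no padding, floor-mode output length, and inputs for
    which v ** p and the p-th root are real (e.g. non-negative values
    or an even integer p).
    """
    out_len = (len(x) - kernel_size) // stride + 1
    out = []
    for i in range(out_len):
        start = i * stride
        window = x[start:start + kernel_size]
        # p-th root of the sum of p-th powers over the window.
        out.append(sum(v ** p for v in window) ** (1.0 / p))
    return out


# p = 2 gives the Euclidean norm of each window: roughly [5.0, 4.0, 5.0].
result = lp_pool1d([3.0, 4.0, 0.0, 5.0], p=2, kernel_size=2, stride=1)
print(result)

# With the benchmark settings above (last dimension 32, kernel_size 2,
# stride 1), this definition yields 31 outputs per channel.
out_len_benchmark = len(lp_pool1d([1.0] * 32, p=2, kernel_size=2, stride=1))
print(out_len_benchmark)
```

The 2D case follows the same pattern with a two-dimensional window. The backward pass is omitted here because its exact formulation in this PR is not shown in the description.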

BradPepersAMD (Collaborator) commented

MIOpen is moving to the new monorepo setup, and all older unmerged PRs are being closed. Please re-open this as part of the new repo if these changes are still needed.
