Implement Pad Reflection 1D #3630

Open · wants to merge 44 commits into develop
Conversation

anhskrttt (Collaborator) commented Mar 17, 2025

  • Add the Pad Reflection 1D operation [ref] for forward and backward (a reference sketch of the index mapping appears after this list).
  • Add driver and gtest.
  • Performance conditions (when the MIOpen kernels outperform the ROCm baseline):
    • Type is not bfloat16.
    • MIOpen performs better when it is 1D padding (i.e. input_num_dims == 2) and only the last dim is padded (i.e. padding_array.size() == 2).
    • For PadReflectionBackward: type is not float16.
      • float32: MIOpen performs better when the padded dim is not too large (padded_dim <= 64).
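
For reference, below is a minimal host-side C++ sketch of the index mapping that 1D reflection padding implies, assuming a contiguous [N, W] input padded by {pad_left, pad_right} on the last dim. The actual kernels in this PR are HIP device code; `reflect_index`, `pad_reflection_1d_fwd`, and `pad_reflection_1d_bwd` are illustrative names, not the PR's APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Map an output column x to the input column it reflects from:
//   shifted = x - pad_left; negative indices mirror around element 0,
//   overflow mirrors around element W-1 (the edge itself is not repeated).
inline int64_t reflect_index(int64_t x, int64_t pad_left, int64_t W)
{
    int64_t in = x - pad_left;
    if(in < 0)
        in = -in;              // reflect around index 0
    if(in >= W)
        in = 2 * (W - 1) - in; // reflect around index W-1
    return in;
}

// Forward: a pure gather, one read per output element.
void pad_reflection_1d_fwd(const std::vector<float>& in, std::vector<float>& out,
                           int64_t N, int64_t W, int64_t pad_l, int64_t pad_r)
{
    assert(pad_l < W && pad_r < W); // single reflection requires pads smaller than W
    const int64_t outW = W + pad_l + pad_r;
    for(int64_t n = 0; n < N; ++n)
        for(int64_t x = 0; x < outW; ++x)
            out[n * outW + x] = in[n * W + reflect_index(x, pad_l, W)];
}

// Backward: a scatter-add into a zero-initialized din, since several output
// positions can reflect onto the same input element; a GPU implementation
// typically needs atomic adds here.
void pad_reflection_1d_bwd(const std::vector<float>& dout, std::vector<float>& din,
                           int64_t N, int64_t W, int64_t pad_l, int64_t pad_r)
{
    const int64_t outW = W + pad_l + pad_r;
    for(int64_t n = 0; n < N; ++n)
        for(int64_t x = 0; x < outW; ++x)
            din[n * W + reflect_index(x, pad_l, W)] += dout[n * outW + x];
}
```

As a sanity check, on a [1, 5] row with pads {3, 2} the forward gathers input indices 3 2 1 0 1 2 3 4 3 2, matching PyTorch's ReflectionPad1d semantics.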

Average improvement over ROCm

| type     | fwd (×) | bwd (×) |
|----------|---------|---------|
| float    | 1.48    | 1.73    |
| float16  | 1.40    | -       |
| bfloat16 | -       | -       |

Detailed benchmarks

fp32_fwd

| dtype   | input_size    | contiguity    | padding | direction | improvement (×) |
|---------|---------------|---------------|---------|-----------|-----------------|
| float32 | [1024 256 8]  | noncontiguous | [3 5]   | fwd       | 5.5431          |
| float32 | [1024 64 8]   | noncontiguous | [3 5]   | fwd       | 4.3195          |
| float32 | [256 256 8]   | noncontiguous | [3 5]   | fwd       | 4.3056          |
| float32 | [1024 256 16] | noncontiguous | [3 5]   | fwd       | 4.0766          |
| float32 | [1024 64 16]  | noncontiguous | [3 5]   | fwd       | 3.5369          |
| float32 | [256 256 16]  | noncontiguous | [3 5]   | fwd       | 3.3904          |
| float32 | [8 2 64]      | noncontiguous | [2 2]   | fwd       | 3.1042          |
| float32 | [16 2 64]     | noncontiguous | [2 2]   | fwd       | 2.7072          |
| float32 | [16 8 16]     | noncontiguous | [2 2]   | fwd       | 2.5312          |
| float32 | [16 4 64]     | noncontiguous | [3 5]   | fwd       | 2.4136          |
| float32 | [1024 16 8]   | noncontiguous | [3 5]   | fwd       | 2.3902          |
| float32 | [32 8 16]     | noncontiguous | [2 2]   | fwd       | 2.3676          |
| float32 | [8 8 16]      | noncontiguous | [2 2]   | fwd       | 2.3544          |
| float32 | [16 4 16]     | noncontiguous | [2 2]   | fwd       | 2.3374          |
| float32 | [32 2 16]     | noncontiguous | [2 2]   | fwd       | 2.3363          |

fp16_fwd

| dtype   | input_size    | contiguity    | padding | direction | improvement (×) |
|---------|---------------|---------------|---------|-----------|-----------------|
| float16 | [512 256 8]   | noncontiguous | [3 5]   | fwd       | 5.042308539     |
| float16 | [512 256 16]  | noncontiguous | [3 5]   | fwd       | 3.793880455     |
| float16 | [512 64 8]    | noncontiguous | [3 5]   | fwd       | 3.229970638     |
| float16 | [128 256 8]   | noncontiguous | [3 5]   | fwd       | 3.061361755     |
| float16 | [512 64 16]   | noncontiguous | [3 5]   | fwd       | 2.99377916      |
| float16 | [16 4 16]     | noncontiguous | [2 2]   | fwd       | 2.407523511     |
| float16 | [4 2 16]      | noncontiguous | [2 2]   | fwd       | 2.333333333     |
| float16 | [32 2 16]     | noncontiguous | [3 5]   | fwd       | 2.242424242     |
| float16 | [16 2 16]     | noncontiguous | [2 2]   | fwd       | 2.07073955      |
| float16 | [8 4 16]      | noncontiguous | [2 2]   | fwd       | 2.067524116     |
| float16 | [32 16 16]    | noncontiguous | [3 5]   | fwd       | 1.971631206     |
| float16 | [512 16 16]   | noncontiguous | [3 5]   | fwd       | 1.854805726     |
| float16 | [512 16 8]    | noncontiguous | [3 5]   | fwd       | 1.847280335     |
| float16 | [128 64 16]   | noncontiguous | [3 5]   | fwd       | 1.825203252     |
| float16 | [128 16 8]    | noncontiguous | [3 5]   | fwd       | 1.816377171     |

fp32_bwd

| dtype   | input_size    | contiguity    | padding | direction | improvement (×) |
|---------|---------------|---------------|---------|-----------|-----------------|
| float32 | [1024 256 8]  | noncontiguous | [3 5]   | bwd       | 4.031944207     |
| float32 | [512 256 8]   | contiguous    | [3 5]   | bwd       | 3.871722166     |
| float32 | [512 256 16]  | contiguous    | [3 5]   | bwd       | 3.489330757     |
| float32 | [1024 256 16] | noncontiguous | [3 5]   | bwd       | 3.31064466      |
| float32 | [256 256 8]   | noncontiguous | [3 5]   | bwd       | 2.98488121      |
| float32 | [128 256 8]   | contiguous    | [3 5]   | bwd       | 2.818461538     |
| float32 | [512 64 8]    | contiguous    | [3 5]   | bwd       | 2.809829603     |
| float32 | [512 64 16]   | contiguous    | [3 5]   | bwd       | 2.736646341     |
| float32 | [128 256 16]  | contiguous    | [3 5]   | bwd       | 2.733203505     |
| float32 | [256 256 16]  | noncontiguous | [3 5]   | bwd       | 2.580315959     |
| float32 | [1024 64 8]   | noncontiguous | [3 5]   | bwd       | 2.494184734     |
| float32 | [1024 64 16]  | noncontiguous | [3 5]   | bwd       | 2.330557868     |
| float32 | [128 64 8]    | contiguous    | [3 5]   | bwd       | 1.495726496     |
| float32 | [512 16 8]    | contiguous    | [3 5]   | bwd       | 1.492063492     |
| float32 | [32 256 8]    | contiguous    | [3 5]   | bwd       | 1.475352113     |

DuongQLee and others added 30 commits April 14, 2024 17:58
@anhskrttt anhskrttt marked this pull request as draft March 17, 2025 03:34
@anhskrttt anhskrttt self-assigned this Mar 17, 2025
@anhskrttt anhskrttt marked this pull request as ready for review March 18, 2025 05:04