add argmin (#318) #346

wyjoutstanding · 2024-12-07T13:02:49Z

PR Category

Operator, OP Test, Benchmark

Type of Change

New Feature

Description

Adds support for argmin; see #318 for details.

Issue

Resolves #318

Progress

Change is properly reviewed (1 reviewer required, 2 recommended).
Change responds to an issue.
Change is fully covered by a UT.

Performance

A100 test result:

Operator: argmin Performance Test (dtype=torch.float16, mode=cuda, level=comprehensive)
Size Torch Latency (ms) Gems Latency (ms) Gems Speedup Size Detail

SUCCESS 0.019456 0.013312 1.462 [torch.Size([1048576])]
SUCCESS 0.010240 0.010240 1.000 [torch.Size([64, 64])]
SUCCESS 0.055296 0.043008 1.286 [torch.Size([4096, 4096])]
SUCCESS 0.055296 0.043008 1.286 [torch.Size([64, 512, 512])]
SUCCESS 1.888256 1.249280 1.511 [torch.Size([1024, 1024, 1024])]
SUCCESS 0.007168 0.009216 0.778 [torch.Size([4])]
SUCCESS 0.008192 0.009216 0.889 [torch.Size([1024])]
SUCCESS 1.886208 1.249280 1.510 [torch.Size([1073741824])]
SUCCESS 0.008192 0.009216 0.889 [torch.Size([1024, 1])]
SUCCESS 0.014336 0.010240 1.400 [torch.Size([1024, 16])]
SUCCESS 0.017408 0.010240 1.700 [torch.Size([1024, 256])]
SUCCESS 0.026624 0.018432 1.444 [torch.Size([1024, 4096])]
SUCCESS 0.158720 0.106496 1.490 [torch.Size([1024, 65536])]
SUCCESS 0.009216 0.010240 0.900 [torch.Size([64, 64, 1])]
SUCCESS 0.032768 0.010240 3.200 [torch.Size([64, 64, 16])]
SUCCESS 0.019456 0.012288 1.583 [torch.Size([64, 64, 256])]
SUCCESS 0.055296 0.043008 1.286 [torch.Size([64, 64, 4096])]

Operator: argmin Performance Test (dtype=torch.float32, mode=cuda, level=comprehensive)
Size Torch Latency (ms) Gems Latency (ms) Gems Speedup Size Detail

SUCCESS 0.020480 0.014336 1.429 [torch.Size([1048576])]
SUCCESS 0.010240 0.009216 1.111 [torch.Size([64, 64])]
SUCCESS 0.074752 0.066560 1.123 [torch.Size([4096, 4096])]
SUCCESS 0.074752 0.066560 1.123 [torch.Size([64, 512, 512])]
SUCCESS 2.638848 2.404352 1.098 [torch.Size([1024, 1024, 1024])]
SUCCESS 0.007168 0.009216 0.778 [torch.Size([4])]
SUCCESS 0.008192 0.009216 0.889 [torch.Size([1024])]
SUCCESS 2.638848 2.402304 1.098 [torch.Size([1073741824])]
SUCCESS 0.008192 0.009216 0.889 [torch.Size([1024, 1])]
SUCCESS 0.015360 0.010240 1.500 [torch.Size([1024, 16])]
SUCCESS 0.017408 0.011264 1.545 [torch.Size([1024, 256])]
SUCCESS 0.033792 0.027648 1.222 [torch.Size([1024, 4096])]
SUCCESS 0.214016 0.178176 1.201 [torch.Size([1024, 65536])]
SUCCESS 0.009216 0.009216 1.000 [torch.Size([64, 64, 1])]
SUCCESS 0.033792 0.010240 3.300 [torch.Size([64, 64, 16])]
SUCCESS 0.019456 0.014336 1.357 [torch.Size([64, 64, 256])]
SUCCESS 0.074752 0.066560 1.123 [torch.Size([64, 64, 4096])]

Operator: argmin Performance Test (dtype=torch.bfloat16, mode=cuda, level=comprehensive)
Size Torch Latency (ms) Gems Latency (ms) Gems Speedup Size Detail

SUCCESS 0.019456 0.013312 1.462 [torch.Size([1048576])]
SUCCESS 0.010240 0.010240 1.000 [torch.Size([64, 64])]
SUCCESS 0.056320 0.044032 1.279 [torch.Size([4096, 4096])]
SUCCESS 0.056320 0.044032 1.279 [torch.Size([64, 512, 512])]
SUCCESS 1.954816 1.280000 1.527 [torch.Size([1024, 1024, 1024])]
SUCCESS 0.007168 0.009216 0.778 [torch.Size([4])]
SUCCESS 0.008192 0.009216 0.889 [torch.Size([1024])]
SUCCESS 1.972224 1.277952 1.543 [torch.Size([1073741824])]
SUCCESS 0.008192 0.009216 0.889 [torch.Size([1024, 1])]
SUCCESS 0.014336 0.010240 1.400 [torch.Size([1024, 16])]
SUCCESS 0.017408 0.010240 1.700 [torch.Size([1024, 256])]
SUCCESS 0.026624 0.019456 1.368 [torch.Size([1024, 4096])]
SUCCESS 0.162816 0.106496 1.529 [torch.Size([1024, 65536])]
SUCCESS 0.009216 0.010240 0.900 [torch.Size([64, 64, 1])]
SUCCESS 0.033792 0.010240 3.300 [torch.Size([64, 64, 16])]
SUCCESS 0.019456 0.012288 1.583 [torch.Size([64, 64, 256])]
SUCCESS 0.056320 0.044032 1.279 [torch.Size([64, 64, 4096])]
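For readers checking the tables above: the Gems Speedup column appears to be the Torch latency divided by the Gems latency, so values below 1.0 mean the Gems kernel was slower for that shape. A minimal sketch that recomputes the ratio for a few rows copied from the float16 table (the helper name `speedup` is an illustrative assumption, not part of the benchmark tooling):

```python
# Rows copied from the float16 table: (torch_ms, gems_ms, reported_speedup).
ROWS = [
    (0.019456, 0.013312, 1.462),  # torch.Size([1048576])
    (0.055296, 0.043008, 1.286),  # torch.Size([4096, 4096])
    (0.007168, 0.009216, 0.778),  # torch.Size([4])
]


def speedup(torch_ms: float, gems_ms: float) -> float:
    """Hypothetical helper: ratio of baseline latency to Gems latency."""
    return torch_ms / gems_ms


for torch_ms, gems_ms, reported in ROWS:
    computed = speedup(torch_ms, gems_ms)
    # Each recomputed ratio should agree with the reported column after rounding.
    print(f"{computed:.3f} (reported {reported:.3f})")
```

On the smallest inputs (for example torch.Size([4]) and torch.Size([1024])) the ratio falls below 1.0, where the absolute latencies are only a few microseconds.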

iclementine · 2024-12-10T02:06:10Z

src/flag_gems/ops/argmin.py

+import triton.language as tl
+
+from ..utils import libentry
+from ..utils.shape_utils import can_use_int32_index


Please follow the recent changes to argmax and use triton_lang_extension, since we decided to use int64 indexing everywhere to prevent unexpected overflow.

We used to take a more conservative approach, computing the maximum element offset of a tensor to decide whether int32 indexing was safe, but we have now decided to simplify this.

See also #327
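To illustrate the overflow concern behind this comment, here is a minimal pure-Python sketch of the older, conservative idea: compute the largest element offset a tensor can reach and check whether it fits in a signed 32-bit integer. The function names below are hypothetical and this is not the flag_gems implementation of `can_use_int32_index`; it only shows the arithmetic.

```python
INT32_MAX = 2**31 - 1


def contiguous_strides(shape):
    """Row-major strides (in elements) for a contiguous tensor of this shape."""
    strides = []
    running = 1
    for size in reversed(shape):
        strides.append(running)
        running *= size
    return tuple(reversed(strides))


def max_element_offset(shape, strides):
    """Largest element offset reachable by indexing a strided tensor."""
    if any(size == 0 for size in shape):
        return 0  # an empty tensor has no addressable element
    return sum((size - 1) * stride for size, stride in zip(shape, strides))


def fits_int32(shape, strides=None):
    """True if every element offset is representable as a signed 32-bit int."""
    if strides is None:
        strides = contiguous_strides(shape)
    return max_element_offset(shape, strides) <= INT32_MAX


print(fits_int32((1024, 1024, 1024)))  # largest offset is 2**30 - 1
print(fits_int32((65536, 65536)))      # largest offset is 2**32 - 1
```

Using int64 offsets everywhere removes the need for this per-tensor check, at the cost of wider index arithmetic in the kernel.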

tests/test_reduction_ops.py

iclementine

LGTM

src/flag_gems/ops/argmin.py

iclementine

Please add a test case where dim is None.
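For context on why this case needs its own test: as documented for torch.argmin, when dim is None the result is a single index into the flattened input rather than one index per slice. The sketch below is a pure-Python reference for that semantics on a nested list, assuming row-major flattening and first-occurrence tie-breaking; the function names are hypothetical and it is not the test added in this PR.

```python
def argmin_flat(rows):
    """Index of the first minimum in the row-major flattened data (dim=None)."""
    flat = [value for row in rows for value in row]
    best = 0
    for index, value in enumerate(flat):
        if value < flat[best]:  # strict '<' keeps the first occurrence on ties
            best = index
    return best


def argmin_last_dim(rows):
    """Index of the first minimum within each row (reduction over the last dim)."""
    return [min(range(len(row)), key=row.__getitem__) for row in rows]


data = [
    [3.0, 1.0, 2.0],
    [0.5, 4.0, 0.5],
]

print(argmin_flat(data))      # one index into the 6 flattened elements
print(argmin_last_dim(data))  # one index per row
```

The two reductions return results of different shapes, so a test that only covers an explicit dim does not exercise the flattened path.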

wyjoutstanding · 2024-12-27T14:59:07Z

Please add a test case where dim is None.

wyjoutstanding · 2024-12-27T15:00:45Z

@iclementine I've rebased; could you help review it?

iclementine · 2024-12-28T06:02:19Z

src/flag_gems/ops/argmin.py

+from ..utils import libentry
+from ..utils import triton_lang_extension as tle
+
+torch_dtype_to_tl_dtype_and_max_value = {


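The formatting here does not pass the formatter check. Do not put a space before the colon; add one only after it.
Please use the pre-commit tool to fix this.

Run pip install pre-commit, then run pre-commit install in the project directory.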

iclementine self-assigned this Dec 9, 2024

iclementine reviewed Dec 10, 2024

wyjoutstanding force-pushed the argmin_dev branch from e4dfbed to 03956fc Compare December 15, 2024 15:37

iclementine approved these changes Dec 18, 2024

iclementine reviewed Dec 18, 2024

iclementine requested changes Dec 18, 2024

wuyangjun added 3 commits December 27, 2024 22:57

add argmin

67a11ce

support int dtype and int64 index

7643d5d

add test for dim=None

960e189

wyjoutstanding force-pushed the argmin_dev branch from 9a105d0 to 960e189 Compare December 27, 2024 14:57

wyjoutstanding closed this Dec 27, 2024

wyjoutstanding changed the title ~~add argmin(#318)~~ add argmin (#318) Dec 27, 2024

wyjoutstanding reopened this Dec 27, 2024

iclementine reviewed Dec 28, 2024
