Add index select backward #359
base: master
Conversation
src/flag_gems/ops/index_select.py
Outdated
```python
dim = dim % len(self_sizes)
grad_shape = list(grad.shape)
assert grad_shape[dim] == index_shape[0], "Index out of range"
grad = dim_compress(grad, dim)
```
Function `dim_compress` moves the specified dim to the innermost dimension and makes the tensor contiguous. It is designed for reduction/scan operators, where the other dimensions are treated as batch dimensions and the reduction is performed over each 1d sub-tensor, which is loaded and iterated over; being contiguous on that dimension is good for this case.
For the backward of `index_select`, it is actually the opposite: dimension `dim` is used as an indexing dimension, and several (n-1)-d sub-tensors are inserted into a zeros tensor. So being contiguous on dimension `dim` is not what we want; we rather want it to be the outermost dimension.
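A minimal sketch of that layout change, using a hypothetical helper rather than this PR's code: permute `dim` to the front before scattering.

```python
import torch

def move_dim_to_front(grad: torch.Tensor, dim: int) -> torch.Tensor:
    # Make `dim` the outer-most dimension, so each index selects one
    # contiguous (n-1)-d slice of the gradient.
    order = [dim] + [i for i in range(grad.ndim) if i != dim]
    return grad.permute(order).contiguous()
```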
OK, we will change it.
Hello, there is a precision issue when calling `atomic_add`, and I'm not sure how to resolve it.
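One common workaround, offered here as an assumption rather than the fix this PR adopted: accumulate into a float32 buffer and cast back at the end, since atomics on half-precision buffers lose precision.

```python
import torch

# Sketch under the stated assumption: the kernel's tl.atomic_add targets a
# float32 accumulator instead of the half-precision gradient buffer.
grad = torch.randn(8, 16, dtype=torch.float16, device="cuda")
acc = torch.zeros_like(grad, dtype=torch.float32)
# ... launch the Triton kernel so tl.atomic_add writes into `acc` ...
in_grad = acc.to(grad.dtype)  # cast back to the original dtype afterwards
```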
LGTM for CI coverage
```python
batch_dim = [i for i in range(dim) if i not in dims]
sorted_reduction_dim = sorted(dims, key=lambda x: stride[x], reverse=True)
order = sorted_reduction_dim + batch_dim
return inp.permute(order).contiguous()
```
looks good.
src/flag_gems/ops/index_select.py
Outdated
```python
for i in index:
    assert i >= 0 and i < self_sizes[dim], "Index out of range"
```
I suggest removing this out-of-bounds check, since it involves a slice and a compare for each index, which means a large overhead.
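If a check is still wanted, a single vectorized comparison avoids the per-element Python loop; a sketch, not part of this PR:

```python
import torch

def check_index_bounds(index: torch.Tensor, dim_size: int) -> None:
    # One fused comparison over the whole index tensor instead of a loop.
    assert bool(((index >= 0) & (index < dim_size)).all()), "Index out of range"
```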
ok
tests/test_reduction_ops.py
Outdated
```python
index_size = inp.size(dim)
index = torch.randint(0, index_size, [floor(index_size * 0.8)], device="cuda")
index = torch.unique(index)
```
What if there are duplicated indices? I don't think the current implementation can handle this.
Yes, for duplicated indices, `torch.autograd.grad` can handle this case, and our code can handle it as well. But in the accuracy test, `out_grad` and `index` are generated randomly, so the condition that the same index always carries the same value is not met; that is why we added the `unique` call.
Normally, when an index is duplicated, the corresponding value is overwritten, and the result is correct.
No, I don't see how it can handle duplicated indices. With duplicated indices, a sub-tensor is selected multiple times in the forward op, so the corresponding gradients should accumulate into the same input gradient.
There is no such accumulation in the kernel. I added a test, and it always fails:
```python
@pytest.mark.index_select_backward
@pytest.mark.parametrize("shape", REDUCTION_SHAPES)
@pytest.mark.parametrize("dim", DIM_LIST)
@pytest.mark.parametrize("dtype", FLOAT_DTYPES)
def test_accuracy_index_select_backward(shape, dim, dtype):
    inp = torch.randn(shape, dtype=dtype, device="cuda", requires_grad=True)
    ref_inp = to_reference(inp)
    from math import floor

    index_size = inp.size(dim)
    index = torch.tensor([0, 0, 0, 0], device="cuda")
    # index = torch.unique(index)
    if len(index) == 0:
        pass
    else:
        ref_index = to_reference(index)
        ref_out = torch.index_select(ref_inp, dim, ref_index)
        out_grad = torch.randn_like(ref_out)
        ref_grad = to_reference(out_grad)
        (ref_in_grad,) = torch.autograd.grad(ref_out, ref_inp, ref_grad)
        with flag_gems.use_gems():
            res_out = torch.index_select(inp, dim, index)
            (res_in_grad,) = torch.autograd.grad(res_out, inp, out_grad)
        res_out = to_reference(res_out)
        res_in_grad = to_reference(res_in_grad)
        gems_assert_equal(res_out, ref_out)
        gems_assert_equal(res_in_grad, ref_in_grad)
```
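For reference, the expected accumulation semantics can be written with `index_add_`; this is a sketch of the semantics PyTorch implements, not this PR's kernel:

```python
import torch

def index_select_backward_reference(out_grad, self_shape, dim, index):
    # Rows selected multiple times in the forward pass must receive the
    # *sum* of their gradients, not the last write.
    in_grad = torch.zeros(self_shape, dtype=out_grad.dtype, device=out_grad.device)
    return in_grad.index_add_(dim, index, out_grad)
```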
ok
src/flag_gems/ops/index_select.py
Outdated
```python
tem_shape = list(grad.shape[1:])
tem_shape[-1] = 1
```
What is this shape `tem_shape` intended to be?
Maybe choose a more semantically meaningful name.
ok
```python
grad_off = (pid_x * num_blocks_per_CTA + i) * N + cols_offsets
out_off = (indices * num_blocks_per_CTA + i) * N + cols_offsets
selected = tl.load(grad + grad_off, mask=grad_mask, other=0.0)
tl.atomic_add(out + out_off, selected, mask=grad_mask)
```
Since this kernel uses `atomic_add` and is also autotuned, you should add `out` to `reset_to_zero`, to avoid it being accumulated into multiple times across tuning runs.
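A sketch of what that looks like; the config values and kernel signature here are placeholders, not this PR's actual tuning space:

```python
import triton
import triton.language as tl

@triton.autotune(
    configs=[
        triton.Config({"BLOCK_N": 128}, num_warps=4),
        triton.Config({"BLOCK_N": 256}, num_warps=8),
    ],
    key=["N"],
    reset_to_zero=["out"],  # zero `out` before every tuning run of the kernel
)
@triton.jit
def index_select_backward_kernel(grad, out, N, BLOCK_N: tl.constexpr):
    pass  # kernel body omitted in this sketch
```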
```python
)
yield inp, dim, index

bench = TensorSelectBenchmark(
```
You need to override

```python
def get_gbps(self, args, latency=None):
    # """Return the dynamic input iterator for each Operator."""
    raise NotImplementedError(
        "Each Benchmark must implement its own input iterator."
    )
```

for it to compute the gbps metric.
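A hedged sketch of such an override; the traffic model (read each selected slice once, write it once) and the argument unpacking are assumptions based on the iterator above, not the PR's final code:

```python
def get_gbps(self, args, latency=None):
    inp, dim, index = args
    # Bytes in one (n-1)-d slice along `dim`.
    slice_bytes = inp.numel() // inp.size(dim) * inp.element_size()
    io_bytes = 2 * index.numel() * slice_bytes  # read + write of selected slices
    return io_bytes / latency / 1e6  # latency in ms -> GB/s
```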
ok
PR Category
Operator

Type of Change
New Feature

Description
Implement the index_select_backward operator.

Issue
#316

Progress
Performance
Accuracy