Block-wise Scatter-Reduce : __getitem__ / __setitem__ errors #5639
olivier-peltre asked this question in Q&A
While trying to implement a scatter-add reduction inside a kernel, I get errors related to `tensor.__{get,set}item__`. As far as I understand, indexing features in Triton are either limited or new, and I'd like some advice on how to find a satisfactory workaround 🙂
Here is some sample code and a description of the errors:
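(Schematically, with placeholder pointer names and block sizes rather than the exact snippet; the two failing lines are marked.)

```python
import triton
import triton.language as tl

@triton.jit
def scatter_add_sketch(src_ptr, idx_ptr, out_ptr,
                       IN_BLOCK: tl.constexpr, OUT_BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * IN_BLOCK + tl.arange(0, IN_BLOCK)
    vals = tl.load(src_ptr + offs)      # one block of input values
    idx = tl.load(idx_ptr + offs)       # matching (sorted) output indices

    offset = idx[0]                     # (1) error: integer indexing of a tl.tensor
    out = tl.zeros([OUT_BLOCK], dtype=tl.float32)
    out[idx - offset] += vals           # (2) fails: tl.tensor has no __setitem__ (indexed assignment unsupported)
    tl.store(out_ptr + offset + tl.arange(0, OUT_BLOCK), out)
```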
When trying to access indices, e.g. at the line `offset = idx[0]`, I get an error. Is this intended? I can obviously replace this with `tl.load(idx_ptr + j (+k))` as a workaround, as shown below.
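For instance, assuming each program handles a contiguous block of `IN_BLOCK` indices (placeholder names), the scalar can be read straight from global memory:

```python
# instead of offset = idx[0]: load the first index of this program's block as a scalar
offset = tl.load(idx_ptr + pid * IN_BLOCK)
```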
I further get an error upon calling `__setitem__`. Looking over the source, I guess this is because `tensor` doesn't have a `__setitem__` method, though throwing an `AttributeError` would be more informative. I assume indexed assignment into the accumulator would fail for the same reason, though I could probably work around this with `tl.where`, as in the sketch further below.
I'm also unsure whether the shared-memory (?) accumulator `out` shouldn't be transposed for performance before the final `store` instruction.
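Concretely, the block-wise workaround I have in mind looks roughly like this (a sketch only, with the same placeholder names; it assumes the indices handled by one program are sorted and fall into a window of at most `OUT_BLOCK` consecutive bins, and it omits bounds masking):

```python
import triton
import triton.language as tl

@triton.jit
def blockwise_scatter_add(src_ptr, idx_ptr, out_ptr,
                          IN_BLOCK: tl.constexpr, OUT_BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * IN_BLOCK + tl.arange(0, IN_BLOCK)
    vals = tl.load(src_ptr + offs)
    idx = tl.load(idx_ptr + offs)
    offset = tl.load(idx_ptr + pid * IN_BLOCK)       # scalar load instead of idx[0]

    # Emulate out[idx - offset] += vals without __setitem__: broadcast-compare
    # every input index against every local bin, mask with tl.where, then
    # reduce over the input axis.
    bins = tl.arange(0, OUT_BLOCK)
    hits = (idx[None, :] - offset) == bins[:, None]            # [OUT_BLOCK, IN_BLOCK]
    out = tl.sum(tl.where(hits, vals[None, :], 0.0), axis=1)   # [OUT_BLOCK]

    # one atomic per output bin (not per element), in case the windows of
    # neighbouring programs overlap
    tl.atomic_add(out_ptr + offset + bins, out)
```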
But more generally, what would you recommend? Maybe this has been done somewhere else?
Any form of advice greatly appreciated 🙏
I'm tagging you here @apgoucher and @Mogball, as I wonder whether your recent #5262 PR could help on this subject.
Note
I'm looking into whether Triton could help scale up scatter-add kernels, after having noticed large runtime discrepancies between jax and torch implementations (torch ~4x faster), though both scale poorly to large input sizes. I'm hoping that aligning loads to 128B sector sizes while putting more work on each warp (as opposed to a naive implementation relying solely on atomics, sketched below for reference) could improve the scaling in the large-input-size limit.
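For reference, the naive atomics-only version I'm comparing against is essentially the following (placeholder names; one `tl.atomic_add` per input element):

```python
import triton
import triton.language as tl

@triton.jit
def scatter_add_atomic(src_ptr, idx_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    vals = tl.load(src_ptr + offs, mask=mask, other=0.0)
    idx = tl.load(idx_ptr + offs, mask=mask, other=0)
    # one atomic per element; contention grows with index collisions
    tl.atomic_add(out_ptr + idx, vals, mask=mask)

def scatter_add(src, idx, out, BLOCK=1024):
    grid = (triton.cdiv(src.numel(), BLOCK),)
    scatter_add_atomic[grid](src, idx, out, src.numel(), BLOCK=BLOCK)
    return out
```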
I'm also wondering whether leveraging CUDA `__shfl_sync` semantics wouldn't be necessary to get satisfactory results on this (which would probably fall out of Triton's scope). Though I'm still looking for a good Triton implementation first!