Hello!
I'm currently trying to rewrite my pipeline with TE. I use merged sequences for LM, and as far as I know I should use the "thd" format for them.
I see that the MultiheadAttention class (from here) doesn't support this format (there is no mention of "thd" in the args annotation). But DotProductAttention seems to support "thd".
When I pass qkv_format="thd" to the transformer layer, it looks like the only reason it doesn't work is that MultiheadAttention would need to pass cu_seqlens to DotProductAttention. Am I correct about this? Thanks.
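For context, here is a minimal sketch of what I mean by merged sequences and cu_seqlens. It is plain Python, not Transformer Engine code: the helper names `cu_seqlens_from_lengths` and `pack_sequences` are hypothetical and only for illustration, and I am assuming cu_seqlens means the cumulative sequence boundaries of the packed token dimension (a real call would presumably need them as an integer tensor).

```python
from itertools import accumulate


def cu_seqlens_from_lengths(seq_lens):
    """Cumulative boundaries: entry i is where sequence i starts in the packed dim."""
    return [0] + list(accumulate(seq_lens))


def pack_sequences(sequences):
    """Concatenate variable-length sequences into one flat token list.

    Hypothetical helper: returns the packed tokens plus the boundaries
    needed to tell the sequences apart again.
    """
    tokens = [tok for seq in sequences for tok in seq]
    cu_seqlens = cu_seqlens_from_lengths([len(seq) for seq in sequences])
    return tokens, cu_seqlens


# Three sequences of lengths 3, 2 and 4 merged into 9 tokens total.
sequences = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]
tokens, cu_seqlens = pack_sequences(sequences)

# Sequence i occupies tokens[cu_seqlens[i]:cu_seqlens[i + 1]].
recovered = [
    tokens[cu_seqlens[i]:cu_seqlens[i + 1]]
    for i in range(len(cu_seqlens) - 1)
]
```

Without these boundaries the attention module cannot know where one merged sequence ends and the next begins, which is why I think they have to be forwarded.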