Hunt correctness for extend attention + RPE #504

nicolasvasilache · 2025-02-14T06:29:37Z

Avoid flex_attention for RPE: it is unclear whether a correct implementation is possible given the
limitations of create_block_mask regarding conditionals.

Instead, we use the manual torch implementation, which is known to be correct.
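
For readers unfamiliar with what a manual reference computes, here is a minimal sketch in plain Python (not torch, to keep it dependency-free). It is not the reference used in this PR. The names `rpe_bias` and `attention_with_rpe` are hypothetical, and the masking rule (use `rpe[i - j]` only when `0 <= i - j < max_rpe_context_length`, zero bias otherwise) is an assumption made for illustration. It also covers only plain causal single-head attention, not the extend-attention layout.

```python
import math


def rpe_bias(rpe, seq_len, max_rpe_context_length):
    """Build a [seq_len][seq_len] additive bias from a 1-D RPE table.

    Assumption (not taken from the PR): entry (i, j) is rpe[i - j] when
    0 <= i - j < max_rpe_context_length, and 0.0 otherwise.
    """
    bias = [[0.0] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            d = i - j
            if 0 <= d < max_rpe_context_length:
                bias[i][j] = rpe[d]
    return bias


def attention_with_rpe(q, k, v, rpe, max_rpe_context_length):
    """Causal single-head attention with an additive RPE bias (hypothetical helper)."""
    seq_len, head_dim = len(q), len(q[0])
    scale = 1.0 / math.sqrt(head_dim)
    bias = rpe_bias(rpe, seq_len, max_rpe_context_length)
    out = []
    for i in range(seq_len):
        # Causal mask: query i only attends to keys j <= i.
        scores = [
            scale * sum(a * b for a, b in zip(q[i], k[j])) + bias[i][j]
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible keys.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(
            [
                sum(w * v[j][c] for j, w in enumerate(weights)) / total
                for c in range(len(v[0]))
            ]
        )
    return out
```

Because the condition on `i - j` is an ordinary Python branch here, there is no block-mask construction to get wrong, which is the appeal of an explicit reference when checking a kernel.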

After updating the test and extend_attention_rpe to use a static max_rpe_context_length,
a new error appears that suggests an indexing issue in extend_attention_rpe.

Repro:

pytest tests/kernel/wave/attention/extend_attention_test.py --run-e2e -v -k "rpe"

Errors out with:

Diagnostics:
  <stdin>:282:18: error: 'vector.gather' op operand #2 must be vector of integer or index values, but got 'index'
  %468 = "vector.gather"(%109, %39, %39, %467, %44) : (memref<?xf32, strided<[1], offset: ?>>, index, index, vector<4xi1>, vector<4xf32>) -> vector<4xf32>
                                ^

Signed-off-by: Nicolas Vasilache <[email protected]>

Signed-off-by: Stanley Winata <[email protected]>

Signed-off-by: Alex Zinenko <[email protected]>

raikonenfnu and others added 12 commits February 12, 2025 23:11

[Wave] Refactor style and fix test for RPE

e6ef6d7

Signed-off-by: Stanley Winata <[email protected]>

[Wave] Add RPE to Extend Attention + Fix shapes for vanilla RPE

d1168c9

Signed-off-by: Stanley Winata <[email protected]>

Thread max_rpe_context_lengths where appropriate

8fae0d4

Signed-off-by: Alex Zinenko <[email protected]>

thread more

d6d5577

debug rpe

89d01e8

Add debug_rpe to dump out the RPE mask from the kernel

4757d8f

Minimize and dump IR

259edc0

debug step

116e053

Allow running without pytest

1a12817

Debug print in wave_ops

c217848

Revert "Allow running without pytest"

c08692a

This reverts commit 1a12817.

nicolasvasilache force-pushed the users/stan/extendRPE branch from 2b97ac1 to c08692a Compare February 14, 2025 19:45
