Question about flashdecoding with appendKV #1325

DD-DuDa · 2024-11-10T08:47:44Z

Hi,
I am curious about why n_blocks_per_split is calculated using params.seqlen_k instead of actual_seqlen_k in the following code:

flash-attention/csrc/flash_attn/src/flash_fwd_kernel.h

Line 525 in b443207

    
           const int n_blocks_per_split = ((params.seqlen_k + kBlockN - 1) / kBlockN + num_n_splits - 1) / num_n_splits;

It seems to be wrong in some cases.

Consider:
seqlen_k = 1024;
seqlen_k_new = 1;
kBlockN = 128;
num_n_splits = 4;

Then n_blocks_per_split equals 2 (((1024 + 127) / 128 + 3) / 4 = 11 / 4 = 2 in integer arithmetic), so n_block_max can reach at most 8 ((3 + 1) * 2, for the last split with n_split_idx = 3) according to:

flash-attention/csrc/flash_attn/src/flash_fwd_kernel.h

Line 529 in b443207

    
           int n_block_max = std::min(cute::ceil_div(binfo.actual_seqlen_k, kBlockN), (n_split_idx + 1) * n_blocks_per_split);

If we attempt to append KV, n_block_copy_min is also 8 (seqlen_k_cache / kBlockN = 1024 / 128), so the copy loop never executes and gKNew is never appended to gK:

flash-attention/csrc/flash_attn/src/flash_fwd_kernel.h

Line 727 in b443207

    
           const int n_block_copy_min = std::max(n_block_min, binfo.seqlen_k_cache / kBlockN);

flash-attention/csrc/flash_attn/src/flash_fwd_kernel.h

Line 730 in b443207

for (int n_block = n_block_max - 1; n_block >= n_block_copy_min; n_block--) {
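
The arithmetic above can be reproduced on the host with a small sketch. It is only a model of the quoted lines, not the kernel itself, and it rests on two assumptions that are not shown in the snippets: that actual_seqlen_k is seqlen_k_cache + seqlen_k_new, and that n_block_min is n_split_idx * n_blocks_per_split (the non-local case). SplitRange and split_range are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Host-side model of the quoted split-KV block arithmetic (not the kernel).
struct SplitRange {
    int n_block_min;       // first KV block owned by this split
    int n_block_max;       // one past the last KV block owned by this split
    int n_block_copy_min;  // first block the append loop would visit
    int blocks_appended;   // number of iterations of the append loop
};

constexpr int ceil_div(int a, int b) { return (a + b - 1) / b; }

SplitRange split_range(int params_seqlen_k, int seqlen_k_cache, int seqlen_k_new,
                       int kBlockN, int num_n_splits, int n_split_idx) {
    // Assumption: actual_seqlen_k is the cached length plus the new tokens.
    const int actual_seqlen_k = seqlen_k_cache + seqlen_k_new;
    const int n_blocks_per_split =
        (ceil_div(params_seqlen_k, kBlockN) + num_n_splits - 1) / num_n_splits;
    // Assumption: non-local attention, so each split starts at a fixed offset.
    const int n_block_min = n_split_idx * n_blocks_per_split;
    const int n_block_max = std::min(ceil_div(actual_seqlen_k, kBlockN),
                                     (n_split_idx + 1) * n_blocks_per_split);
    const int n_block_copy_min = std::max(n_block_min, seqlen_k_cache / kBlockN);
    // Same loop bounds as the quoted append loop; only count the iterations.
    int blocks_appended = 0;
    for (int n_block = n_block_max - 1; n_block >= n_block_copy_min; n_block--) {
        ++blocks_appended;
    }
    return {n_block_min, n_block_max, n_block_copy_min, blocks_appended};
}
```

With the values from this question, split_range(1024, 1024, 1, 128, 4, 3) gives n_block_max = 8, n_block_copy_min = 8, and zero append iterations for the last split.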

Am I missing something here?


SimpleTheoryOfTypes · 2024-11-20T07:20:12Z

The code is correct: seqlen_k is the total KV cache length. I think in your case, binfo.actual_seqlen_k should be strictly less than seqlen_k if there are new tokens to be appended.
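
One way to read this reply is that params.seqlen_k is the allocated capacity of the KV cache rather than the number of tokens currently stored in it. Under that reading, which is an assumption and not confirmed in the thread, the sketch below counts how many blocks the append loop visits across all splits. append_iterations is a hypothetical helper, and it again assumes actual_seqlen_k = seqlen_k_cache + seqlen_k_new and n_block_min = n_split_idx * n_blocks_per_split.

```cpp
#include <algorithm>
#include <cassert>

constexpr int ceil_div(int a, int b) { return (a + b - 1) / b; }

// Total append-loop iterations over all splits, using the quoted formulas.
// cache_capacity plays the role of params.seqlen_k (assumed interpretation).
int append_iterations(int cache_capacity, int seqlen_k_cache, int seqlen_k_new,
                      int kBlockN, int num_n_splits) {
    const int actual_seqlen_k = seqlen_k_cache + seqlen_k_new;
    const int n_blocks_per_split =
        (ceil_div(cache_capacity, kBlockN) + num_n_splits - 1) / num_n_splits;
    int total = 0;
    for (int n_split_idx = 0; n_split_idx < num_n_splits; ++n_split_idx) {
        const int n_block_min = n_split_idx * n_blocks_per_split;
        const int n_block_max = std::min(ceil_div(actual_seqlen_k, kBlockN),
                                         (n_split_idx + 1) * n_blocks_per_split);
        const int n_block_copy_min = std::max(n_block_min, seqlen_k_cache / kBlockN);
        // The loop runs from n_block_max - 1 down to n_block_copy_min inclusive.
        total += std::max(0, n_block_max - n_block_copy_min);
    }
    return total;
}
```

For example, append_iterations(1024, 1000, 1, 128, 4) returns 1 because the last split visits block 7, while append_iterations(1024, 1024, 1, 128, 4) returns 0, which is the case described in the question where the cache is already full.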

DD-DuDa mentioned this issue Nov 24, 2024

Flashdecoding with appendKV might incorrect #1354

Open
