Prefix Caching- fix t4 triton error (#2517)
caoshiyi authored Feb 16, 2024
1 parent 5255d99 commit 64da65b
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion in vllm/model_executor/layers/triton_kernel/prefix_prefill.py

@@ -618,7 +618,9 @@ def context_attention_fwd(q,
                          b_ctx_len,
                          max_input_len,
                          alibi_slopes=None):
-    BLOCK = 128
+
+    cap = torch.cuda.get_device_capability()
+    BLOCK = 128 if cap[0] >= 8 else 64
     # shape constraints
     Lq, Lk, Lv = q.shape[-1], k.shape[-1], v.shape[-1]
     assert Lq == Lk and Lk == Lv
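The change keys the Triton tile size off the GPU's compute capability: the T4 is a Turing card (compute capability 7.5) with less shared memory per SM than Ampere (8.x) parts, so the 128-wide block can fail at kernel launch. A minimal sketch of the selection logic follows; `select_block_size` is a hypothetical helper name used here for illustration, not part of vLLM, and the `(major, minor)` tuple mirrors what `torch.cuda.get_device_capability()` returns.

```python
# Sketch of the capability-based block-size selection from the diff above.
# `select_block_size` is an illustrative name, not vLLM's API.

def select_block_size(device_capability):
    """Pick the Triton tile size from a CUDA (major, minor) capability tuple."""
    major, _minor = device_capability
    # Ampere (8.x) and newer can afford the larger 128-wide tile;
    # older cards such as the T4 (7.5) fall back to 64.
    return 128 if major >= 8 else 64

# On a CUDA system this would be driven by the real device query:
#   cap = torch.cuda.get_device_capability()  # e.g. (7, 5) on a T4
#   BLOCK = select_block_size(cap)
print(select_block_size((7, 5)))  # T4 -> 64
print(select_block_size((8, 0)))  # A100 -> 128
```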
