[None][perf] enable paged context for skip-softmax fp8 kv by bobboli · Pull Request #15442 · NVIDIA/TensorRT-LLM

bobboli · 2026-06-17T03:51:44Z

Summary

Force paged-context FMHA when skip-softmax attention is used with an FP8 KV cache.
This keeps context QKV on the paged-context preprocessing path, so that skip-softmax dispatch with an FP8 KV cache does not fall back to the BF16 packed-context path.
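
For reviewers, the intended rule can be sketched roughly as below. This is an illustration only, not the code in this PR: `AttentionConfig`, its fields, and `resolve_paged_context_fmha` are hypothetical names invented for the example and are not TensorRT-LLM APIs.

```python
from dataclasses import dataclass


@dataclass
class AttentionConfig:
    """Hypothetical stand-in for attention settings (not a TensorRT-LLM class)."""

    use_skip_softmax: bool = False
    kv_cache_dtype: str = "bf16"  # assumed labels: "bf16" or "fp8"
    use_paged_context_fmha: bool = False


def resolve_paged_context_fmha(cfg: AttentionConfig) -> bool:
    """Decide whether the paged-context FMHA path should be used.

    Skip-softmax attention combined with an FP8 KV cache forces the paged
    path. In every other case the caller's own setting is returned unchanged.
    """
    if cfg.use_skip_softmax and cfg.kv_cache_dtype == "fp8":
        return True
    return cfg.use_paged_context_fmha


# Skip-softmax + FP8 KV cache: paged context is forced on.
forced = resolve_paged_context_fmha(
    AttentionConfig(use_skip_softmax=True, kv_cache_dtype="fp8")
)

# Skip-softmax with a BF16 KV cache: the caller's setting (off) is kept.
untouched = resolve_paged_context_fmha(
    AttentionConfig(use_skip_softmax=True, kv_cache_dtype="bf16")
)
```

Under these assumptions the override only ever turns the paged path on; it never disables a setting the user enabled explicitly.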

pytest/runtime verification in tekit4. Local Python is missing pluggy/torch, and I did not build TensorRT-LLM in this checkout.
Full-file ruff on trtllm.py reports pre-existing style violations unrelated to this change.

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

github-actions Bot assigned bobboli Jun 17, 2026

[None][perf] enable paged context for skip-softmax fp8 kv

b93a812

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

bobboli force-pushed the lbo/skip-softmax-fp8-paged-context branch from 2d81eae to b93a812 Compare June 17, 2026 08:41