🚀 The feature, motivation and pitch
FlashInfer is adding non-causal TRTLLM-gen paged GQA decode (flashinfer-ai/flashinfer#3629). That path selects TRTLLM-gen FMHA generation kernels with the PagedKv layout and the Dense mask. For page size 16, TensorRT-LLM does not currently ship matching precompiled FMHA cubins or Dense metadata rows.
Requested coverage:
- qkvLayout = PagedKv
- maskType = Dense
- kernelType = Generation
- numTokensPerPage = 16
- headDimQk = headDimV in {64, 128, 256}
- tileSizeQ in {8, 16}
- tileSizeKv = 128
- Blackwell SM100/SM103, matching the existing TRTLLM-gen FMHA cubin set
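For reference, the requested combinations can be enumerated with a short sketch. The `FmhaKey` tuple and `requested_keys` helper below are hypothetical names used only for illustration; they are not the TensorRT-LLM or FlashInfer metadata types, and the sketch leaves out dimensions this issue does not specify (for example data types and the per-architecture split).

```python
from itertools import product
from typing import List, NamedTuple


class FmhaKey(NamedTuple):
    """Hypothetical stand-in for one FMHA metadata key (illustration only)."""
    qkv_layout: str
    mask_type: str
    kernel_type: str
    num_tokens_per_page: int
    head_dim_qk: int
    head_dim_v: int
    tile_size_q: int
    tile_size_kv: int


def requested_keys() -> List[FmhaKey]:
    """Enumerate the P16 Dense generation combinations listed above."""
    head_dims = (64, 128, 256)   # headDimQk = headDimV
    tile_sizes_q = (8, 16)       # tileSizeQ
    return [
        FmhaKey("PagedKv", "Dense", "Generation", 16, d, d, tq, 128)
        for d, tq in product(head_dims, tile_sizes_q)
    ]


if __name__ == "__main__":
    for key in requested_keys():
        print(key)
```

Under these assumptions the list has six entries (three head dimensions times two Q tile sizes), each of which would be needed for both SM100 and SM103.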
Why P16 matters:
- numTokensPerPage is selected from the physical paged KV cache layout.
- Changing the downstream test to page size 32 selects a different FMHA metadata key and does not validate the P16 runtime path.
- FlashInfer supports page size 16 paged KV caches in its public decode path. Non-causal GQA should not require changing the cache layout only to match the currently shipped cubin set.
There is also a metadata issue for existing Dense-named P32 generation entries: the function names contain PagedKvDense, but the rows are indexed with maskType=Causal. Please index those rows as Dense when refreshing the metadata.
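A check along the following lines could flag the mismatch when the metadata is refreshed. This is a minimal sketch: the row representation (dicts with `func_name` and `mask_type` fields) and the sample function names are assumptions made for illustration, not the actual TensorRT-LLM metadata format.

```python
from typing import Dict, List


def find_mask_mismatches(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return rows whose function name says PagedKvDense but whose
    indexed mask type is something other than Dense."""
    return [
        row for row in rows
        if "PagedKvDense" in row["func_name"] and row["mask_type"] != "Dense"
    ]


if __name__ == "__main__":
    # Hypothetical rows; real function names and fields will differ.
    sample_rows = [
        {"func_name": "example_PagedKvDense_P32_gen", "mask_type": "Causal"},
        {"func_name": "example_PagedKvDense_P32_gen_b", "mask_type": "Dense"},
        {"func_name": "example_PagedKvCausal_P32_gen", "mask_type": "Causal"},
    ]
    for row in find_mask_mismatches(sample_rows):
        print("mismatch:", row["func_name"], "indexed as", row["mask_type"])
```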
Alternatives
FlashInfer can keep the new non-causal GQA tests as xfail, but it cannot remove the xfail or validate the Dense runtime path until TensorRT-LLM ships the P16 Dense cubins and metadata.
Changing the downstream tests to page size 32 would only test a different runtime shape and would not validate the P16 path used by the reported case.