[Bug] no kernel image is available for execution on the device when using sage_hub attention backend on RTX 5090 (Blackwell, sm_120) #13043

@timeuser4

Description

Describe the bug

When enabling the experimental sage_hub attention backend on an RTX 5090 (Blackwell architecture, compute capability 12.0) with PyTorch 2.8 + CUDA 12.9, inference fails with a CUDA kernel compatibility error:

Error no kernel image is available for execution on the device at line 73 in file /src/csrc/ops.cu
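As a sanity check (not part of the original report), the device's compute capability and the architectures the local PyTorch build ships kernels for can be printed directly; on an RTX 5090 the capability reads (12, 0), i.e. sm_120:

import torch

# Name and compute capability of the first CUDA device; the RTX 5090 reports (12, 0).
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))
# Architectures the installed PyTorch wheel was compiled for.
print(torch.cuda.get_arch_list())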

Reproduction

import torch
import numpy as np
from PIL import Image  # needed for Image.fromarray below
from diffusers.pipelines.flux2.pipeline_flux2 import Flux2Pipeline

pipe = Flux2Pipeline.from_pretrained("models/FLUX.2-dev-bnb-4bit", text_encoder=None, torch_dtype=torch.bfloat16).to("cuda:0")
pipe.transformer.set_attention_backend("sage_hub")  # <- Bug here. When I comment out this line, it works fine.
# pipe.load_lora_weights("models/flux-dev-inpaint/pytorch_lora_weights.safetensors")

def create_random_pil(size=(512, 512)):
    arr = np.random.randint(0, 255, (size[1], size[0], 3), dtype=np.uint8)
    return Image.fromarray(arr)

coarse_pil = create_random_pil((512, 512))
garment_pil = create_random_pil((512, 512))
prompt_embeds = torch.randn(1, 256, 15360, dtype=torch.bfloat16, device="cuda:0")

images = pipe(
    image=[coarse_pil, garment_pil],
    prompt_embeds=prompt_embeds,
    height=512,
    width=512,
    guidance_scale=7.5,
    num_inference_steps=30,
    generator=torch.Generator("cpu").manual_seed(42),
)
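A possible interim guard, not from the original report and assuming the failure is specific to compute capability 12.0: only opt into sage_hub when the device capability is below that threshold, and otherwise keep the default attention implementation.

# Hedged workaround sketch: the (12, 0) cutoff is an assumption based on this report,
# not a documented limit of the sage_hub backend.
major, minor = torch.cuda.get_device_capability(0)
if (major, minor) < (12, 0):
    pipe.transformer.set_attention_backend("sage_hub")
# otherwise fall back to diffusers' default attention backend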

Logs

... (initialization output omitted)
0%|                                                                                                                                                                   | 0/30 [00:00<?, ?it/s]########tensor_layout NHD
before kernel call
after kernel call
Error no kernel image is available for execution on the device at line 73 in file /src/csrc/ops.cu

System Info

- 🤗 Diffusers version: 0.36.0.dev0
- Platform: Linux-5.15.0-83-generic-x86_64-with-glibc2.35
- Running on Google Colab?: No
- Python version: 3.12.12
- PyTorch version (GPU?): 2.8.0+cu129 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.36.0
- Transformers version: 4.57.3
- Accelerate version: 1.12.0
- PEFT version: 0.18.1
- Bitsandbytes version: 0.49.1
- Safetensors version: 0.7.0
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 5090, 32607 MiB
NVIDIA GeForce RTX 5090, 32607 MiB
NVIDIA GeForce RTX 5090, 32607 MiB
NVIDIA GeForce RTX 5090, 32607 MiB
NVIDIA GeForce RTX 5090, 32607 MiB
NVIDIA GeForce RTX 5090, 32607 MiB
NVIDIA GeForce RTX 5090, 32607 MiB
NVIDIA GeForce RTX 5090, 32607 MiB
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

Who can help?

@yiyixuxu @DN6
