Describe the bug
When enabling the experimental sage_hub attention backend on an RTX 5090 (Blackwell architecture, compute capability 12.0) with PyTorch 2.8 + CUDA 12.9, inference fails with a CUDA kernel compatibility error:
Error no kernel image is available for execution on the device at line 73 in file /src/csrc/ops.cu
Reproduction
import numpy as np
import torch
from PIL import Image  # needed by create_random_pil below
from diffusers.pipelines.flux2.pipeline_flux2 import Flux2Pipeline

pipe = Flux2Pipeline.from_pretrained("models/FLUX.2-dev-bnb-4bit", text_encoder=None, torch_dtype=torch.bfloat16).to("cuda:0")
pipe.transformer.set_attention_backend("sage_hub")  # <- Fails here; commenting out this line makes inference work.
# pipe.load_lora_weights("models/flux-dev-inpaint/pytorch_lora_weights.safetensors")

def create_random_pil(size=(512, 512)):
    # Random RGB noise image; size is (width, height).
    arr = np.random.randint(0, 255, (size[1], size[0], 3), dtype=np.uint8)
    return Image.fromarray(arr)

coarse_pil = create_random_pil((512, 512))
garment_pil = create_random_pil((512, 512))
# Random embeddings stand in for the text encoder output (text_encoder=None above).
prompt_embeds = torch.randn(1, 256, 15360, dtype=torch.bfloat16, device="cuda:0")
images = pipe(
    image=[coarse_pil, garment_pil],
    prompt_embeds=prompt_embeds,
    height=512,
    width=512,
    guidance_scale=7.5,
    num_inference_steps=30,
    generator=torch.Generator("cpu").manual_seed(42),
)
Logs
... ...(Initialize output information)
0%| | 0/30 [00:00<?, ?it/s]########tensor_layout NHD
before kernel call
after kernel call
Error no kernel image is available for execution on the device at line 73 in file /src/csrc/ops.cu
System Info
- 🤗 Diffusers version: 0.36.0.dev0
- Platform: Linux-5.15.0-83-generic-x86_64-with-glibc2.35
- Running on Google Colab?: No
- Python version: 3.12.12
- PyTorch version (GPU?): 2.8.0+cu129 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.36.0
- Transformers version: 4.57.3
- Accelerate version: 1.12.0
- PEFT version: 0.18.1
- Bitsandbytes version: 0.49.1
- Safetensors version: 0.7.0
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 5090, 32607 MiB
NVIDIA GeForce RTX 5090, 32607 MiB
NVIDIA GeForce RTX 5090, 32607 MiB
NVIDIA GeForce RTX 5090, 32607 MiB
NVIDIA GeForce RTX 5090, 32607 MiB
NVIDIA GeForce RTX 5090, 32607 MiB
NVIDIA GeForce RTX 5090, 32607 MiB
NVIDIA GeForce RTX 5090, 32607 MiB
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
Who can help?
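For whoever picks this up, here is a minimal diagnostic sketch for narrowing the problem down. It assumes the failure comes from a prebuilt kernel that carries no code for compute capability 12.0, which has not been confirmed. The helpers `arch_tag` and `has_native_kernels` are hypothetical names introduced here for illustration. Note that `torch.cuda.get_arch_list()` reports the architectures PyTorch itself was compiled for, not those of the sage_hub kernel binary, so this only rules PyTorch in or out as the culprit.

```python
def arch_tag(capability):
    """Turn a (major, minor) compute capability into an 'sm_XY' tag."""
    major, minor = capability
    return f"sm_{major}{minor}"


def has_native_kernels(capability, arch_list):
    """Return True if arch_list has an exact binary target for this capability."""
    return arch_tag(capability) in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None  # Allows the helpers above to be used without PyTorch.

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("Device capability:", cap, "->", arch_tag(cap))
        print("PyTorch build targets:", archs)
        # Only covers PyTorch's own kernels, not the sage_hub kernel binary.
        print("PyTorch has native kernels:", has_native_kernels(cap, archs))
```

If PyTorch lists a matching target for the device while the sage_hub call still fails, that would point toward the attention kernel build rather than the PyTorch install.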