In the latest version of transformers (4.49.0) matrix transformation error is encountered #36571

idebroy · 2025-03-06T05:33:31Z

System Info

transformers version: 4.49.0
Python version: 3.10
Environment: Hugging Face Spaces

Works in: 4.48.3
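To confirm which release is actually active in the Space before comparing behaviour, a small check like the one below may help. It is only a sketch: the two version tuples come straight from this report, and the helper names (`parse_version`, `classify`, `installed_transformers_version`) are hypothetical, not part of any library. Any other version is reported as untested rather than guessed.

```python
from importlib import metadata

# Versions taken from this report; every other release is untested here.
REPORTED_WORKING = (4, 48, 3)
REPORTED_FAILING = (4, 49, 0)


def parse_version(text: str) -> tuple:
    """Return up to three leading numeric components of a version string."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. "dev0"
        parts.append(int(digits))
    return tuple(parts[:3])


def classify(version_text: str) -> str:
    """Label a version according to what this report says about it."""
    version = parse_version(version_text)
    if version == REPORTED_WORKING:
        return "reported working"
    if version == REPORTED_FAILING:
        return "reported failing"
    return "untested in this report"


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    else:
        print(found, "->", classify(found))
```

Running this at the top of `app.py` would print the installed version next to its label, which makes it easier to tell whether a pin in `requirements.txt` took effect.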

The following Hugging Face Space code works with 4.48.3 but fails with 4.49.0.
Code:
`
import os
import random
import uuid
import gradio as gr
import numpy as np
from PIL import Image
import spaces
import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler
from typing import Tuple

css = '''
.gradio-container{max-width: 575px !important}
h1{text-align:center}
footer {
    visibility: hidden
}
'''

DESCRIPTIONXX = """## lStation txt2Img🥠"""

examples = [
    "A tiny reptile hatching from an egg on the mars, 4k, planet theme, --style raw5 --v 6.0",
    "An anime-style illustration of a delicious, rice biryani with curry and chilli pickle --style raw5",
    "Iced tea in a cup --ar 85:128 --v 6.0 --style raw5, 4K, Photo-Realistic",
    "A zebra holding a sign that says Welcome to Zoo --ar 85:128 --v 6.0 --style raw",
    "A splash page of Spiderman swinging through a futuristic cityscape filled with flying cars, the scene depicted in a vibrant 3D rendered Marvel comic art style.--style raw5, 4K, Photo-Realistic"
]

MODEL_OPTIONS = {
    "LIGHTNING V5.0": "SG161222/RealVisXL_V5.0_Lightning",
    "LIGHTNING V4.0": "SG161222/RealVisXL_V4.0_Lightning",
}

MAX_IMAGE_SIZE = int(os.getenv("MAX_IMAGE_SIZE", "4096"))
USE_TORCH_COMPILE = os.getenv("USE_TORCH_COMPILE", "0") == "1"
ENABLE_CPU_OFFLOAD = os.getenv("ENABLE_CPU_OFFLOAD", "0") == "1"
BATCH_SIZE = int(os.getenv("BATCH_SIZE", "1"))

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

style_list = [
    {
        "name": "3840 x 2160",
        "prompt": "hyper-realistic 8K image of {prompt}. ultra-detailed, lifelike, high-resolution, sharp, vibrant colors, photorealistic",
        "negative_prompt": "cartoonish, low resolution, blurry, simplistic, abstract, deformed, ugly",
    },
    {
        "name": "2560 x 1440",
        "prompt": "hyper-realistic 4K image of {prompt}. ultra-detailed, lifelike, high-resolution, sharp, vibrant colors, photorealistic",
        "negative_prompt": "cartoonish, low resolution, blurry, simplistic, abstract, deformed, ugly",
    },
    {
        "name": "HD+",
        "prompt": "hyper-realistic 2K image of {prompt}. ultra-detailed, lifelike, high-resolution, sharp, vibrant colors, photorealistic",
        "negative_prompt": "cartoonish, low resolution, blurry, simplistic, abstract, deformed, ugly",
    },
    {
        "name": "Style Zero",
        "prompt": "{prompt}",
        "negative_prompt": "",
    },
]

styles = {k["name"]: (k["prompt"], k["negative_prompt"]) for k in style_list}
DEFAULT_STYLE_NAME = "3840 x 2160"
STYLE_NAMES = list(styles.keys())

def apply_style(style_name: str, positive: str, negative: str = "") -> Tuple[str, str]:
    if style_name in styles:
        p, n = styles.get(style_name, styles[DEFAULT_STYLE_NAME])
    else:
        p, n = styles[DEFAULT_STYLE_NAME]

    if not negative:
        negative = ""
    return p.replace("{prompt}", positive), n + negative

def load_and_prepare_model(model_id):
    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
        use_safetensors=True,
        add_watermarker=False,
    ).to(device)
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

    if USE_TORCH_COMPILE:
        pipe.compile()

    if ENABLE_CPU_OFFLOAD:
        pipe.enable_model_cpu_offload()

    return pipe

# Preload and compile both models
models = {key: load_and_prepare_model(value) for key, value in MODEL_OPTIONS.items()}

MAX_SEED = np.iinfo(np.int32).max

def save_image(img):
    unique_name = str(uuid.uuid4()) + ".png"
    img.save(unique_name)
    return unique_name

def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
    if randomize_seed:
        seed = random.randint(0, MAX_SEED)
    return seed

@spaces.GPU(duration=60, enable_queue=True)
def generate(
    model_choice: str,
    prompt: str,
    negative_prompt: str = "extra limbs, extra fingers, extra toes, unnatural proportions, distorted anatomy, disjointed limbs, mutated body parts, broken bones, oversized limbs, unrealistic muscles, merged faces, extra eyes, floating features, disfigured hands, incorrect joint placement, missing parts, blurry details, asymmetrical body structure, glitched textures",
    use_negative_prompt: bool = False,
    style_selection: str = DEFAULT_STYLE_NAME,
    seed: int = 1,
    width: int = 1024,
    height: int = 1024,
    guidance_scale: float = 3,
    num_inference_steps: int = 25,
    randomize_seed: bool = False,
    use_resolution_binning: bool = True,
    num_images: int = 1,
    progress=gr.Progress(track_tqdm=True),
):
    global models
    pipe = models[model_choice]

    seed = int(randomize_seed_fn(seed, randomize_seed))
    generator = torch.Generator(device=device).manual_seed(seed)

    prompt, negative_prompt = apply_style(style_selection, prompt, negative_prompt)

    options = {
        "prompt": [prompt] * num_images,
        "negative_prompt": [negative_prompt] * num_images if use_negative_prompt else None,
        "width": width,
        "height": height,
        "guidance_scale": guidance_scale,
        "num_inference_steps": num_inference_steps,
        "generator": generator,
        "output_type": "pil",
    }

    if use_resolution_binning:
        options["use_resolution_binning"] = True

    images = []
    for i in range(0, num_images, BATCH_SIZE):
        batch_options = options.copy()
        batch_options["prompt"] = options["prompt"][i:i + BATCH_SIZE]
        if "negative_prompt" in batch_options:
            batch_options["negative_prompt"] = options["negative_prompt"][i:i + BATCH_SIZE]
        images.extend(pipe(**batch_options).images)

    image_paths = [save_image(img) for img in images]

    return image_paths, seed

with gr.Blocks(css=css, theme="bethecloud/storj_theme") as demo:
    gr.Markdown(DESCRIPTIONXX)
    with gr.Row():
        prompt = gr.Text(
            label="Prompt",
            show_label=False,
            max_lines=1,
            placeholder="Enter your prompt",
            container=False,
        )
        run_button = gr.Button("Run", scale=0)
    result = gr.Gallery(label="Result", columns=1, show_label=False)

    with gr.Row():
        model_choice = gr.Dropdown(
            label="Model Selection⬇️",
            choices=list(MODEL_OPTIONS.keys()),
            value="LIGHTNING V5.0"
        )

    with gr.Accordion("Advanced options", open=False, visible=True):
        style_selection = gr.Radio(
            show_label=True,
            container=True,
            interactive=True,
            choices=STYLE_NAMES,
            value=DEFAULT_STYLE_NAME,
            label="Quality Style",
        )
        num_images = gr.Slider(
            label="Number of Images",
            minimum=1,
            maximum=5,
            step=1,
            value=1,
        )
        with gr.Row():
            with gr.Column(scale=1):
                use_negative_prompt = gr.Checkbox(label="Use negative prompt", value=True)
                negative_prompt = gr.Text(
                    label="Negative prompt",
                    max_lines=5,
                    lines=4,
                    placeholder="Enter a negative prompt",
                    value="(deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, (mutated hands and fingers:1.4), disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation",
                    visible=True,
                )
        seed = gr.Slider(
            label="Seed",
            minimum=0,
            maximum=MAX_SEED,
            step=1,
            value=0,
        )
        randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
        with gr.Row():
            width = gr.Slider(
                label="Width",
                minimum=512,
                maximum=MAX_IMAGE_SIZE,
                step=8,
                value=1024,
            )
            height = gr.Slider(
                label="Height",
                minimum=512,
                maximum=MAX_IMAGE_SIZE,
                step=8,
                value=1024,
            )
        with gr.Row():
            guidance_scale = gr.Slider(
                label="Guidance Scale",
                minimum=0.1,
                maximum=6,
                step=0.1,
                value=3.0,
            )
            num_inference_steps = gr.Slider(
                label="Number of inference steps",
                minimum=1,
                maximum=60,
                step=1,
                value=28,
            )
    gr.Examples(
        examples=examples,
        inputs=prompt,
        cache_examples=False
    )

    use_negative_prompt.change(
        fn=lambda x: gr.update(visible=x),
        inputs=use_negative_prompt,
        outputs=negative_prompt,
        api_name=False,
    )

    gr.on(
        triggers=[
            prompt.submit,
            negative_prompt.submit,
            run_button.click,
        ],
        fn=generate,
        inputs=[
            model_choice,
            prompt,
            negative_prompt,
            use_negative_prompt,
            style_selection,
            seed,
            width,
            height,
            guidance_scale,
            num_inference_steps,
            randomize_seed,
            num_images,
        ],
        outputs=[result, seed]
    )

if __name__ == "__main__":
    demo.queue(max_size=50).launch(show_api=True)

`

Exception:
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 256, in thread_wrapper
    res = future.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/app/app.py", line 158, in generate
    images.extend(pipe(**batch_options).images)
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py", line 1086, in __call__
    ) = self.encode_prompt(
  File "/usr/local/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py", line 406, in encode_prompt
    prompt_embeds = text_encoder(text_input_ids.to(device), output_hidden_states=True)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1490, in forward
    text_embeds = self.text_projection(pooled_output)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 125, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half
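
The final frames suggest that `text_projection` received a float32 input while its own weight is float16. One way to narrow this down might be to list which parameters of the text encoder do not have the expected dtype. The sketch below is an assumption-laden diagnostic, not a fix: the helper names (`group_by_dtype`, `find_mismatches`) are hypothetical, and the parameter names and dtypes in `fake_params` are made-up stand-ins so the snippet runs without torch. They are not measurements from this Space.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_dtype(named_tensors):
    """Map each dtype to the list of parameter names that use it."""
    groups = defaultdict(list)
    for name, tensor in named_tensors:
        groups[tensor.dtype].append(name)
    return dict(groups)


def find_mismatches(named_tensors, expected_dtype):
    """Return the names of tensors whose dtype differs from expected_dtype."""
    return [name for name, tensor in named_tensors if tensor.dtype != expected_dtype]


# Illustrative stand-in data only (hypothetical names and dtypes).
# Real code would pass list(module.named_parameters()) and torch.float16.
fake_params = [
    ("text_model.embeddings.weight", SimpleNamespace(dtype="float16")),
    ("text_model.final_layer_norm.weight", SimpleNamespace(dtype="float32")),
    ("text_projection.weight", SimpleNamespace(dtype="float16")),
]

mismatched = find_mismatches(fake_params, "float16")
groups = group_by_dtype(fake_params)
print(mismatched)
```

Against the real pipeline, this could be called as `find_mismatches(list(pipe.text_encoder_2.named_parameters()), torch.float16)`, since `text_encoder_2` is the CLIP encoder with a projection head in the SDXL pipeline. Wrapping `named_parameters()` in `list(...)` matters because it returns a generator that can only be consumed once.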

Who can help?

No response

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

Run the Space code above on a GPU-enabled system.
Generating any image fails with the exception posted in the description above:
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half

Expected behavior

There should not be any exception.


Rocketknight1 · 2025-03-06T11:00:46Z

Hi @idebroy, this code is too long for us to really figure out what's going on! Can you try to make some minimal code that reproduces the issue?

idebroy · 2025-03-07T05:35:55Z

Here is a shorter version of the code:
`import spaces
import os
import random
import uuid
import torch
import numpy as np
from PIL import Image
import gradio as gr
from diffusers import EulerAncestralDiscreteScheduler, StableDiffusionXLPipeline

css = """
.gradio-container { max-width: 575px !important; }
h1 { text-align: center; }
footer { visibility: hidden; }
"""

DESCRIPTION = "## lStation txt2Img🥠"

examples = [
    "A tiny reptile hatching from an egg on Mars, 4K, planet theme",
    "Spiderman swinging through a futuristic cityscape filled with flying cars, 4K, photo-realistic",
]

DEFAULT_MODEL_ID = "SG161222/RealVisXL_V5.0_Lightning"

MAX_IMAGE_SIZE = int(os.getenv("MAX_IMAGE_SIZE", "4096"))
USE_TORCH_COMPILE = os.getenv("USE_TORCH_COMPILE", "0") == "1"
ENABLE_CPU_OFFLOAD = os.getenv("ENABLE_CPU_OFFLOAD", "0") == "1"
BATCH_SIZE = int(os.getenv("BATCH_SIZE", "1"))

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def load_and_prepare_model(model_id):
    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
        use_safetensors=True,
        add_watermarker=False,
    ).to(device)
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
    if USE_TORCH_COMPILE:
        pipe.compile()
    if ENABLE_CPU_OFFLOAD:
        pipe.enable_model_cpu_offload()
    return pipe

# Load the default model once
model = load_and_prepare_model(DEFAULT_MODEL_ID)
MAX_SEED = np.iinfo(np.int32).max

def save_image(img):
    unique_name = f"{uuid.uuid4()}.png"
    img.save(unique_name)
    return unique_name

def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
    return random.randint(0, MAX_SEED) if randomize_seed else seed

@spaces.GPU(duration=60, enable_queue=True)
def generate(prompt, seed, randomize_seed, width, height, guidance_scale, num_inference_steps, num_images):
    # Update the seed if randomization is enabled
    seed = int(randomize_seed_fn(seed, randomize_seed))
    generator = torch.Generator(device=device).manual_seed(seed)

    options = {
        "prompt": [prompt] * num_images,
        "width": width,
        "height": height,
        "guidance_scale": guidance_scale,
        "num_inference_steps": num_inference_steps,
        "generator": generator,
        "output_type": "pil",
    }

    images = []
    for i in range(0, num_images, BATCH_SIZE):
        batch_options = options.copy()
        batch_options["prompt"] = options["prompt"][i : i + BATCH_SIZE]
        images.extend(model(**batch_options).images)

    image_paths = [save_image(img) for img in images]
    return image_paths, seed

with gr.Blocks(css=css) as demo:
    gr.Markdown(DESCRIPTION)

    with gr.Row():
        prompt = gr.Text(label="Prompt", placeholder="Enter your prompt")
        run_button = gr.Button("Run")

    result = gr.Gallery(label="Result", columns=1)

    with gr.Accordion("Advanced options", open=False):
        num_images = gr.Slider(label="Number of Images", minimum=1, maximum=5, step=1, value=1)
        seed = gr.Slider(label="Seed", minimum=0, maximum=MAX_SEED, step=1, value=0)
        randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
        with gr.Row():
            width = gr.Slider(label="Width", minimum=512, maximum=MAX_IMAGE_SIZE, step=8, value=1024)
            height = gr.Slider(label="Height", minimum=512, maximum=MAX_IMAGE_SIZE, step=8, value=1024)
        with gr.Row():
            guidance_scale = gr.Slider(label="Guidance Scale", minimum=0.1, maximum=6, step=0.1, value=3.0)
            num_inference_steps = gr.Slider(label="Inference Steps", minimum=1, maximum=60, step=1, value=28)

    gr.Examples(examples=examples, inputs=prompt, cache_examples=False)

    run_button.click(
        fn=generate,
        inputs=[prompt, seed, randomize_seed, width, height, guidance_scale, num_inference_steps, num_images],
        outputs=[result, seed]
    )

demo.queue(max_size=50).launch(show_api=True)`

idebroy · 2025-03-07T05:39:11Z

With the latest transformers release, the error always occurs while generating the image, inside this function:
def generate(prompt, seed, randomize_seed, width, height, guidance_scale, num_inference_steps, num_images)

idebroy added the bug label Mar 6, 2025
