UNet2DConditionModel can't load sd-1.5 config with method from_pretrained()? #7445

JinShuwenABC · 2024-03-23T15:56:26Z

I am trying to use diffusers.T2IAdapter and diffusers.UNet2DConditionModel to train a T2IAdapter with sd-1.5.

For T2IAdapter, I use the config.json below to initialize it. I believe it works for both sd-1.5 and sd-2.1.

{
  "_class_name": "T2IAdapter",
  "_diffusers_version": "0.18.0.dev0",
  "adapter_type": "full_adapter",
  "channels": [
    320,
    640,
    1280,
    1280
  ],
  "downscale_factor": 16,
  "in_channels": 3,
  "num_res_blocks": 2
}

For UNet2DConditionModel, I use from_pretrained() as follows.

unet = UNet2DConditionModel.from_pretrained(
    args.pretrained_model_name_or_path, # stable-diffusion-v1-5 local addr
    subfolder="unet",
    revision=args.revision
)

However, it still seems to behave like an SDXL model, because it emits the following warning.

{'time_embedding_dim', 'mid_block_only_cross_attention', 'time_cond_proj_dim', 'timestep_post_act', 'time_embedding_act_fn', 'transformer_layers_per_block', 'addition_embed_type', 'class_embed_type', 'dual_cross_attention', 'upcast_attention', 'projection_class_embeddings_input_dim', 'attention_type', 'conv_in_kernel', 'encoder_hid_dim', 'addition_embed_type_num_heads', 'conv_out_kernel', 'mid_block_type', 'cross_attention_norm', 'only_cross_attention', 'encoder_hid_dim_type', 'num_class_embeds', 'use_linear_projection', 'time_embedding_type', 'dropout', 'resnet_time_scale_shift', 'resnet_out_scale_factor', 'num_attention_heads', 'addition_time_embed_dim', 'class_embeddings_concat', 'resnet_skip_time_act'} was not found in config. Values will be initialized to default values.
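
A note on the message above: it lists constructor arguments that are missing from the saved config.json, which is normal when an older checkpoint is loaded with a newer library version. The missing arguments fall back to their defaults, so the message alone does not show that the UNet was built as an SDXL model. Below is a minimal sketch of that kind of check. The function names and the three-key dictionaries are hypothetical and only illustrate the idea; this is not the diffusers implementation.

```python
def missing_config_keys(init_defaults, saved_config):
    """Return constructor argument names that the saved config does not define."""
    return {name for name in init_defaults if name not in saved_config}


def merge_with_defaults(init_defaults, saved_config):
    """Start from the defaults, then let values from the saved config win."""
    merged = dict(init_defaults)
    merged.update({k: v for k, v in saved_config.items() if k in init_defaults})
    return merged


# Hypothetical subset of constructor defaults (illustrative only).
init_defaults = {
    "cross_attention_dim": 1280,
    "addition_embed_type": None,
    "use_linear_projection": False,
}

# Hypothetical subset of an older saved config that predates the newer keys.
saved_config = {"cross_attention_dim": 768}

missing = missing_config_keys(init_defaults, saved_config)
merged = merge_with_defaults(init_defaults, saved_config)

print(sorted(missing))
print(merged)
```

In this sketch the saved value of `cross_attention_dim` is kept, and only the keys absent from the saved config take their defaults.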

It then throws an error like this:

/data/anaconda/envs/lib/python3.7/site-packages/diffusers/models/unet_2d_blocks.py:1097 in forward

  1094
  1095   # apply additional residuals to the output of the last pair of resnet and at
  1096   if i == len(blocks) - 1 and additional_residuals is not None:
❱ 1097       hidden_states = hidden_states + additional_residuals
  1098
  1099   output_states = output_states + (hidden_states,)
  1100

The size of tensor a (64) must match the size of tensor b (32) at non-singleton dimension 3

If I change "adapter_type": "full_adapter" to "adapter_type": "full_adapter_xl" and use a local stable-diffusion-xl checkpoint, it works well.
As far as I know, the pipeline supports sd-1.5, so this confuses me.
I am using diffusers version 0.21.0. Do I need to upgrade it to get this to work?
Thank you to anyone who answers this.

tolgacangoz · 2024-03-23T19:49:42Z

Hi @JinShuwenABC,
Could you remove "downscale_factor": 16 and try again?

Btw, your diffusers version is about 6 months old. I recommend upgrading it.
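
To see why "downscale_factor" matters here, the arithmetic can be sketched as below. The sketch assumes 512x512 training images, the usual 8x spatial reduction of the sd-1.5 VAE, and that each successive "full_adapter" stage and UNet down block halves the resolution. The helper names are hypothetical. As far as I know, the adapter's default "downscale_factor" is 8, which is the value that results from removing the key.

```python
def unet_block_sizes(image_size, vae_scale=8, num_blocks=4):
    """Spatial size of the UNet hidden states in each down block (assumed halving per block)."""
    latent = image_size // vae_scale
    return [latent // 2**i for i in range(num_blocks)]


def full_adapter_sizes(image_size, downscale_factor, num_blocks=4):
    """Spatial size of each adapter feature map (assumed halving per stage)."""
    first = image_size // downscale_factor
    return [first // 2**i for i in range(num_blocks)]


image_size = 512  # assumption: 512x512 training images

unet_sizes = unet_block_sizes(image_size)
adapter_16 = full_adapter_sizes(image_size, downscale_factor=16)
adapter_8 = full_adapter_sizes(image_size, downscale_factor=8)

print("UNet down blocks:    ", unet_sizes)   # [64, 32, 16, 8]
print("adapter, factor 16:  ", adapter_16)   # [32, 16, 8, 4]
print("adapter, factor 8:   ", adapter_8)    # [64, 32, 16, 8]
```

Under these assumptions, a factor of 16 gives a first adapter feature of size 32 against a UNet hidden state of size 64, which matches the sizes in the error message. A factor of 8 lines up at every level.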
