Improve textual inversion compatibility #10949

base: main

Conversation
```python
for i, embedding in enumerate(embeddings):
    if embedding.shape[-1] != expected_emb_dim:
        linear = nn.Linear(embedding.shape[-1], expected_emb_dim)
        embeddings[i] = linear(embedding)
        logger.info(f"Changed embedding dimension from {embedding.shape[-1]} to {expected_emb_dim}")
```
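For context, here is a hedged, standalone sketch of what this hunk does. The names `embeddings` and `expected_emb_dim` mirror the PR code, but the surrounding setup is invented for illustration; note that the `nn.Linear` is constructed on the spot, so its weights are randomly initialized.

```python
import torch
from torch import nn

expected_emb_dim = 1024  # e.g. the SD 2.1 text encoder hidden size (assumption)
embeddings = [torch.randn(8, 768)]  # stand-in for an SD 1.5 TI embedding (768-dim)

for i, embedding in enumerate(embeddings):
    if embedding.shape[-1] != expected_emb_dim:
        # Freshly constructed layer: the output has the right shape,
        # but the projection itself is an arbitrary (untrained) linear map.
        linear = nn.Linear(embedding.shape[-1], expected_emb_dim)
        with torch.no_grad():
            embeddings[i] = linear(embedding)

print(embeddings[0].shape)  # torch.Size([8, 1024])
```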
Do we want to add a test case to cover this?
> Do we want to add a test case to cover this?
What should we do with this?
Hi @suzukimain. The test would load an embedding to an incompatible model and check for the log "Changed embedding dimension...".
Also, do you have any example outputs to share?
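A minimal sketch of the test pattern described above, using a hypothetical stub in place of the real loader (a real test would call `pipe.load_textual_inversion()` on an incompatible checkpoint and capture diffusers' logger instead):

```python
import io
import logging

logger = logging.getLogger("ti_sketch")
logger.setLevel(logging.INFO)

def load_embedding(embedding_dim, expected_emb_dim):
    # Hypothetical stand-in for the resize path in this PR.
    if embedding_dim != expected_emb_dim:
        logger.info("Changed embedding dimension from %d to %d", embedding_dim, expected_emb_dim)

# Capture the log and assert on the message the test would look for.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
logger.addHandler(handler)
load_embedding(768, 1024)
handler.flush()
captured = stream.getvalue()
assert "Changed embedding dimension from 768 to 1024" in captured
print("log check passed")
```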
Hi @hlky,
The following is the log I was able to get:

```
The loaded token: emb_params is overwritten by the passed token EasyNegative.
Changed embedding dimension from 768 to 1024
Changed embedding dimension from 768 to 1024
Changed embedding dimension from 768 to 1024
Changed embedding dimension from 768 to 1024
Changed embedding dimension from 768 to 1024
Changed embedding dimension from 768 to 1024
Changed embedding dimension from 768 to 1024
Changed embedding dimension from 768 to 1024
Loaded textual inversion embedding for EasyNegative.
Loaded textual inversion embedding for EasyNegative_1.
Loaded textual inversion embedding for EasyNegative_2.
Loaded textual inversion embedding for EasyNegative_3.
Loaded textual inversion embedding for EasyNegative_4.
Loaded textual inversion embedding for EasyNegative_5.
Loaded textual inversion embedding for EasyNegative_6.
Loaded textual inversion embedding for EasyNegative_7.
```
Hello. Do you need any other information?
Hi @suzukimain, apologies for the delay, last week was the Diffusers team offsite.
> Changed embedding dimension from 768 to 1024

This text is what we would check for in the test, either just `Changed embedding dimension from` or including the original + new dimensions, depending on how existing TI tests are set up. Would you like assistance adding the test? Happy to take over if needed.
Do you need any other information?
Example outputs from a model using an incompatible TI would be useful. cc @asomoza Is this something you've tested before?
Hello @hlky, if possible, could you please add a test?
Hi @suzukimain, I've run some examples using two different v1 TIs on v2. IMO this isn't working as expected; can you confirm whether you have seen good results with this method?

gsdf/EasyNegative
```python
import torch
from diffusers import StableDiffusionPipeline
from huggingface_hub import hf_hub_download

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

image = pipe(
    prompt="Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    negative_prompt="EasyNegative",
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("v1.png")

path = hf_hub_download(
    repo_id="gsdf/EasyNegative",
    filename="EasyNegative.safetensors",
    repo_type="dataset",
)
pipe.load_textual_inversion(path, token="EasyNegative")

image = pipe(
    prompt="Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    negative_prompt="EasyNegative",
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("v1_easy_negative.png")

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

image = pipe(
    prompt="Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    negative_prompt="EasyNegative",
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("v2.png")

path = hf_hub_download(
    repo_id="gsdf/EasyNegative",
    filename="EasyNegative.safetensors",
    repo_type="dataset",
)
pipe.load_textual_inversion(path, token="EasyNegative")

image = pipe(
    prompt="Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    negative_prompt="EasyNegative",
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("v2_easy_negative.png")
```

[v1 output image]
[v2 output image]
sd-concepts-library/gta5-artwork
```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

image = pipe(
    prompt="A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration, <gta5-artwork> style",
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("v1.png")

pipe.load_textual_inversion("sd-concepts-library/gta5-artwork")

image = pipe(
    prompt="A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration, <gta5-artwork> style",
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("v1_gta5.png")

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

image = pipe(
    prompt="A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration, <gta5-artwork> style",
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("v2.png")

pipe.load_textual_inversion("sd-concepts-library/gta5-artwork")

image = pipe(
    prompt="A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration, <gta5-artwork> style",
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("v2_gta5.png")
```

[v1 output image]
[v2 output image]
Can anyone give me some advice?
@suzukimain Is there an example of this approach working elsewhere? Looking at the code, it seems to be a random projection through a linear layer, mapping an embedding for SD 1.5's CLIP to the dimension of SD 2.1's CLIP. I don't think this will work well, since you're essentially just multiplying the SD 1.5 embedding by a random matrix that isn't aligned with the SD 2.1 CLIP embedding space.
Hello @DN6, thank you for your response.
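The concern above can be illustrated with a small numeric sketch (not diffusers code; all names are invented for illustration): a vector projected through a random matrix has the right dimension, but it is essentially uncorrelated with any direction in the target embedding space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an SD 1.5 textual inversion vector (768-dim).
v1_embedding = rng.normal(size=768)

# A randomly initialized 768 -> 1024 linear map, like an untrained nn.Linear.
random_projection = rng.normal(size=(1024, 768)) / np.sqrt(768)
projected = random_projection @ v1_embedding

# Stand-in for a real SD 2.1 token embedding the TI would need to resemble.
v2_token = rng.normal(size=1024)

# Cosine similarity between the projected vector and the v2-space vector
# is typically near 0: the projection lands in an arbitrary direction.
cos = projected @ v2_token / (np.linalg.norm(projected) * np.linalg.norm(v2_token))
print(f"cosine similarity to a v2-space vector: {cos:.3f}")
assert abs(cos) < 0.2
```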
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
Fixes #10373
This PR fixes the incompatibility of textual inversion embeddings between different SD versions, such as SD 1.5 and SD 2.1.
Additionally, if you find any mistakes, please feel free to let me know.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.