
SD-2.1 #4

Open
zhou431496 opened this issue Jul 1, 2024 · 3 comments

@zhou431496

Excellent work! If you migrated this to the 50-step SD-2.1 model, would it still work well?

@sgk98 (Collaborator) commented Jul 1, 2024

Thanks for your interest! If you set the model to sd-turbo, you would be using the distilled 1-step version of the SD2.1 model, and as you might observe, it works quite well and is also very fast to optimize (the whole 50-step optimization process can be done in under 30 seconds).
If you replaced this with the 50-step SD2.1 model, the optimization process would be much longer (~5 minutes per sample), and the results might be worse because of exploding/vanishing gradients. You could have a look at the DOODL codebase (https://github.com/salesforce/DOODL); applying our reward models (HPSv2.1, ImageReward, PickScore) there, instead of just the CLIP objective, should improve on the DOODL results (it will, however, need ~40GB of VRAM and a few minutes of optimization per sample).
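For anyone wanting to try the fast one-step setting, here is a minimal sketch of reward-based noise optimization with SD-Turbo via diffusers. It is not the repository's actual code: the `reward_model` stand-in, the learning rate, and the scheduler handling are illustrative assumptions, and a real run would plug in a differentiable HPSv2.1 / ImageReward / PickScore wrapper instead.

```python
import torch
from diffusers import AutoPipelineForText2Image

device = "cuda"
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float32
).to(device)
# Model weights stay frozen; only the initial noise is optimized.
pipe.unet.requires_grad_(False)
pipe.vae.requires_grad_(False)
pipe.text_encoder.requires_grad_(False)

prompt = "a photo of an astronaut riding a horse"
prompt_embeds, _ = pipe.encode_prompt(
    prompt, device, num_images_per_prompt=1, do_classifier_free_guidance=False
)

def reward_model(image, prompt):
    # Stand-in for a real differentiable reward (HPSv2.1 / ImageReward /
    # PickScore); purely illustrative, penalizes washed-out pixel values.
    return -image.pow(2).mean()

# The initial latent noise is the only optimized quantity.
latents = torch.randn(1, 4, 64, 64, device=device, requires_grad=True)
optimizer = torch.optim.SGD([latents], lr=1.0)  # lr is an arbitrary choice

pipe.scheduler.set_timesteps(1, device=device)
t = pipe.scheduler.timesteps[0]

for step in range(50):  # the ~50 optimization steps mentioned above
    lat = latents * pipe.scheduler.init_noise_sigma
    noise_pred = pipe.unet(
        pipe.scheduler.scale_model_input(lat, t), t,
        encoder_hidden_states=prompt_embeds,
    ).sample
    x0 = pipe.scheduler.step(noise_pred, t, lat).prev_sample  # 1-step sample
    image = pipe.vae.decode(x0 / pipe.vae.config.scaling_factor).sample
    loss = -reward_model(image, prompt)  # maximize the reward
    optimizer.zero_grad()
    loss.backward()  # backprops through UNet + VAE; needs a large-VRAM GPU
    optimizer.step()
```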

@zhou431496 (Author)

Thank you very much for your insight. In our experiments we found that the noise regularization went to -inf; after switching the precision to float32 we instead got gradients of more than -600, and the generated images contain large bright spots. Changing the learning rate does not remove the spots. We suspect there may be a problem with the noise regularization and would like to hear your thoughts.

@sgk98 (Collaborator) commented Jul 2, 2024

This is quite surprising. Are you facing these issues with the one-step models (e.g. SD-Turbo, SDXL-Turbo)? The regularization objective is mostly there to ensure that the noise norm stays around its original value (128); otherwise it should not play a huge role (you should be able to get decent results even without it, using disable_reg). Could you share more details about the exact setting/command you used when you hit these issues? @lucaeyring might also be able to help you here.
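To make the "(128)" concrete: a standard Gaussian latent of shape (4, 64, 64) has expected L2 norm of roughly sqrt(4 * 64 * 64) = 128, so a norm-keeping regularizer can be sketched as below. This is an assumption about the intent, not the repository's exact implementation.

```python
import torch

def noise_norm_regularizer(latents: torch.Tensor, target_norm: float = 128.0):
    # Penalize the optimized noise drifting off the Gaussian shell; for a
    # (4, 64, 64) latent, E[||z||] ~ sqrt(4 * 64 * 64) = 128.
    norms = latents.flatten(1).norm(dim=1)       # per-sample L2 norm
    return ((norms - target_norm) ** 2).mean()   # ~0 for fresh Gaussian noise

z = torch.randn(2, 4, 64, 64, requires_grad=True)
print(noise_norm_regularizer(z))  # small; grows as z is optimized away
```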

If this was with 50-step models (e.g. SD2.1): we also had some issues getting those to work in our setup. You might be better off incorporating human preference objectives into the DOODL codebase, which seems to have worked out the challenges of optimizing multi-step models (at the cost of more memory and time) with some clever tricks (multi-crop augmentation, gradient clipping, a spherical loss, etc.); see the sketch below for two of them.
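For reference, a hedged sketch of two of those tricks: gradient clipping on the optimized latent, and re-projecting the latent onto the Gaussian shell after each step. The exact forms DOODL uses may differ; shapes, step size, and the placeholder objective here are assumptions.

```python
import torch

def project_to_sphere(latents: torch.Tensor) -> torch.Tensor:
    # Renormalize each latent to the expected norm of standard Gaussian
    # noise, sqrt(numel) per sample, instead of penalizing norm drift.
    target = latents[0].numel() ** 0.5
    flat = latents.flatten(1)
    return (flat * (target / flat.norm(dim=1, keepdim=True))).view_as(latents)

z = torch.randn(1, 4, 64, 64, requires_grad=True)
loss = z.square().mean()          # placeholder objective, illustrative only
loss.backward()
torch.nn.utils.clip_grad_norm_([z], max_norm=0.1)   # gradient clipping
with torch.no_grad():
    z.copy_(project_to_sphere(z - 1e-2 * z.grad))   # step, then re-project
```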
