[core] LTX Video 0.9.1 #10330

Merged

a-r-r-o-w merged 19 commits into main from ltxv-0.9.1-integration on Dec 23, 2024

Conversation

a-r-r-o-w
Member

To run conversion:

 python3 scripts/convert_ltx_to_diffusers.py --transformer_ckpt_path /raid/aryan/ltx-new/ltx-video-2b-v0.9.1.safetensors --vae_ckpt_path /raid/aryan/ltx-new/ltx-video-2b-v0.9.1.safetensors --output_path /raid/aryan/ltx-diffusers --dtype bf16 --version 0.9.1 --text_encoder_cache_dir /raid/.cache/huggingface/ --save_pipeline

(I've verified that the conversion for v0.9.0 still works after the current modifications to the script)

Inference after conversion:

import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("/raid/aryan/ltx-diffusers", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=704,
    height=480,
    num_frames=161,
    num_inference_steps=50,
    decode_timestep=0.05,  # the 0.9.1 VAE decoder is timestep-conditioned
    generator=torch.Generator(device="cuda").manual_seed(0),
).frames[0]
export_to_video(video, "output.mp4", fps=24)

Output on prompts from the model page:

ltxv-091-output-downscaled.mp4

Will open a weights PR to the official repository soon.

cc @yoavhacohen @SapirW

a-r-r-o-w requested a review from yiyixuxu on December 21, 2024 at 04:25
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

a-r-r-o-w added the roadmap (Add to current release roadmap) label on Dec 21, 2024
Collaborator

yiyixuxu left a comment

that's fast! thanks!

@tin2tin

tin2tin commented Dec 22, 2024

Are the 0.9.1 diffusers weights up on HuggingFace?

@a-r-r-o-w
Member Author

@tin2tin We're still working with their team on how to host the weights. It might take some time :(

Until then, the model is available here as well: https://huggingface.co/a-r-r-o-w/LTX-Video-0.9.1-diffusers, mostly because I need this weight format for finetuning. Once we have it hosted officially, those can be used instead

@nitinmukesh

Until then, the model is available here as well: https://huggingface.co/a-r-r-o-w/LTX-Video-0.9.1-diffusers, mostly because I need this weight format for finetuning. Once we have it hosted officially, those can be used instead

Thank you for sharing.

@a-r-r-o-w
Member Author

cc @DN6 for single-file related support
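
For context, single-file loading of the original checkpoint would presumably look something like the sketch below. This is a hypothetical example rather than a confirmed 0.9.1 API: the from_single_file calls on the LTX model classes, and reusing the converted repo for the remaining components, are assumptions.

import torch
from diffusers import AutoencoderKLLTXVideo, LTXPipeline, LTXVideoTransformer3DModel

# Original single-file checkpoint (path taken from the conversion command above)
ckpt_path = "/raid/aryan/ltx-new/ltx-video-2b-v0.9.1.safetensors"

# Assumption: these classes expose from_single_file once single-file support for 0.9.1 lands
transformer = LTXVideoTransformer3DModel.from_single_file(ckpt_path, torch_dtype=torch.bfloat16)
vae = AutoencoderKLLTXVideo.from_single_file(ckpt_path, torch_dtype=torch.bfloat16)

# Text encoder, tokenizer and scheduler still come from a converted diffusers-format repo
pipe = LTXPipeline.from_pretrained(
    "a-r-r-o-w/LTX-Video-0.9.1-diffusers",
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")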

a-r-r-o-w requested a review from DN6 on December 22, 2024 at 12:14
@tin2tin

tin2tin commented Dec 23, 2024

Wow, this 0.9.1 (with prompt input) delivers very good quality video, and very fast!
First try:

-932937092_A_woman_with_long_brown_hair_and_light_.mp4

I guess img2vid is not implemented yet - I'm getting this error (sorry if it's premature to test this):

Python311\site-packages\diffusers\models\autoencoders\autoencoder_kl_ltx.py", line 431, in forward
    timestep=temb.flatten(),
             ^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'flatten'

@a-r-r-o-w
Member Author

Oh, taking a look... This PR should work for I2V as well

@a-r-r-o-w
Member Author

@tin2tin Could you try again? Thanks for catching this!
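
For anyone who wants to exercise the image-to-video path after this fix, a minimal sketch along the lines below should do it. Treat it as a sketch: it uses the interim a-r-r-o-w/LTX-Video-0.9.1-diffusers repo mentioned above, and the conditioning image path is a placeholder.

import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = LTXImageToVideoPipeline.from_pretrained(
    "a-r-r-o-w/LTX-Video-0.9.1-diffusers", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Placeholder conditioning image; replace with your own first frame
image = load_image("path/to/first_frame.png")
prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair."
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

# Mirrors the T2V example above, plus the conditioning image
video = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=704,
    height=480,
    num_frames=161,
    num_inference_steps=50,
    decode_timestep=0.05,
    generator=torch.Generator(device="cuda").manual_seed(0),
).frames[0]
export_to_video(video, "output_i2v.mp4", fps=24)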

@tin2tin

tin2tin commented Dec 23, 2024

@a-r-r-o-w That was quick! Yes, it is fixed now! Thank you!

1017385988_Photo_of_Photo_of_A_woman_with_long_bro.mp4

a-r-r-o-w requested a review from DN6 on December 23, 2024 at 13:30
a-r-r-o-w merged commit 4b55713 into main on Dec 23, 2024
15 checks passed
a-r-r-o-w deleted the ltxv-0.9.1-integration branch on December 23, 2024 at 14:21
@scarbain

@tin2tin We're still working with their team on how to host the weights. It might take some time :(

Until then, the model is available here as well: https://huggingface.co/a-r-r-o-w/LTX-Video-0.9.1-diffusers, mostly because I need this weight format for finetuning. Once we have it hosted officially, those can be used instead

Hi @a-r-r-o-w, thanks for getting this PR merged! Is there a script for finetuning or LoRA I2V available somewhere? :)

@a-r-r-o-w
Member Author

@scarbain There's one for T2V here: https://github.com/a-r-r-o-w/finetrainers. I2V will be supported soon!
