
Burning Artifacts in LTX V2V Pipeline with T2V-Generated Videos at Mid-Range Strength Values #103

marghovo opened this issue Jan 17, 2025 · 2 comments
@marghovo

Hi,

Thanks for your great work.

I am trying to leverage LTX-Video in my research, which uses the video-to-video (V2V) pipeline.
When I apply LTX V2V to a video generated by LTX itself, I get strange burning artifacts for strength values in the middle of the range (e.g., 0.4). The artifacts are reduced at small strength values (0.1) and very high strength values (0.9). Please see the example below:
The first video was generated by the LTX text-to-video (T2V) pipeline and serves as the input for the subsequent videos produced by the V2V pipeline. The strength value is indicated in each filename. The burning artifacts are most apparent in the trees and rocks, especially at strengths 0.4 and 0.7.

input.mp4
strength_0.1.mp4
strength_0.4.mp4
strength_0.7.mp4
strength_0.9.mp4

The burning artifacts disappear when I use a different video that was not generated by LTX, or even a screen-recorded version of the LTX-generated video. I initially thought that some corruption occurs while saving the result produced by the LTX T2V pipeline; however, the same issue occurs when feeding its output directly into the V2V pipeline.

Next, I hypothesized that some inherent noise may be present in the output of LTX T2V due to its VAE decoder. I thought of the following two possibilities:
(1) Some noise is present in LTX's output because the VAE decoder performs the last denoising step.
(2) Some noise is present in LTX's output because the VAE decoder injects noise as part of its architecture.

However, I rejected both possibilities: when I used the latents of the T2V output directly as input to the V2V pipeline (without decoding), the artifacts were still present.

At the moment, I see the following two possibilities for these burning artifacts:
(1) Something in the base model adds some type of corruption to the output.
(2) The diffusion process itself results in some type of corruption in the output.

As a side note, I also tried the setups described above with CogVideoX, and no such burning artifacts appear in its results.

Do you have any thoughts on the problem described above and potential solutions for overcoming it?

Thanks in advance.

@yoavhacohen
Collaborator

yoavhacohen commented Jan 18, 2025

When generating the vid-to-vid output, are you using the same seed as the one used for the original video?
Try using a different seed for the initial video generation and the vid-to-vid process.
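The suggestion can be illustrated with a minimal NumPy sketch. This is purely illustrative: `sample_noise` is a stand-in for the pipeline's latent-noise sampler (seeded via a torch generator in practice), not an actual LTX-Video API.

```python
import numpy as np

# Stand-in for the pipeline's latent-noise sampler (illustrative only).
def sample_noise(seed, shape=(8, 16, 16)):
    return np.random.default_rng(seed).standard_normal(shape)

# Same seed for T2V and V2V: the V2V re-noising step draws the exact
# noise pattern that was already used to generate the input video.
t2v_noise = sample_noise(seed=42)
v2v_noise_same = sample_noise(seed=42)
print(np.array_equal(t2v_noise, v2v_noise_same))  # identical draws

# Different seeds: statistically independent noise, as suggested above.
v2v_noise_diff = sample_noise(seed=43)
corr = np.corrcoef(t2v_noise.ravel(), v2v_noise_diff.ravel())[0, 1]
print(abs(corr) < 0.2)  # near-zero correlation between independent draws
```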

@yoavhacohen yoavhacohen self-assigned this Jan 18, 2025
@marghovo
Author

marghovo commented Jan 19, 2025

Hi. Thanks for your reply.

Indeed, I was using the same seed for both T2V and V2V. Changing the seeds resolved the issue. Do you have a possible explanation for why using the same seed produces this kind of artifact? Is it something specific to LTX, or a property of the diffusion process in general? I did not notice this issue with other video or image models.
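One plausible (unconfirmed) mechanism, sketched numerically: suppose the T2V output retains a small residue of its generation noise. Re-noising with the *same* noise then adds coherently to that residue, so the effective noise level exceeds what the denoiser expects at that strength, whereas a fresh seed adds independent noise at roughly the expected level. All quantities below are illustrative assumptions, not measurements of LTX.

```python
import numpy as np

n = 100_000
clean = np.random.default_rng(0).standard_normal(n)

# Assumption (illustrative): the T2V output carries a small residue of
# its generation noise eps.
eps = np.random.default_rng(42).standard_normal(n)
video = clean + 0.1 * eps

# SDEdit-style re-noising at strength t: x_t = sqrt(a)*x + sqrt(1-a)*eps'
a = 0.6  # a mid-range strength keeps a large share of both terms

# Same seed: eps' is the exact same noise as eps, so it adds coherently.
x_same = np.sqrt(a) * video + np.sqrt(1 - a) * eps
# Fresh seed: eps' is independent of eps.
eps_new = np.random.default_rng(7).standard_normal(n)
x_fresh = np.sqrt(a) * video + np.sqrt(1 - a) * eps_new

# Noise the denoiser actually has to remove vs. the level it expects:
expected = np.sqrt(1 - a)
std_same = np.std(x_same - np.sqrt(a) * clean)
std_fresh = np.std(x_fresh - np.sqrt(a) * clean)
print(std_same > std_fresh)              # correlated noise is inflated
print(abs(std_fresh - expected) < 0.02)  # fresh noise ~ expected level
```

Under this toy model the inflation is largest at mid-range strengths, where both the residue term and the injected-noise term are sizable, which would be consistent with the worst artifacts appearing around 0.4-0.7.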

I also checked out your latest commit and tested T2V: with a spatio-temporal guidance (STG) scale of 1.0, the same artifacts appear in T2V itself, while a very small STG scale (e.g., 0.1) produces no such artifacts.
