Hi,
Thanks for your great work.
I am trying to leverage LTX-Video in my research, which uses the Video-to-Video (V2V) pipeline.
When I apply LTX V2V to a video generated by LTX itself, I get strange burning artifacts for strength values in the middle of the range (e.g. 0.4). The artifacts are reduced for small strength values (0.1) and very high ones (0.9). Please see the example below:
The first video was generated by the LTX text-to-video (T2V) pipeline and is used as the input for the subsequent videos produced by the V2V pipeline; the strength value is indicated in each filename (input.mp4, strength_0.1.mp4, strength_0.4.mp4, strength_0.7.mp4, strength_0.9.mp4). The burning artifacts are most apparent in the trees and rocks, especially at strengths 0.4 and 0.7.
The issue disappears when I use a different video that was not generated by LTX, or even a screen-recorded version of the LTX-generated video. I initially thought some corruption occurred while saving the result of the LTX T2V pipeline; however, the same issue occurs when its output is used directly as the V2V input.
Next, I hypothesized that some inherent noise may be present in the output of LTX T2V due to its VAE decoder. I thought of the following two possibilities:
(1) Some noise remains in LTX's output because the VAE decoder performs the last denoising step.
(2) Some noise remains in LTX's output because the VAE decoder has noise injection in its architecture.
However, both possibilities were ruled out: when I used the latents of the T2V output directly as input for the V2V pipeline (without decoding), the artifacts were still present.
At the moment, I see the following two possibilities for these burning artifacts:
(1) Something in the base model adds some type of corruption to the output.
(2) The diffusion process itself introduces some type of corruption into the output.
As a side note, I also tried the setups described above with CogVideoX, and no such burning artifacts appear in its results.
Do you have any thoughts on the problem described above and potential solutions for overcoming it?
Thanks in advance.
When generating the vid-to-vid output, are you using the same seed as the one used for the original video?
Try using a different seed for the initial video generation and the vid-to-vid process.
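The effect of the shared seed can be sketched with plain NumPy RNGs standing in for the pipelines' noise sources (the toy latent shape is arbitrary): with the same seed, the V2V re-noising step draws the exact noise tensor that T2V already consumed, so the two are fully correlated rather than independent.

```python
import numpy as np

shape = (16, 32, 32)  # toy latent shape, chosen arbitrarily

# Same seed for T2V and V2V: both RNGs emit the identical noise tensor.
noise_t2v = np.random.default_rng(seed=42).standard_normal(shape)
noise_v2v_same = np.random.default_rng(seed=42).standard_normal(shape)

# Different seed: the V2V noise is statistically independent of the T2V noise.
noise_v2v_diff = np.random.default_rng(seed=1234).standard_normal(shape)

corr_same = np.corrcoef(noise_t2v.ravel(), noise_v2v_same.ravel())[0, 1]
corr_diff = np.corrcoef(noise_t2v.ravel(), noise_v2v_diff.ravel())[0, 1]

print(f"correlation with same seed:      {corr_same:.3f}")  # 1.000
print(f"correlation with different seed: {corr_diff:.3f}")  # near zero
```

Changing either seed is enough to break this correlation.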
Indeed, I was using the same seed for both T2V and V2V. Changing the seeds resolved the issue. Do you have a possible explanation for why using the same seed produces this kind of artifact? Is it something specific to LTX, or to the diffusion process in general? I have not noticed this issue with other video or image models.
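One plausible mechanism (an assumption, not a confirmed diagnosis): the T2V output may still carry a small residual of its initial noise, and if V2V re-noises with the exact same noise tensor, the residual and the freshly added noise add coherently instead of in quadrature, leaving the sample noisier than the scheduler expects at that timestep. A toy NumPy calculation, with the residual fraction and scheduler coefficients chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x0 = rng.standard_normal(n)   # "clean" signal
eps = rng.standard_normal(n)  # initial T2V noise

# Suppose the T2V output still carries a small residual of eps
# (imperfect denoising); c is the residual fraction (assumed value).
c = 0.1
t2v_out = x0 + c * eps

# V2V re-noising at mid strength: x_t = a * x + b * noise
a, b = 0.8, 0.6  # illustrative scheduler coefficients

x_same = a * t2v_out + b * eps                     # same seed: re-uses eps
x_diff = a * t2v_out + b * rng.standard_normal(n)  # fresh seed: new noise

# Effective noise level relative to the clean component a * x0.
noise_same = np.std(x_same - a * x0)
noise_diff = np.std(x_diff - a * x0)

print(f"noise level, same seed:  {noise_same:.3f}")  # ~ a*c + b = 0.68
print(f"noise level, fresh seed: {noise_diff:.3f}")  # ~ sqrt((a*c)**2 + b**2) ~ 0.605
```

With a fresh seed the residual is negligible next to the injected noise, while with the same seed the coherent sum systematically exceeds the noise level the scheduler assumes, which could plausibly surface as over- or under-denoised "burnt" regions.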
I also checked out your latest commit and tested T2V: with a spatio-temporal guidance (STG) scale of 1.0, the same artifacts appear in T2V itself, while a very small STG scale (e.g. 0.1) produces no such artifacts.
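For reference, my understanding of how the STG scale enters the noise prediction, as a toy sketch. The combination rule below follows perturbed-guidance methods in general and is an assumption on my part, not a transcription of LTX-Video's actual code:

```python
import numpy as np

# Toy noise predictions standing in for three forward passes of the model.
rng = np.random.default_rng(0)
eps_uncond = rng.standard_normal(16)     # unconditional pass
eps_cond = rng.standard_normal(16)       # conditional pass
eps_perturbed = rng.standard_normal(16)  # perturbed pass (e.g. attention skipped)

def guided_eps(cfg_scale, stg_scale):
    """Classifier-free guidance plus an STG-style perturbed-guidance term.

    This combination rule is an assumption based on perturbed-guidance
    methods, not LTX-Video's exact implementation.
    """
    return (eps_uncond
            + cfg_scale * (eps_cond - eps_uncond)
            + stg_scale * (eps_cond - eps_perturbed))

# A larger stg_scale pushes the prediction proportionally further from the
# no-STG prediction, which can overshoot and saturate regions -- consistent
# with artifacts appearing at scale 1.0 but not at 0.1.
for s in (0.1, 1.0):
    dev = np.linalg.norm(guided_eps(3.0, s) - guided_eps(3.0, 0.0))
    print(f"stg_scale={s}: deviation from no-STG prediction = {dev:.3f}")
```

Under this formulation the STG term scales linearly with the guidance scale, so a tenfold reduction (1.0 to 0.1) shrinks its contribution tenfold, matching the observation that the artifacts vanish at a small scale.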