This repository is the official implementation of the Kandinsky Video model.
Paper | Project | Telegram-bot | Habr post
Kandinsky Video is a text-to-video generation model based on the FusionFrames architecture, which consists of two main stages: keyframe generation and interpolation. Our approach to temporal conditioning allows us to generate videos with high-quality appearance, smoothness and dynamics.
The encoded text prompt enters the U-Net keyframe generation model with temporal layers or blocks, and the sampled latent keyframes are then sent to the latent interpolation model, which predicts three interpolation frames between each pair of keyframes. A temporal MoVQ-GAN decoder is used to produce the final video.
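To make the interpolation stage concrete, here is a small back-of-the-envelope sketch of the frame count it produces. The keyframe count used below is only an illustrative number, not a fixed model setting.

# Frame count after interpolation: the keyframe model produces K keyframes,
# and the interpolation model predicts 3 extra frames between every pair of
# neighbouring keyframes.
def total_frames(num_keyframes: int, frames_between: int = 3) -> int:
    """K keyframes plus `frames_between` inserted frames for each of the K-1 gaps."""
    return num_keyframes + (num_keyframes - 1) * frames_between

# e.g. 32 keyframes -> 32 + 31 * 3 = 125 frames, i.e. roughly a 4x higher frame rate
print(total_frames(32))  # 125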
Architecture details
- Text encoder (Flan-UL2) - 8.6B
- Latent Diffusion U-Net3D - 4.0B
- MoVQ encoder/decoder - 256M
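The sizes above can be sanity-checked by counting the parameters of the loaded modules. The helper below is a minimal sketch; the attribute names on the pipeline object (text_encoder, unet, movq) are assumptions and should be checked against the actual implementation.

import torch

def count_params(module: torch.nn.Module) -> str:
    """Human-readable parameter count of a torch module."""
    n = sum(p.numel() for p in module.parameters())
    return f"{n / 1e9:.1f}B" if n >= 1e9 else f"{n / 1e6:.0f}M"

# Illustrative only: the attribute names below are assumptions, not the
# documented API of the pipeline object.
# print(count_params(t2v_pipe.text_encoder))  # expected around 8.6B
# print(count_params(t2v_pipe.unet))          # expected around 4.0B
# print(count_params(t2v_pipe.movq))          # expected around 256M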
Check our Jupyter notebooks with examples in the ./examples folder.
from video_kandinsky3 import get_T2V_pipeline

# load the text-to-video pipeline on GPU with half-precision weights
t2v_pipe = get_T2V_pipeline('cuda', fp16=True)

fps = 'medium'  # one of ['low', 'medium', 'high']
video = t2v_pipe(
    'a red car is drifting on the mountain road, close view, fast movement',
    width=640, height=384, fps=fps
)
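The snippet below is a minimal sketch for saving the result as a GIF. It assumes `video` is returned as a sequence of PIL.Image frames; verify this against the pipeline's actual return type.

# Assumption: `video` is a list of PIL.Image frames returned by the pipeline.
video[0].save(
    'drifting_car.gif',
    save_all=True,
    append_images=video[1:],
    duration=125,  # ms per frame; adjust to the chosen fps preset
    loop=0,
)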
Authors
- Vladimir Arkhipkin: GitHub, Google Scholar
- Zein Shaheen: GitHub, Google Scholar
- Viacheslav Vasilev: GitHub, Google Scholar
- Igor Pavlov: GitHub
- Elizaveta Dakhova: GitHub
- Anastasia Lysenko: GitHub
- Sergey Markov
- Denis Dimitrov: GitHub, Google Scholar
- Andrey Kuznetsov: GitHub, Google Scholar
If you use our work in your research, please cite our publication:
@article{arkhipkin2023fusionframes,
  title   = {FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline},
  author  = {Arkhipkin, Vladimir and Shaheen, Zein and Vasilev, Viacheslav and Dakhova, Elizaveta and Kuznetsov, Andrey and Dimitrov, Denis},
  journal = {arXiv preprint arXiv:2311.13073},
  year    = {2023},
}