Code and scripts for "LAVIB: Large-scale Video Interpolation Benchmark"
To appear in the 38th Annual Conference on Neural Information Processing Systems (NeurIPS) 2024
[project website 🌐]
[arXiv preprint 📃]
[dataset 🤗]
The dataset and splits are hosted on Hugging Face.
The dataset is stored in multiple 20GB chunks (lavib00, lavib01, etc.). This avoids network overheads and improves download speeds across multiple threads. After downloading, the files need to be combined before being extracted.
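The recombination step could look like the sketch below. It assumes the chunks sit in one directory, are named with the lavib prefix followed by digits as in the names above, and are plain byte-wise pieces of a single archive. The function name combine_chunks and the output name lavib_combined are hypothetical. The archive format is not stated here, so extraction is left out.

```python
import shutil
from pathlib import Path


def combine_chunks(chunk_dir, output_path, prefix="lavib"):
    """Concatenate <prefix>00, <prefix>01, ... byte-for-byte into one file.

    Assumption: the chunks are raw splits of a single archive.
    """
    chunk_dir = Path(chunk_dir)
    # Keep only files named <prefix> followed by digits, ordered by chunk index.
    chunks = sorted(
        (
            p
            for p in chunk_dir.iterdir()
            if p.is_file()
            and p.name.startswith(prefix)
            and p.name[len(prefix):].isdigit()
        ),
        key=lambda p: int(p.name[len(prefix):]),
    )
    if not chunks:
        raise FileNotFoundError(f"no '{prefix}NN' chunks found in {chunk_dir}")
    with open(output_path, "wb") as out:
        for chunk in chunks:
            with open(chunk, "rb") as src:
                # Stream the copy so a 20GB chunk is never held in memory.
                shutil.copyfileobj(src, out)
    return [c.name for c in chunks]
```

For example, combine_chunks("downloads", "downloads/lavib_combined") writes the joined file next to the chunks and returns the chunk names in the order they were appended.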
- name is the unique video index from which the clip is obtained.
- shot is the index of the extracted 10-second segment from the video.
- tmp_crop is the index (1-10) of the 1-second temporal location of the clip.
- vrt_crop is the spatial location (1-2) that the tubelet is extracted from. It corresponds to the Y axis.
- hrz_crop is the spatial location (1-2) that the tubelet is extracted from. It corresponds to the X axis.
The folders containing videos can be referenced by: <name>_hot<shot>_<tmp_crop>_<vrt_crop>_<hrz_crop>/vid.mp4
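As an illustration of the naming scheme, the sketch below builds the path to a clip from its five fields. The folder pattern is copied verbatim from the line above, and the value ranges follow the field descriptions. The function name clip_path and the root argument are hypothetical and not part of the released code.

```python
from pathlib import Path

# Pattern copied verbatim from the text above; adjust it if your extracted
# folders are named differently.
FOLDER_TEMPLATE = "{name}_hot{shot}_{tmp_crop}_{vrt_crop}_{hrz_crop}"


def clip_path(root, name, shot, tmp_crop, vrt_crop, hrz_crop,
              template=FOLDER_TEMPLATE):
    """Return <root>/<folder>/vid.mp4 for one clip."""
    # Ranges as documented: tmp_crop in 1-10, vrt_crop and hrz_crop in 1-2.
    if not 1 <= int(tmp_crop) <= 10:
        raise ValueError("tmp_crop must be in 1-10")
    if int(vrt_crop) not in (1, 2) or int(hrz_crop) not in (1, 2):
        raise ValueError("vrt_crop and hrz_crop must be 1 or 2")
    folder = template.format(
        name=name, shot=shot, tmp_crop=tmp_crop,
        vrt_crop=vrt_crop, hrz_crop=hrz_crop,
    )
    return Path(root) / folder / "vid.mp4"
```

Passing the template as an argument keeps the helper usable if the folder names on disk turn out to differ from the pattern quoted here.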
The main benchmark splits are train.csv, val.csv, and test.csv.
OOD splits can be loaded from their respective .csvs:
OOD-AFM
- train_high_fm.csv, val_high_fm.csv, and test_high_fm.csv
- train_low_fm.csv, val_low_fm.csv, and test_low_fm.csv
OOD-ALV
- train_high_lv.csv, val_high_lv.csv, and test_high_lv.csv
- train_low_lv.csv, val_low_lv.csv, and test_low_lv.csv
OOD-ARMS
- train_high_rc.csv, val_high_rc.csv, and test_high_rc.csv
- train_low_rc.csv, val_low_rc.csv, and test_low_rc.csv
OOD-APL
- train_high_pl.csv, val_high_pl.csv, and test_high_pl.csv
- train_low_pl.csv, val_low_pl.csv, and test_low_pl.csv
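The file names above follow a regular pattern, which the sketch below uses to pick a split file and read it. It relies only on the standard csv module and assumes each file starts with a header row; no column names are assumed. The names OOD_SUFFIX, split_csv_name and read_split are hypothetical helpers, not part of the released code.

```python
import csv
from pathlib import Path

# Suffixes as they appear in the file names listed above.
OOD_SUFFIX = {"OOD-AFM": "fm", "OOD-ALV": "lv", "OOD-ARMS": "rc", "OOD-APL": "pl"}


def split_csv_name(split, challenge=None, level=None):
    """Map (split, challenge, level) to a file name such as test_high_fm.csv."""
    if split not in ("train", "val", "test"):
        raise ValueError(f"unknown split: {split}")
    if challenge is None:
        return f"{split}.csv"  # main benchmark split
    if level not in ("high", "low"):
        raise ValueError("level must be 'high' or 'low' for OOD splits")
    return f"{split}_{level}_{OOD_SUFFIX[challenge]}.csv"


def read_split(csv_dir, split, challenge=None, level=None):
    """Return the rows of a split file as dicts keyed by the header row."""
    path = Path(csv_dir) / split_csv_name(split, challenge, level)
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

For instance, split_csv_name("test", "OOD-AFM", "high") gives test_high_fm.csv, and split_csv_name("train") gives train.csv.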
You can also automatically download data and splits with lavib_downloader.sh.
You can resize video frames during data loading. However, this adds significant overhead to loading/processing times. As an alternative, you can store the videos at reduced resolutions and load them directly. To do this, you can use resize.py, which resizes videos to 540x540.
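For readers who prefer to do the offline resize themselves, the sketch below shows one possible way with the ffmpeg command-line tool. This is not how resize.py is implemented; it assumes ffmpeg is installed and on the PATH, and the function names are hypothetical. Note that scaling to a fixed 540x540 does not preserve the aspect ratio of non-square inputs.

```python
import subprocess
from pathlib import Path


def ffmpeg_resize_cmd(src, dst, size=540):
    """Build an ffmpeg command that rescales src to size x size and writes dst."""
    return [
        "ffmpeg",
        "-y",                            # overwrite dst if it already exists
        "-i", str(src),                  # input video
        "-vf", f"scale={size}:{size}",   # resize filter
        str(dst),
    ]


def resize_video(src, dst, size=540):
    """Run the command above; raises CalledProcessError if ffmpeg fails."""
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_resize_cmd(src, dst, size), check=True)
```

Keeping the command builder separate from the call that runs it makes it easy to print or inspect the command before processing a large number of clips.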
Three codebases are adjusted for VFI. General instructions are given below.
The required packages are listed below
torch >= 1.13.0
torchvision >= 0.14.0
numpy >= 1.22.4
pandas >= 1.3.4
sk-video >= 1.1.10
tqdm >= 4.65.0
wget >= 3.3
timm >= 1.0.3
pytorchvideo -> pip install git+https://github.com/facebookresearch/pytorchvideo.git@1fadaef40dd393ca09680f55582399f4679fc9b7
pytorch_msssim >= 1.0.0
Please see the original repo for more details: [RIFE repo link].
To run either training or inference, use VFI/RIFE/train.py.
The following call arguments are added:
- root_dir: The folder location where segments_downsampled is stored. If you are using the original sizes of videos, you can adjust VFI/RIFE/dataset.py to load the segments directly.
- eval_only: Integer (0-1) for running only inference. If set to 1 then only inference will run.
- set: Definition for the challenge to run; see choices for the available options.
Example run for training:
python train.py --batch_size 4 --root_dir /media/SCRATCH/LAVIB
Example run for inference (only) in high_afm:
python train.py --batch_size 1 --root_dir /media/SCRATCH/LAVIB --eval_only 1 --set high_fm --pretrained ckpt.pth
Please see the original repo for more details: [EMA-VFI repo link].
For training or inference, use VFI/EMA-VFI/train.py.
The following call arguments are added:
- data_path: The folder location where segments_downsampled is stored. If you are using the original sizes of videos, you can adjust VFI/RIFE/dataset.py to load the segments directly.
- eval_only: Integer (0-1) for running only inference. If set to 1 then only inference will run.
- set: Definition for the challenge to run; see choices for the available options.
Example run for training:
python train.py --batch_size 4 --data_path /media/SCRATCH/LAVIB
Example run for inference (only) in high_afm:
python train.py --batch_size 1 --data_path /media/SCRATCH/LAVIB --eval_only 1 --set high_fm --pretrained ckpt.pth
Please see the original repo for more details: [FLAVR repo link].
For training or inference, use VFI/FLAVR/main.py.
The following call arguments are added:
- data_root: The folder location where segments_downsampled is stored. If you are using the original sizes of videos, you can adjust VFI/RIFE/dataset.py to load the segments directly.
- eval_only: Integer (0-1) for running only inference. If set to 1 then only inference will run.
- set: Definition for the challenge to run; see choices for the available options.
Example run for training:
python main.py --batch_size 4 --data_root /media/SCRATCH/LAVIB
Example run for inference (only) in high_afm:
python main.py --batch_size 1 --data_root /media/SCRATCH/LAVIB --eval_only 1 --set high_fm --pretrained ckpt.pth
Main benchmark weights can be found here
OOD challenges weights can be found here
@inproceedings{stergiou2024lavib,
title={LAVIB: Large-scale Video Interpolation Benchmark},
author={Stergiou, Alexandros},
booktitle={NeurIPS},
year={2024}
}
CC BY-SA-NC 4.0