❗️ Check out the clic2021-devkit repo for the 2021 challenge ❗️
The following helps you work with the P-frame challenge.
To download all files, run:
bash download.sh path/to/data
It will create a folder path/to/data
and extract all frames there, into a structure like:
video1/
  video1_frame1_y.png
  video1_frame1_u.png
  video1_frame1_v.png
  video1_frame2_y.png
  video1_frame2_u.png
  video1_frame2_v.png
  ...
video2/
  video2_frame1_y.png
  video2_frame1_u.png
  video2_frame1_v.png
  ...
For this, one of gsutil, wget, or curl must be available. gsutil is probably the most efficient way.
To download only some videos, use --max_videos:
bash download.sh path/to/data --max_videos 10
NOTE: The script first downloads all videos as .zip files, resulting in 250GB+ of data.
Then all zips are decompressed one by one and subsequently deleted. If you interrupt the script
while it is unpacking and later re-run it, it will re-download the videos that were already unpacked (since their zips have been deleted).
To prevent this, at the expense of using more hard-drive space, you can keep the zip files by passing --no_delete_zip.
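Once the frames are extracted, the planes of a single frame can be read with PIL and NumPy. The following is an illustrative sketch; load_frame is a hypothetical helper (not part of the devkit) and assumes the file naming shown in the tree above:

import numpy as np
from PIL import Image

def load_frame(data_root, video, frame_idx):
    # Each plane is stored as a separate grayscale PNG (see the tree above).
    planes = []
    for channel in ('y', 'u', 'v'):
        path = f'{data_root}/{video}/{video}_frame{frame_idx}_{channel}.png'
        planes.append(np.array(Image.open(path)))
    y, u, v = planes
    # In 420 format, U and V have half the spatial dimensions of Y.
    return y, u, v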
We implement a simple non-learned baseline in baseline_np.py. The algorithm can be described as follows:
ENCODE:
Inputs: frame 1 (F1) and frame 2 (F2).
Encode F2 given F1:
1. Calculate Residual_Normalized = (F2 - F1) // 2 + 127
(this is now in range {0, ..., 255}.
The idea of the //2 is to have 256 possible values, because
otherwise we would have 511 values.)
2. Compress Residual_Normalized with JPG
-> Save to Bitstream
DECODE:
Inputs: F1 and Bitstream
1. Get Residual_Normalized from the JPG in Bitstream
2. F2' = F1 + ( Residual_Normalized - 127 ) * 2
(F2' is the reconstruction)
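For reference, the residual arithmetic above can be sketched in NumPy as follows. The actual implementation, including the JPG compression of the residual, is in baseline_np.py; this sketch only shows the per-pixel math and clips to the uint8 range as a safeguard:

import numpy as np

def encode_residual(f1, f2):
    # Work in a signed dtype so the subtraction cannot wrap around.
    diff = f2.astype(np.int16) - f1.astype(np.int16)     # in {-255, ..., 255}
    residual_normalized = diff // 2 + 127                 # roughly 256 possible values
    # Clip as a safeguard before casting back to uint8; this is what gets JPG-compressed.
    return np.clip(residual_normalized, 0, 255).astype(np.uint8)

def decode_residual(f1, residual_normalized):
    f2_rec = f1.astype(np.int16) + (residual_normalized.astype(np.int16) - 127) * 2
    # F2' is the (lossy) reconstruction of F2.
    return np.clip(f2_rec, 0, 255).astype(np.uint8)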
The run_baseline.sh script describes how this would be used to create a submission to the challenge server. Running it produces a decoder.zip and a valout.zip which can be uploaded to the P-frame (validation) track at challenge.compression.cc. The reason run_baseline.sh compresses all decoder and data files using zip is to allow efficient uploads (some browsers hang if you try to upload 160000 files).
We have data loaders for PyTorch and TensorFlow. By default, they yield pairs of frames, where each frame is represented as a tuple (Y, U, V). The dimensions of U and V are half those of Y (420 format):
[ ((Y1, U1, V1), (Y2, U2, V2)) # pair 1
((Y2, U2, V2), (Y3, U3, V3)) # pair 2
... ]
To get a single tensor per frame, we also provide a way to load merged YUV tensors (444 format):
[ (YUV1, YUV2),
(YUV2, YUV3),
...]
import pframe_dataset_torch as ds_torch
ds_420 = ds_torch.FrameSequenceDataset(data_root='data')
ds_444 = ds_torch.FrameSequenceDataset(data_root='data', merge_channels=True)
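A minimal usage sketch, assuming FrameSequenceDataset is a map-style dataset (i.e. it supports integer indexing) and yields the structures shown above:

(y1, u1, v1), (y2, u2, v2) = ds_420[0]   # one pair of consecutive frames (420)
yuv1, yuv2 = ds_444[0]                   # the same pair with merged channels (444)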
The TensorFlow loader has been tested in eager and graph mode with TensorFlow 1.15. TODO: test with TensorFlow 2.0.
import pframe_dataset_tf as ds_tf
ds_420 = ds_tf.frame_sequence_dataset(data_root='data')
ds_444 = ds_tf.frame_sequence_dataset(data_root='data', merge_channels=True)
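A minimal usage sketch for the TensorFlow side, assuming frame_sequence_dataset returns a tf.data.Dataset (iterated here in eager mode):

for yuv1, yuv2 in ds_444.take(1):
    print(yuv1.shape, yuv2.shape)   # merged YUV tensors for one pair of frames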