This is the repository for the models proposed in the paper "Analysis of video quality datasets via design of minimalistic video quality models" (TPAMI version | arXiv version).
Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, progress in BVQA has been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to properly evaluate the current progress in BVQA. Toward this goal, we conduct a first-of-its-kind computational analysis of VQA datasets by designing minimalistic BVQA models. By minimalistic, we mean that our family of BVQA models is built only from basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, each with the simplest possible instantiation. By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer from the easy-dataset problem to varying degrees, and some even admit blind image quality assessment (BIQA) solutions. We further substantiate these claims by comparing the generalization capabilities of our models across these VQA datasets, and by ablating a broad set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA and, meanwhile, shed light on good practices for constructing next-generation VQA datasets and models.
| Model | Spatial Quality Analyzer | Temporal Quality Analyzer | Weights trained on LSVQ |
| --- | --- | --- | --- |
| Model I | ResNet-50 (ImageNet-1k) | None | weights |
| Model II | ResNet-50 (pre-trained on IQA datasets) | None | as Model I |
| Model III | ResNet-50 (pre-trained on the LSVQ dataset) | None | |
| Model IV | ResNet-50 (ImageNet-1k) | SlowFast | weights |
| Model V | ResNet-50 (pre-trained on IQA datasets) | SlowFast | |
| Model VI | ResNet-50 (pre-trained on the LSVQ dataset) | SlowFast | |
| Model VII | Swin-B (ImageNet-1k) | None | weights |
| Model VIII | Swin-B (pre-trained on the LSVQ dataset) | None | as Model VII |
| Model IX | Swin-B (ImageNet-1k) | SlowFast | weights |
| Model X | Swin-B (pre-trained on the LSVQ dataset) | SlowFast | as Model IX |
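To make the table concrete, here is a minimal sketch of how such a variant could be assembled in PyTorch: a spatial backbone over sparsely sampled frames, an optional pre-extracted SlowFast clip feature, and a linear quality regressor. The class name, feature dimensions (2048 for ResNet-50, 2304 for SlowFast R50), and pooling choices are illustrative assumptions, not the repository's exact modules.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class MinimalisticBVQA(nn.Module):
    """Illustrative sketch: spatial backbone + optional temporal features + linear regressor."""

    def __init__(self, use_temporal=False, temporal_dim=2304):
        super().__init__()
        # Spatial quality analyzer: ResNet-50 trunk, global-average-pooled to a 2048-d feature.
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        self.spatial = nn.Sequential(*list(backbone.children())[:-1])
        self.use_temporal = use_temporal
        in_dim = 2048 + (temporal_dim if use_temporal else 0)
        # Quality regressor: the simplest possible instantiation, a single linear layer.
        self.regressor = nn.Linear(in_dim, 1)

    def forward(self, frames, temporal_feat=None):
        # frames: (batch, num_frames, 3, H, W) of spatiotemporally downsampled key frames
        b, t = frames.shape[:2]
        x = self.spatial(frames.flatten(0, 1)).flatten(1)  # (b*t, 2048)
        x = x.view(b, t, -1).mean(dim=1)                   # temporal average pooling
        if self.use_temporal and temporal_feat is not None:
            x = torch.cat([x, temporal_feat], dim=1)       # append SlowFast clip features
        return self.regressor(x).squeeze(-1)               # scalar quality score
```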
- CVD2014
- LIVE-Qualcomm
- KoNViD-1k: The video names in the file data/KoNViD-1k_data.mat are not in the same format as those in the officially released version. You can download the version of KoNViD-1k (password: 1adp) that we used to match the video names.
- LIVE-VQC
- YouTube-UGC: The videos in YouTube-UGC are dynamically updated, so the videos you download may be slightly different from those used in this paper.
- LBVD
- LSVQ: The official link may be broken; you can download the unofficially released version.
- LIVE-YT-Gaming: unofficially released version
For a detailed introduction to these datasets, please refer to the paper.
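The per-dataset metadata files under data/ are MATLAB .mat files. A quick way to inspect one before running the scripts is shown below; the stored variable names are whatever the file actually contains, so list the keys rather than assuming them.

```python
import scipy.io as sio

# Load the KoNViD-1k metadata file shipped with the repository.
meta = sio.loadmat('data/KoNViD-1k_data.mat')

# List the stored variables (keys starting with '__' are MATLAB housekeeping).
for key, value in meta.items():
    if not key.startswith('__'):
        print(key, getattr(value, 'shape', type(value)))
```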
- Extract the video frames as images:

```bash
python -u frame_extraction/extract_frame.py \
    --dataset KoNViD1k \
    --dataset_file data/KoNViD-1k_data.mat \
    --videos_dir /data/sunwei_data/konvid1k \
    --save_folder /data/sunwei_data/video_data/KoNViD1k/image_384p \
    --video_length_min 10 \
    --resize 384 \
    >> logs/extract_frame_KoNViD1k_384p.log
```
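If you want to check what the preprocessor produces before running the full script, the snippet below mimics its two key operations as we understand them: resize the shorter side to 384 pixels and keep roughly one frame per second. The exact sampling policy lives in frame_extraction/extract_frame.py, so treat this as an approximation.

```python
import os
import cv2

def extract_key_frames(video_path, save_folder, short_side=384):
    """Approximate sketch: one frame per second, shorter side resized to `short_side`."""
    os.makedirs(save_folder, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back to 30 fps if metadata is missing
    idx, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % int(round(fps)) == 0:  # roughly one frame per second
            h, w = frame.shape[:2]
            scale = short_side / min(h, w)
            frame = cv2.resize(frame, (int(w * scale), int(h * scale)))
            cv2.imwrite(os.path.join(save_folder, f'{saved:03d}.png'), frame)
            saved += 1
        idx += 1
    cap.release()
```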
- Extract the temporal features:

```bash
CUDA_VISIBLE_DEVICES=0 python -u temporal_feature_extraction/extract_temporal_feature.py \
    --dataset KoNViD1k \
    --dataset_file data/KoNViD-1k_data.mat \
    --videos_dir /data/sunwei_data/konvid1k \
    --feature_save_folder /data/sunwei_data/video_data/KoNViD1k/temporal_feature_mid_sr_1 \
    --sample_type mid \
    --sample_rate 1 \
    --resize 224 \
    >> logs/extract_feature_KoNViD1k_temporal_feature_mid_sr_1.log
```
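The temporal quality analyzer is a SlowFast network. As a hedged illustration of the backbone involved (not the repository's exact extraction code, which lives in temporal_feature_extraction/extract_temporal_feature.py and pools intermediate features), the snippet below pulls SlowFast R50 from PyTorchVideo via torch.hub and runs a dummy two-pathway clip at the 224-pixel resolution used above:

```python
import torch

# Pre-trained SlowFast R50 from PyTorchVideo (Kinetics-400 weights).
model = torch.hub.load('facebookresearch/pytorchvideo', 'slowfast_r50', pretrained=True)
model.eval()

# SlowFast takes a two-pathway input: a temporally sparse "slow" clip (8 frames)
# and a dense "fast" clip (32 frames), here random stand-ins for a 1-second chunk.
slow = torch.randn(1, 3, 8, 224, 224)
fast = torch.randn(1, 3, 32, 224, 224)
with torch.no_grad():
    out = model([slow, fast])  # (1, 400) Kinetics-400 class scores
print(out.shape)
```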
- Train the model:

```bash
CUDA_VISIBLE_DEVICES=0,1 python -u train_BVQA.py \
    --dataset KoNViD1k \
    --model_name Model_IX \
    --datainfo data/KoNViD-1k_data.mat \
    --videos_dir /data/sunwei_data/video_data/KoNViD1k/image_384p \
    --lr 0.00001 \
    --decay_ratio 0.9 \
    --decay_interval 10 \
    --print_samples 400 \
    --train_batch_size 6 \
    --num_workers 8 \
    --resize 384 \
    --crop_size 384 \
    --epochs 30 \
    --ckpt_path /data/sunwei_data/video_data/MinimalisticVQA_model/KoNViD1k/ \
    --multi_gpu \
    --n_exp 10 \
    --sample_rate 1 \
    --feature_dir /data/sunwei_data/video_data/KoNViD1k/temporal_feature_mid_sr_1 \
    >> logs/train_BVQA_KoNViD1k_Model_IX.log
```
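BVQA performance is conventionally reported as SROCC and PLCC, with PLCC computed after a four-parameter logistic mapping from predictions to MOS. If you want to score your own splits, the standard recipe is sketched below; the logistic form is the common VQA convention, assumed here rather than copied from train_BVQA.py.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def logistic_4(x, b1, b2, b3, b4):
    # Standard monotonic 4-parameter logistic mapping predictions to the MOS scale.
    return (b1 - b2) / (1 + np.exp(-(x - b3) / np.abs(b4))) + b2

def evaluate(pred, mos):
    srocc = stats.spearmanr(pred, mos)[0]
    # Fit the logistic on the predictions before computing PLCC.
    p0 = [np.max(mos), np.min(mos), np.mean(pred), np.std(pred) + 1e-8]
    popt, _ = curve_fit(logistic_4, pred, mos, p0=p0, maxfev=10000)
    plcc = stats.pearsonr(logistic_4(pred, *popt), mos)[0]
    return srocc, plcc
```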
Download a trained model (e.g., Model IX) and the scaling file (for quality rescaling) trained on LSVQ, then run the command below, replacing `--model_path`, `--popt_path`, `--video_name`, and `--video_path` with your own model file, scaling file, video name, and video directory:
```bash
CUDA_VISIBLE_DEVICES=0 python -u test_video.py \
    --model_path /data/sunwei_data/video_data/MinimalisticVQA_model/LSVQ/MinimalisticVQA_Model_IX_LSVQ.pth \
    --popt_path popt/LSVQ_Model_IX.npy \
    --model_name Model_IX \
    --video_name Zebra_Mussels_Not_Welcome_Here.mp4 \
    --video_path /data/sunwei_data/LSVQ/ia-batch1 \
    --resize 384 \
    --crop_size 384 \
    --video_number_min 8 \
    --sample_rate 1 \
    --sample_type mid \
    --output logs/video_score.log \
    --is_gpu
```
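The popt file stores the parameters of the quality-rescaling mapping fitted on LSVQ, so that raw model outputs can be reported on an interpretable scale. A hedged sketch of applying it, assuming the file holds the four parameters of the same logistic function used for PLCC evaluation above:

```python
import numpy as np

popt = np.load('popt/LSVQ_Model_IX.npy')  # fitted rescaling parameters (assumed 4-parameter logistic)

def rescale(raw_score, popt):
    b1, b2, b3, b4 = popt
    return (b1 - b2) / (1 + np.exp(-(raw_score - b3) / np.abs(b4))) + b2

print(rescale(0.37, popt))  # raw model output -> rescaled quality score
```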
If you find this code useful for your research, please cite:
```bibtex
@article{sun2024analysis,
  title={Analysis of video quality datasets via design of minimalistic video quality models},
  author={Sun, Wei and Wen, Wen and Min, Xiongkuo and Lan, Long and Zhai, Guangtao and Ma, Kede},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024},
  publisher={IEEE}
}
```