The official repository of VinaBench: Benchmark for Faithful and Consistent Visual Narratives.

epfl-nlp/VinaBench

VinaBench: Benchmark for Faithful and Consistent Visual Narratives

Abstract

Visual narrative generation transforms textual narratives into sequences of images illustrating the content of the text. However, generating visual narratives that are faithful to the input text and self-consistent across generated images remains an open challenge, due to the lack of knowledge constraints used for planning the stories. In this work, we propose a new benchmark, VinaBench, to address this challenge. Our benchmark annotates the underlying commonsense and discourse constraints in visual narrative samples, offering systematic scaffolds for learning the implicit strategies of visual storytelling. Based on the incorporated narrative constraints, we further propose novel metrics to closely evaluate the consistency of generated narrative images and the alignment of generations with the input textual narrative. Our results across three generative vision models demonstrate that learning with VinaBench's knowledge constraints effectively improves the faithfulness and cohesion of generated visual narratives.

Overview of VinaBench

[Figure: Overview of VinaBench]

We augment existing visual-textual narrative pairs with discourse and commonsense constraints, to promote the learning of consistent and faithful visual narrative generation and its evaluation.


Getting Started

The VinaBench environments were developed on Ubuntu 22.04 with CUDA 12.1, Python 3.10, and Conda.

Scripts for setting up environments:

# for training visual narrative baselines
bash setup_baseline.sh

# for Mantis-Idefics2
bash setup_mantis.sh

# for Llama-3.1-70B-Instruct
bash setup_llama.sh

# for LLaVA-OneVision-72B
bash setup_llava_onev.sh

# for MiniCPM-V-2.6
bash setup_minicpm.sh

# for training LLM narrative constraint generators
bash setup_torchtune.sh
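
Before running any of the setup scripts above, it may help to confirm that the prerequisites listed in this section are visible on your machine. The following is a minimal sketch, not part of the VinaBench repository: the helper names `version_ge` and `check_cmd` are hypothetical, the version comparison relies on GNU `sort -V`, and treating Python 3.10 as a minimum version (rather than an exact requirement) is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not shipped with VinaBench.

# Succeed when version $1 is at least version $2 (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print a one-line status for a command that should be on PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

check_cmd conda
check_cmd nvcc

# Assumption: Python 3.10 is treated as a minimum, not an exact pin.
if command -v python3 >/dev/null 2>&1; then
    py_version="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
    if version_ge "$py_version" "3.10"; then
        echo "python $py_version: ok"
    else
        echo "python $py_version: older than 3.10"
    fi
else
    echo "missing: python3"
fi
```

The check only reports what it finds and does not install or modify anything. The setup scripts may create their own Conda environments with their own Python versions, in which case the system interpreter version matters less than the `conda` line.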

Preparing VinaBench Data

Please follow data/README.md to prepare the VinaBench data.

VinaBench Baseline Training and Inference

We have tested three baseline models on VinaBench:

VinaBench Evaluation

Please follow evaluation/README.md to perform VinaBench evaluation.

Citation

@inproceedings{gao2025vinabench,
  title={VinaBench: Benchmark for Faithful and Consistent Visual Narratives},
  author={Gao, Silin and Mathew, Sheryl and Mi, Li and Mamooler, Sepideh and Zhao, Mengjie and Wakaki, Hiromi and Mitsufuji, Yuki and Montariol, Syrielle and Bosselut, Antoine},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}
