VinaBench: Benchmark for Faithful and Consistent Visual Narratives

Silin Gao¹, Sheryl Mathew¹,³, Li Mi¹, Sepideh Mamooler¹, Mengjie Zhao², Hiromi Wakaki², Yuki Mitsufuji², Syrielle Montariol¹, Antoine Bosselut¹

¹EPFL ²Sony ³CMU

Abstract

Visual narrative generation transforms textual narratives into sequences of images illustrating the content of the text. However, generating visual narratives that are faithful to the input text and self-consistent across generated images remains an open challenge, due to the lack of knowledge constraints used for planning the stories. In this work, we propose a new benchmark, VinaBench, to address this challenge. Our benchmark annotates the underlying commonsense and discourse constraints in visual narrative samples, offering systematic scaffolds for learning the implicit strategies of visual storytelling. Based on the incorporated narrative constraints, we further propose novel metrics to closely evaluate the consistency of generated narrative images and the alignment of generations with the input textual narrative. Our results across three generative vision models demonstrate that learning with VinaBench's knowledge constraints effectively improves the faithfulness and cohesion of generated visual narratives.

Overview of VinaBench

We augment existing visual-textual narrative pairs with discourse and commonsense constraints, to promote the learning of consistent and faithful visual narrative generation and its evaluation.

Getting Started

The VinaBench environments were developed on Ubuntu 22.04 with CUDA 12.1, Python 3.10, and Conda.
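
Because the environments target specific versions, it can help to compare the local machine against them before running any setup script. The following is a minimal sketch, not part of the repository: the function names same_minor, report, and check_env are hypothetical, and the CUDA figure is read from the nvidia-smi header, which shows the highest CUDA version the driver supports rather than the installed toolkit version.

```shell
#!/usr/bin/env bash
# Sketch only: compares the local machine with the versions listed above.
# All function names here are hypothetical and not part of the repository.

# Succeeds when version $1 equals $2 or starts with "$2." (e.g. 3.10.12 vs 3.10).
same_minor() {
  case "$1" in
    "$2"|"$2".*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints one status line: component name, version found, version expected.
report() {
  if same_minor "$2" "$3"; then
    echo "ok: $1 $2"
  else
    echo "differs: $1 ${2:-not found} (README lists $3)"
  fi
}

check_env() {
  # OS release, read in a subshell so os-release variables do not leak.
  os=""
  if [ -r /etc/os-release ]; then
    os=$(. /etc/os-release && echo "${VERSION_ID:-}")
  fi
  report "Ubuntu" "$os" "22.04"

  # Python interpreter version, e.g. "Python 3.10.12" -> "3.10.12".
  py=""
  if command -v python3 >/dev/null 2>&1; then
    py=$(python3 --version 2>&1 | awk '{print $2}')
  fi
  report "Python" "$py" "3.10"

  # Assumption: the nvidia-smi header reports the driver-supported CUDA
  # version, so a newer value here does not necessarily indicate a problem.
  cuda=""
  if command -v nvidia-smi >/dev/null 2>&1; then
    cuda=$(nvidia-smi 2>/dev/null | sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p' | head -n 1)
  fi
  report "CUDA" "$cuda" "12.1"

  # Conda only needs to be on PATH; no specific version is listed.
  if command -v conda >/dev/null 2>&1; then
    echo "ok: $(conda --version 2>&1)"
  else
    echo "differs: conda not found"
  fi
}

check_env
```

A "differs" line is informational only: the versions above are the ones the environments were developed on, and other versions are not known to fail.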

Scripts for setting up environments:

# for training visual narrative baselines
bash setup_baseline.sh

# for Mantis-Idefics2
bash setup_mantis.sh

# for Llama-3.1-70B-Instruct
bash setup_llama.sh

# for LLaVA-OneVision-72B
bash setup_llava_onev.sh

# for MiniCPM-V-2.6
bash setup_minicpm.sh

# for training LLM narrative constraint generators
bash setup_torchtune.sh
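
When several components are needed, the scripts listed above can be run in sequence. The sketch below assumes it is run from the repository root; the helper names setup_script_for and setup_components and the short component names are hypothetical, and only the script file names come from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the repository.

# Maps a short component name to the setup script listed above.
setup_script_for() {
  case "$1" in
    baseline)   echo "setup_baseline.sh" ;;
    mantis)     echo "setup_mantis.sh" ;;
    llama)      echo "setup_llama.sh" ;;
    llava_onev) echo "setup_llava_onev.sh" ;;
    minicpm)    echo "setup_minicpm.sh" ;;
    torchtune)  echo "setup_torchtune.sh" ;;
    *) echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# Runs the setup script for each requested component in order,
# stopping at the first unknown name, missing file, or failed script.
setup_components() {
  for name in "$@"; do
    script=$(setup_script_for "$name") || return 1
    if [ ! -f "$script" ]; then
      echo "missing $script (run from the repository root)" >&2
      return 1
    fi
    bash "$script" || return 1
  done
}
```

For example, setup_components baseline minicpm would run setup_baseline.sh and then setup_minicpm.sh. Whether the environments created by different scripts can coexist is not stated here, so running them one at a time and checking each result is the safer default.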

Preparing VinaBench Data

Please follow data/README.md to prepare the VinaBench data.

VinaBench Baseline Training and Inference

We have tested three baseline models on VinaBench:

MM-Interleaved: please follow MM-Interleaved/README.md
AR-LDM: (coming soon)
StoryGen: (coming soon)

VinaBench Evaluation

Please follow evaluation/README.md to perform VinaBench evaluation.

Citation

@inproceedings{gao2025vinabench,
  title={VinaBench: Benchmark for Faithful and Consistent Visual Narratives},
  author={Gao, Silin and Mathew, Sheryl and Mi, Li and Mamooler, Sepideh and Zhao, Mengjie and Wakaki, Hiromi and Mitsufuji, Yuki and Montariol, Syrielle and Bosselut, Antoine},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}
