Lou1sM/modular_multimodal_summarization

Code for the paper A Modular Approach for Multimodal Summarization of TV Shows, applied to the SummScreen3D dataset.

Requirements

If you use mamba, you can install the required dependencies with mamba env create -f environment.yml. Otherwise, the required packages are listed in manual_requirements.sh and can be installed with . manual_requirements.sh.

Data

The dataset consists of text transcripts and videos of TV show episodes, which are to be summarized as text. Our method first converts the videos to text in the form of video captions.

Precomputed Captions

We include the video captions in this repo so you can run our model without access to the videos themselves. The captions are at

SummScreen/{episode-name}/{caption-method}_procced_scene_caps.json,

where caption-method is one of 'kosmos' or 'swinbert', the two captioning methods used in the paper.
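
For example, the following Python sketch loads one precomputed caption file. The episode name is a placeholder (use any directory present under SummScreen/), and since the JSON layout is not documented here, the sketch only prints a short preview.

    import json
    from pathlib import Path

    caption_method = "kosmos"      # or "swinbert"
    episode = "some-episode-name"  # placeholder; use a directory that exists under SummScreen/
    path = Path("SummScreen") / episode / f"{caption_method}_procced_scene_caps.json"

    with open(path) as f:
        caps = json.load(f)

    # The README does not document the JSON structure, so just preview it.
    print(type(caps).__name__)
    items = caps.items() if isinstance(caps, dict) else enumerate(caps)
    for i, (key, value) in enumerate(items):
        print(key, str(value)[:80])
        if i >= 2:
            break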

From Scratch

If you want to process the videos from scratch, you can download them (>600GB) from https://github.com/ppapalampidi/long_video_summarization and use the authors' public code for kosmos or swinbert. In that case, you must first split the videos into scenes and then produce captions for each scene. To do this, run, in order, preproc/align_vid_and_transcript.py, preproc/frame_extractor.py (for kosmos), and preproc/caption_each_scene.py.
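
A minimal sketch of running those three steps in order is shown below. The scripts are invoked without arguments here, which is an assumption; check each script for the paths and flags it actually requires before running.

    import subprocess

    # Preprocessing order described above; see each script for its required arguments.
    steps = [
        "preproc/align_vid_and_transcript.py",  # split each video into scenes aligned with its transcript
        "preproc/frame_extractor.py",           # extract frames per scene (for kosmos)
        "preproc/caption_each_scene.py",        # produce a caption for each scene
    ]
    for script in steps:
        subprocess.run(["python", script], check=True)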

Model

To reproduce our model output, run

python train.py --caps kosmos --order optimal --n_epochs 10

PRISMA Metric

Our paper also introduces a new metric for evaluating the factual precision and recall of summaries. This method makes multiple API calls to GPT-4, and expects your OpenAI API key at prefs/api.key. To run it as used in the paper, assuming your output summaries are at experiments/{experiment-name}/generations_test, with each summary in a separate file, run

python compute_metrics.py --expname {experiment-name}
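
As an illustration of the layout compute_metrics.py expects, here is a hedged Python sketch: the experiment name ("my_run"), the summary filename, and the key string are placeholders, not values taken from the repo.

    import subprocess
    from pathlib import Path

    # OpenAI API key at the path the metric expects.
    Path("prefs").mkdir(exist_ok=True)
    Path("prefs", "api.key").write_text("sk-...")  # replace with your real key; keep it out of version control

    # One summary per file under experiments/{experiment-name}/generations_test.
    gen_dir = Path("experiments", "my_run", "generations_test")
    gen_dir.mkdir(parents=True, exist_ok=True)
    (gen_dir / "episode_0.txt").write_text("A placeholder summary.")  # filename pattern is a guess

    subprocess.run(["python", "compute_metrics.py", "--expname", "my_run"], check=True)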

If you want to use PRISMA in a different context, you can follow the example in example_prefs.py.

Citation

@inproceedings{mahon-lapata-2024-modular,
    title = "A Modular Approach for Multimodal Summarization of {TV} Shows",
    author = "Mahon, Louis  and
      Lapata, Mirella",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.450",
    doi = "10.18653/v1/2024.acl-long.450",
    pages = "8272--8291",
  }
