Data and code for the paper "Coreference as an indicator of context scope in multimodal narrative" (GEM2 @ ACL 2025).
To run the scripts, you only need a few basic packages, listed in environment.txt.
The subset of the VWP dataset used in this research is available as data/vwp-gem2-subset.csv.
Model-generated stories are available under data/model-generated-stories. Each .parquet file also contains the corresponding human-generated story.
The input files for the LinkAppend coreference system are available under data/link-append/in. We used the implementation of the system available at https://github.com/ianporada/coref-reeval. The outputs of the LinkAppend runs are available under data/link-append/out.
Prompts are available under data/prompts. Data on which character appears in which image, for every story, is available under data/visual-continuity as several .csv files.
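Because the visual-continuity data is split across several .csv files, it can be convenient to load them all at once. The sketch below is a minimal, stdlib-only loader; it assumes nothing about the column layout (each file's header row becomes the dictionary keys) and assumes UTF-8 encoding. The function name load_csv_dir is hypothetical and not part of this repository.

```python
import csv
from pathlib import Path


def load_csv_dir(directory):
    """Read every .csv file in `directory` into a dict: file stem -> list of row dicts.

    No particular column layout is assumed; each file's header row supplies the keys.
    Files with other extensions are ignored.
    """
    tables = {}
    for path in sorted(Path(directory).glob("*.csv")):
        # newline="" is what the csv module expects when reading files.
        with path.open(newline="", encoding="utf-8") as f:
            tables[path.stem] = list(csv.DictReader(f))
    return tables
```

For example, load_csv_dir("data/visual-continuity"), run from the repository root, returns one list of rows per file, keyed by file name without the extension.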
In all data files, the story_id column links the extracted stories to the original stories from VWP.
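Since story_id is the shared key, linking any extracted file back to the VWP subset is a keyed lookup. The following is a minimal sketch under stated assumptions: rows are plain dicts, a story may occupy several rows, and IDs are compared as strings, because a CSV reader yields strings while other loaders may yield integers. The helper names (index_by_story_id, link_to_vwp, read_csv_rows) are hypothetical.

```python
import csv
from collections import defaultdict


def index_by_story_id(rows, key="story_id"):
    """Group row dicts by story_id, normalised to str (a story may span several rows)."""
    index = defaultdict(list)
    for row in rows:
        index[str(row[key])].append(row)
    return dict(index)


def link_to_vwp(extracted_rows, vwp_rows, key="story_id"):
    """Pair each extracted row with the VWP rows sharing its story_id.

    Returns a list of (row, matching_vwp_rows) tuples; rows without a match get [].
    """
    vwp_index = index_by_story_id(vwp_rows, key)
    return [(row, vwp_index.get(str(row[key]), [])) for row in extracted_rows]


def read_csv_rows(path):
    """Read a CSV file (e.g. data/vwp-gem2-subset.csv) into a list of row dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

For the .parquet files, one option is pandas: pandas.read_parquet(path).to_dict("records") produces the same list-of-dicts shape (this needs a parquet engine such as pyarrow, which may not be in environment.txt).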
To compute general descriptive statistics for both machine- and human-generated texts, alongside the quantitative metrics, run
python main.py --results-path ../data/link-append/out/ --output-path ../results/metrics/ --character-stories ../data/character_stories.json
To compute the MCC metric, run
python mcc_metric.py --output-path ../results/mcc
To compute the correlation between the character change metric and MCC (Table 4 in the paper), run
python correlate_metrics.py --output-path ../results/correlation --character-stories ../data/character_stories.json
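The three commands above can be chained in one driver. This is a sketch, not part of the repository: it reproduces the commands verbatim (using the current interpreter in place of python) and runs them in the order listed, stopping at the first failure. The ../ paths suggest the scripts are meant to be run from a directory next to data/, whose name is not given here, so scripts_dir is an assumption you must supply. Whether later scripts depend on earlier outputs is not stated, so README order is simply a safe default.

```python
import subprocess
import sys

# The three commands from this README, in README order (arguments copied verbatim).
COMMANDS = [
    ["main.py", "--results-path", "../data/link-append/out/",
     "--output-path", "../results/metrics/",
     "--character-stories", "../data/character_stories.json"],
    ["mcc_metric.py", "--output-path", "../results/mcc"],
    ["correlate_metrics.py", "--output-path", "../results/correlation",
     "--character-stories", "../data/character_stories.json"],
]


def run_all(scripts_dir, dry_run=False):
    """Run each command from scripts_dir (assumed: the folder holding the scripts).

    check=True raises CalledProcessError on the first non-zero exit code.
    With dry_run=True nothing is executed; the argument lists are only returned.
    """
    argvs = [[sys.executable, *cmd] for cmd in COMMANDS]
    if not dry_run:
        for argv in argvs:
            subprocess.run(argv, cwd=scripts_dir, check=True)
    return argvs
```

Calling run_all(path, dry_run=True) first is a cheap way to inspect exactly what would be executed.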
If you find our data useful, please cite
The poster presented at GEM2 can be found here.