Data and code for the paper "Coreference as an indicator of context scope in multimodal narrative" (GEM2 @ ACL 2025).
To run the scripts, you only need a few basic packages; they are listed in environment.txt.
The subset of the VWP dataset used in this research is available as data/vwp-gem2-subset.csv.
Model-generated stories are available under data/model-generated-stories. Each .parquet file also contains the corresponding human-generated story.
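A minimal sketch of loading the released files with pandas. The paths match the repo layout described above; the column names other than story_id are hypothetical, and the CSV content below is a self-contained stand-in so the snippet runs without the repo checked out.

```python
import io
import pandas as pd

# With the repo checked out, the real files would be read like this:
# subset = pd.read_csv("data/vwp-gem2-subset.csv")
# stories = pd.read_parquet("data/model-generated-stories/<model>.parquet")

# Self-contained stand-in for the subset CSV (columns are illustrative):
csv_text = "story_id,text\n1,once upon a time\n2,the end\n"
subset = pd.read_csv(io.StringIO(csv_text))
print(subset["story_id"].tolist())  # [1, 2]
```

Reading the .parquet files requires a parquet engine such as pyarrow, which pandas picks up automatically if installed.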
The input files for running the LinkAppend coreference system are available under data/link-append/in. We used the implementation of the system available at https://github.com/ianporada/coref-reeval. The outputs of the LinkAppend runs are available under data/link-append/out.
Prompts are available under data/prompts. Data on which character appears in which image in all stories is available under data/visual-continuity in the form of several .csv files.
In all data files, the story_id column links the extracted stories to the original stories from VWP.
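The linking step can be sketched with a pandas merge on story_id. This is an illustrative example with in-memory stand-ins; the column names other than story_id are hypothetical, not the actual schema of the data files.

```python
import pandas as pd

# Stand-in for the VWP subset (source_text is a hypothetical column name).
vwp = pd.DataFrame({"story_id": [1, 2, 3],
                    "source_text": ["a", "b", "c"]})

# Stand-in for a model-generated stories file.
generated = pd.DataFrame({"story_id": [2, 3],
                          "generated_text": ["x", "y"]})

# Join generated stories back to their VWP originals via story_id.
linked = generated.merge(vwp, on="story_id", how="left")
print(linked[["story_id", "generated_text", "source_text"]])
```

A left merge keeps every generated story and attaches its original, which is usually what you want when evaluating model output against the source.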
To compute general descriptive statistics for both machine- and human-generated texts, alongside quantitative metrics, run
python main.py --results-path ../data/link-append/out/ --output-path ../results/metrics/ --character-stories ../data/character_stories.json
To compute the MCC metric, run
python mcc_metric.py --output-path ../results/mcc
To compute the correlation between the character change metric and MCC (Table 4 in the paper), run
python correlate_metrics.py --output-path ../results/correlation --character-stories ../data/character_stories.json
If you find our data useful, please cite:
The poster presented at GEM2 can be found here.