CrossModalTransferLearning

Repository for the paper "Cross-Modal Transfer from Memes to Videos: Addressing Data Scarcity in Hateful Video Detection".

Fine-tuning Pipeline

This work explores using meme datasets to fine-tune models for hateful video detection, addressing data scarcity in video datasets. Initial experiments revealed performance gaps due to inconsistencies in label definitions. To mitigate this, we implemented a model-prediction-driven, human-assisted re-annotation pipeline, aligning meme labels with video dataset standards. The pipeline integrates meme dataset labels, model predictions via few-shot prompting, and human annotations through majority voting. Fine-tuning on re-annotated meme datasets achieved performance comparable to video-trained models, demonstrating their utility as substitutes. We utilized Facebook Hateful Memes (FHM) and MAMI datasets as sources and targeted MultiHateClip (MHC) and HateMM as video datasets, fine-tuning large vision-language models, LLaMA-3.2-11B and LLaVA-Next-Video-7B.

Re-annotation Pipeline

The re-annotation process combines original meme labels, model predictions, and human annotations to align with video label definitions. Predictions were generated using state-of-the-art vision-language models (e.g., LLaMA-3.2-11B) with few-shot prompting, where memes were queried using structured prompts without labels for prediction. Final labels were determined via majority voting to ensure consistency with video datasets.

Substitution Results

Fine-tuning on re-annotated memes (RM-FT) significantly outperformed both no fine-tuning (No-FT) and fine-tuning on original memes (OM-FT), achieving parity with video fine-tuning (Vid-FT).

Augmentation Results

Combining video data with re-annotated memes for fine-tuning consistently outperformed standalone methods (Vid-FT, RM-FT) and prior benchmarks, highlighting memes as effective substitutes and augmentations for video datasets in data-scarce scenarios.

Resources

Re-annotated Dataset: Available in dataset/re-annotated_meme.
Scripts: Fine-tuning and inference scripts using LoRA are provided in the script directory.

Citation

If you find this dataset useful, please cite:

@inproceedings{M2VTransfer:conf/www/2025,
  author       = {Han Wang, Tan Rui Yang, Roy Ka-Wei Lee},
  title        = {Cross-Modal Transfer from Memes to Videos: Addressing Data Scarcity in Hateful Video Detection},
  booktitle    = {Proceedings of the ACM Web Conference 2025},
  publisher    = {{ACM}},
  year         = {2025}
}

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
dataset		dataset
images		images
script		script
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CrossModalTransferLearning

Fine-tuning Pipeline

Re-annotation Pipeline

Substitution Results

Augmentation Results

Resources

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

CrossModalTransferLearning

Fine-tuning Pipeline

Re-annotation Pipeline

Substitution Results

Augmentation Results

Resources

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages