Multi-Modal NLP Model Comparison #35

Cgarg9 · 2025-03-14T16:05:56Z

Description:

To help users understand how multi-modal models process text along with other data types (e.g., images, audio), add a notebook that compares different multi-modal NLP techniques.

Tasks:

Compare CLIP (Contrastive Language-Image Pretraining), BLIP, Flamingo, and OpenAI’s GPT-4V.
Apply models to text-to-image retrieval, image captioning, and multi-modal reasoning tasks.
Evaluate captioning results with BLEU and CIDEr, and retrieval results with precision metrics (e.g., Precision@K).
Summarize key takeaways for different applications.
Name the notebook multi_modal_nlp_comparison.ipynb.
Update the README file with relevant references.
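For the text-to-image retrieval task, the notebook needs a way to rank images for each text query and score the ranking. The sketch below is one minimal way to do that. It assumes each model yields one embedding per caption and one per image (for a dual-encoder model such as CLIP, these would come from its text and image encoders) and ranks images by cosine similarity. The helper names, the toy 2-D vectors, and the relevance sets are hypothetical placeholders, not real model outputs.

```python
import math
from typing import List, Sequence, Set


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def similarity_matrix(text_embs: Sequence[Sequence[float]],
                      image_embs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Cosine similarity of every text query (rows) against every image (columns)."""
    texts = [l2_normalize(t) for t in text_embs]
    images = [l2_normalize(i) for i in image_embs]
    return [[sum(a * b for a, b in zip(t, i)) for i in images] for t in texts]


def precision_at_k(sim_row: Sequence[float], relevant: Set[int], k: int) -> float:
    """Fraction of the k highest-scoring images that are relevant to this query."""
    ranked = sorted(range(len(sim_row)), key=lambda j: sim_row[j], reverse=True)
    return sum(1 for j in ranked[:k] if j in relevant) / k


def mean_precision_at_k(sim: Sequence[Sequence[float]],
                        relevance: Sequence[Set[int]], k: int) -> float:
    """Average Precision@K over all text queries."""
    return sum(precision_at_k(row, rel, k) for row, rel in zip(sim, relevance)) / len(sim)


# Hypothetical toy data: 2 text queries, 3 images; query i is relevant to image i.
text_embs = [[1.0, 0.0], [0.0, 1.0]]
image_embs = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
relevance = [{0}, {1}]

sim = similarity_matrix(text_embs, image_embs)
print(mean_precision_at_k(sim, relevance, k=1))  # → 1.0
print(mean_precision_at_k(sim, relevance, k=2))  # → 0.5
```

Many retrieval benchmarks report Recall@K instead. With exactly one relevant image per query, Recall@K is the hit rate, so it may be worth reporting both in the comparison table.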
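For the captioning task, BLEU compares a generated caption against one or more reference captions using clipped n-gram precision and a brevity penalty. The sketch below is a from-scratch, unsmoothed sentence-level BLEU meant only to show what the metric measures; the function names are illustrative. In the notebook itself an established implementation (e.g., NLTK's `sentence_bleu` or `sacrebleu`) is likely preferable, and CIDEr is omitted here because it needs TF-IDF statistics over the whole reference corpus.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Unsmoothed BLEU: geometric mean of clipped n-gram precisions times a brevity penalty."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref_counts: Counter = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty against the reference length closest to the candidate (ties -> shorter).
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(sentence_bleu(cand, [ref]))           # → 0.0 (no matching 4-gram)
print(sentence_bleu(cand, [ref], max_n=2))  # sqrt(5/6 * 3/5), about 0.7071
```

The first call shows why unsmoothed BLEU-4 is unstable on short captions; this is one reason captioning papers usually report corpus-level BLEU alongside CIDEr.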

Cgarg9 added the hard and pwoc labels on Mar 14, 2025
