# Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

**Abstract**: Socratic Models is a modular framework in which multiple pre-trained models are composed zero-shot via multimodal-informed prompting. The composition lets the models exchange information and capture new multimodal capabilities without requiring fine-tuning. As a proof of concept, we modify the Socratic Models framework so that it is entirely open-source and attempt to reproduce the results of the original version. Additionally, we investigate the capabilities of Socratic Models on multimodal reasoning tasks such as chain-of-thought reasoning and visual question answering, in zero-shot and few-shot settings.

**Authors**: Ryan Amaudruz, Abhinav Bhuyan, Milena Kapralova, Bogdan Palfi, Alexandru Turcu

## Install

To install the environment, run:

```sh
conda env create -f environment.yml
conda activate socratic
python -m spacy download en
```
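
Note: spaCy v3 removed the `en` shortcut link, so the last command may fail on newer spaCy versions. The usual equivalent (an assumption about which English model the code expects) is:

```sh
python -m spacy download en_core_web_sm
```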

## Instructions

This repository provides scripts for prompting CLIP with GPT-3, FLAN-T5, GitVision, BLIP, and BLIP2, as well as self-contained Jupyter notebooks with prototype implementations of Socratic Models for image caption generation, chain-of-thought reasoning, and visual question answering. The project is organised so that the downloading, caching, and organisation of files is managed by the code. The classes are built in a modular fashion so that they can be adapted to different use cases.
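
For example, a captioning run followed by evaluation might look like the following (a sketch: the scripts exist in `scripts/`, but any command-line arguments they take are best checked in the files themselves):

```sh
python scripts/coco_caption_base.py   # generate captions with the Baseline Image Captioner
python scripts/coco_evaluation.py     # evaluate the generated captions
```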

## Notes on files in this repository

- `blogpost.md`: the research report corresponding to the experiments documented in this repository
- `blogpost_images`: contains the images for the `blogpost.md` report
- `scripts`
  - `coco_caption_base.py` - Run a train/valid/test dataset on the Baseline Image Captioner.
  - `coco_caption_base_hp_tune.py` - Run a parameter search on the Baseline Image Captioner.
  - `coco_caption_imp.py` - Run a train/valid/test dataset on the Improved Image Captioner.
  - `coco_caption_imp_hp_tune.py` - Run a parameter search on the Improved Image Captioner.
  - `coco_caption_gpt.py` - Run a train/valid/test dataset on the Original Socratic Captioner.
  - `coco_caption_git.py` - Run a train/valid/test dataset using GIT.
  - `coco_caption_blip.py` - Run a train/valid/test dataset using BLIP.
  - `coco_caption_blip2.py` - Run a train/valid/test dataset using BLIP2.
  - `image_captioning.py` - Contains the functionality relating to image captioning.
  - `mm_reasoning.py` - Contains the functionality relating to multimodal reasoning.
  - `generate_reasoning.py` - Run a reasoning task.
  - `utils.py` - Contains utility functions.
  - `coco_evaluation.py` - Run the evaluation of the captions generated with the different approaches.
  - `reasoning_evaluation.py` - Run the multimodal reasoning evaluation.
- `notebooks`
  - `demo_baseline.ipynb` - A demo of the Baseline Image Captioner in action.
  - `demo_improved.ipynb` - A demo of the Improved Image Captioner in action.
  - `demo_gpt.ipynb` - A demo of the Original Socratic Image Captioner in action.
  - `demo_gitvision.ipynb` - A demo of GIT in action.
  - `demo_blip.ipynb` - A demo of BLIP in action (see the standalone sketch after this list).
  - `demo_blip2.ipynb` - A demo of BLIP2 in action.
  - `display_images_captions.ipynb` - Displays a selection of captions obtained with the different captioners.
  - `visualise_CLIP.ipynb` - Visualisations of the embedding space of CLIP.
  - `socratic_mm_reasoning.ipynb` - A showcase of the multimodal reasoning tasks.
- `data`
  - Stores the input and generated data. It is created automatically when the code is run.
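
As a minimal illustration of the kind of zero-shot captioning pipeline the demo notebooks wrap, the following sketch uses the Hugging Face `transformers` BLIP API; the checkpoint name and example image URL are assumptions for illustration, not values taken from this repository:

```python
# A minimal sketch of zero-shot BLIP captioning via Hugging Face transformers.
# The checkpoint and the example image URL are assumptions, chosen for illustration.
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Load an example image (any RGB image works).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Generate a caption zero-shot: no fine-tuning involved.
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```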

## License

This project is licensed under the terms of the MIT License, allowing free use of the code.
