$\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation

Chengyue Wu, Teng Wang, Yixiao Ge, Zeyu Lu, Ruisong Zhou, Ping Luo, Ying Shan

This repo is the official implementation of the paper $\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation.

Overview

News

  • [2023.04] Our paper is accepted to ICML 2023.

  • [2023.07] The official code is released.

Main Results

Vision-Language Benchmarks

(Table 1: results on vision-language benchmarks.)

Vision Benchmarks

(Table 2: results on vision benchmarks.)

Language Benchmarks

(Table 3: results on language benchmarks.)

Instruction

Dataset and Checkpoints Preparation

See datasets.md for dataset preparation; for the pre-trained checkpoints, see checkpoints.

Installation

pip install -r OFA/requirements.txt

Training and Evaluation

We use NVIDIA A100 GPUs for training and evaluation. The detailed hyper-parameters can be found in the Appendix of the paper.

Step 1: PETL training

We provide several demo scripts that contain all the required parts for PETL (parameter-efficient transfer learning) training:

  • OFA/run_scripts/refcoco/train_refcoco_adapter.sh
  • OFA/run_scripts/refcoco/train_refcoco_prefix.sh
  • OFA/run_scripts/refcoco/train_refcoco_lora.sh

Usage:

cd OFA
bash ./run_scripts/refcoco/train_refcoco_adapter.sh

A few options of note (a minimal sketch of the corresponding PETL modules follows the list):

  • --encoder-prompt :: whether to insert prompts into the encoder
  • --decoder-prompt :: whether to insert prompts into the decoder
  • --encoder-prompt-length :: encoder prompt length
  • --decoder-prompt-length :: decoder prompt length
  • --bitfit :: whether to use BitFit
  • --adapter :: whether to use adapters
  • --adapter-dim :: adapter projection dimension
  • --lora :: whether to use LoRA
  • --lora-r :: LoRA rank
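
For intuition, here is a minimal PyTorch sketch of the adapter and LoRA modules these flags toggle. It is illustrative only: the class names, dimensions, and initialization are assumptions and do not mirror the repo's implementation.

# Minimal sketch of the PETL modules toggled by the flags above
# (illustrative, not the repo's code).
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, d_model: int, adapter_dim: int):  # adapter_dim ~ --adapter-dim
        super().__init__()
        self.down = nn.Linear(d_model, adapter_dim)
        self.up = nn.Linear(adapter_dim, d_model)
        self.act = nn.ReLU()

    def forward(self, x):
        # Residual connection keeps the frozen backbone path intact.
        return x + self.up(self.act(self.down(x)))

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update of rank r (~ --lora-r)."""
    def __init__(self, base: nn.Linear, r: int, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the low-rank factors are trained
        self.A = nn.Parameter(torch.randn(base.in_features, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(r, base.out_features))  # zero init: update starts at zero
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A @ self.B) * self.scale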

Step 2: Task similarity measurement

We provide a demo script to calculate the task embedding of RefCOCO based on the Fisher Information Matrix (FIM) with a diagonal approximation: OFA/run_scripts/refcoco/refcoco_task_emb.sh

Usage:

cd OFA
bash ./run_scripts/refcoco/refcoco_task_emb.sh

A few options of note:

  • --task-emb :: enable task embedding calculation
  • --task-emb-file-path :: directory in which to save the task embedding result (we recommend saving it under OFA/results/task_name/)

After obtaining the embedding of each task, use task_emb_post_process.ipynb to calculate the pairwise similarity of tasks.
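
For intuition, here is a hedged Python sketch of a diagonal-FIM task embedding and the cosine similarity used to compare tasks. The names model, dataloader, and loss_fn are stand-ins, not the repo's API; the embedding is the per-parameter mean of squared gradients of the log-likelihood.

# Sketch of a diagonal Fisher Information task embedding
# (model, dataloader, and loss_fn are assumed stand-ins).
import torch

def task_embedding(model, dataloader, loss_fn):
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    n_batches = 0
    for batch in dataloader:
        model.zero_grad()
        loss_fn(model, batch).backward()  # gradient of the log-likelihood on this batch
        for n, p in model.named_parameters():
            if n in fisher and p.grad is not None:
                fisher[n] += p.grad.detach() ** 2  # diagonal approximation: E[g^2]
        n_batches += 1
    # Flatten the per-parameter Fisher values into a single task-embedding vector.
    return torch.cat([v.flatten() for v in fisher.values()]) / n_batches

def task_similarity(emb_a, emb_b):
    # Cosine similarity between two task embeddings; higher suggests more transferable experts.
    return torch.nn.functional.cosine_similarity(emb_a, emb_b, dim=0).item()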

Step 3: Expert interpolation

We provide a demo script to interpolate three experts (RefCOCO, RefCOCO+, RefCOCOg) for the target task, RefCOCO: OFA/run_scripts/refcoco/train_refcoco_adapter_interpolation.sh

Usage:

cd OFA
bash ./run_scripts/refcoco/train_refcoco_adapter_interpolation.sh
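
At its core, expert interpolation forms a convex combination of the PETL parameters of the selected task experts. The sketch below is illustrative: the checkpoint paths and interpolation weights are hypothetical, and in $\pi$-Tuning the weights are optimized on the target task rather than fixed by hand.

# Sketch of expert interpolation: a weighted average of expert state dicts
# (the paths and weights below are hypothetical).
import torch

def interpolate_experts(expert_paths, weights):
    assert abs(sum(weights) - 1.0) < 1e-6, "interpolation weights should sum to 1"
    states = [torch.load(p, map_location="cpu") for p in expert_paths]
    # Per-parameter convex combination across the expert checkpoints.
    return {k: sum(w * s[k].float() for w, s in zip(weights, states)) for k in states[0]}

# Hypothetical example: merge three referring-expression experts for the RefCOCO target task.
merged = interpolate_experts(
    ["refcoco_adapter.pt", "refcocoplus_adapter.pt", "refcocog_adapter.pt"],
    [0.6, 0.2, 0.2],
)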

Evaluation

After the above steps, you can use OFA/run_scripts/refcoco/evaluate_refcoco.sh to evaluate the final checkpoint. Remember to update the checkpoint path in the script.

Usage:

cd OFA
bash ./run_scripts/refcoco/evaluate_refcoco.sh

We recommend organizing your workspace directory as follows:

OFA/
├── checkpoints/
│   ├── ofa_base.pt
│   ├── ofa_large.pt
│   └── ...
├── criterions/
├── data/
├── dataset/
│   ├── caption_data/
│   ├── refcoco_data/
│   └── ...
├── fairseq/
├── models/
├── run_scripts/
├── tasks/
├── train.py
├── trainer.py
└── utils/

Acknowledgement

The code is based on the official implementation of OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework.

License

This project makes reference to several open-source projects, which are credited accordingly. See License.txt for details.
