$\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation

Chengyue Wu, Teng Wang, Yixiao Ge, Zeyu Lu, Ruisong Zhou, Ping Luo, Ying Shan

This repo is the official implementation of the paper $\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation.

News

[2023.04] Our paper was accepted to ICML 2023.
[2023.07] The official code is released.

Main Results

Vision-Language Benchmarks

Vision Benchmarks

Language Benchmarks

Instruction

Dataset and Checkpoints Preparation

See datasets.md for dataset preparation and checkpoints.md for the checkpoints.

Installation

pip install -r OFA/requirements.txt

Training and Evaluation

We use NVIDIA A100 GPUs for training and evaluation. The detailed hyper-parameters can be found in the Appendix of the paper.

Step 1: PETL training

We provide several self-contained demo scripts for PETL (parameter-efficient transfer learning) training:

OFA/run_scripts/refcoco/train_refcoco_adapter.sh
OFA/run_scripts/refcoco/train_refcoco_prefix.sh
OFA/run_scripts/refcoco/train_refcoco_lora.sh

Usage:

cd OFA
bash ./run_scripts/refcoco/train_refcoco_adapter.sh

A few options of note:

--encoder-prompt :: whether to insert prompts into the encoder
--decoder-prompt :: whether to insert prompts into the decoder
--encoder-prompt-length :: encoder prompt length
--decoder-prompt-length :: decoder prompt length
--bitfit :: whether to use BitFit
--adapter :: whether to use adapters
--adapter-dim :: adapter projection dimension
--lora :: whether to use LoRA
--lora-r :: LoRA rank
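
To give a feel for what --adapter-dim and --lora-r control, the following is a minimal sketch that counts the trainable parameters added per layer by a standard bottleneck adapter and by a LoRA update. It is not taken from this repo: the function names are hypothetical, the hidden size of 768 is an assumed value for illustration, and the adapter is assumed to be a down-projection plus up-projection with biases.

```python
def adapter_params(hidden_dim: int, adapter_dim: int) -> int:
    """Parameters of one bottleneck adapter (assumed layout):
    a down-projection hidden_dim -> adapter_dim and an
    up-projection adapter_dim -> hidden_dim, each with a bias."""
    down = hidden_dim * adapter_dim + adapter_dim
    up = adapter_dim * hidden_dim + hidden_dim
    return down + up


def lora_params(in_dim: int, out_dim: int, rank: int) -> int:
    """Parameters of one LoRA update B @ A, where A has shape
    (rank, in_dim) and B has shape (out_dim, rank)."""
    return rank * (in_dim + out_dim)


if __name__ == "__main__":
    hidden = 768  # assumed hidden size, for illustration only
    print("adapter, dim=128:", adapter_params(hidden, 128))
    print("lora, r=16:", lora_params(hidden, hidden, 16))
```

Both counts grow linearly with the bottleneck size (adapter dimension or LoRA rank), so these options trade capacity against the number of trainable parameters.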

Step 2: Task similarity measurement

We provide a demo script that computes the task embedding of RefCOCO from the diagonal approximation of the Fisher Information Matrix (FIM): OFA/run_scripts/refcoco/refcoco_task_emb.sh

Usage:

cd OFA
bash ./run_scripts/refcoco/refcoco_task_emb.sh

A few options of note:

--task-emb :: task embedding calculation
--task-emb-file-path :: directory in which to save the task embedding (we recommend saving it under OFA/results/task_name/)

After obtaining the embedding of each task, use task_emb_post_process.ipynb to calculate the similarity between tasks.
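
As a rough illustration of the idea behind this step, the sketch below builds a diagonal FIM embedding by averaging element-wise squared gradients over examples and then compares two embeddings with cosine similarity. It is an assumption-laden toy, not the repo's code: the function names are hypothetical, gradients are plain Python lists, and the notebook may normalize or compare embeddings differently.

```python
import math


def diagonal_fisher(per_example_grads):
    """Diagonal FIM approximation: the mean over examples of the
    element-wise squared gradient of the log-likelihood."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine similarity between two task embeddings (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy gradients for two hypothetical tasks, two examples each.
    task_a = diagonal_fisher([[1.0, 2.0], [3.0, 0.0]])
    task_b = diagonal_fisher([[2.0, 1.0], [2.0, 1.0]])
    print(task_a, task_b, cosine_similarity(task_a, task_b))
```

Tasks whose embeddings score higher under the chosen similarity are natural candidates to serve as auxiliary experts in Step 3.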

Step 3: Expert interpolation

We provide a demo script to interpolate 3 experts (RefCOCO, RefCOCO+, RefCOCOg) for the target task, RefCOCO: OFA/run_scripts/refcoco/train_refcoco_adapter_interpolation.sh

Usage:

cd OFA
bash ./run_scripts/refcoco/train_refcoco_adapter_interpolation.sh
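
Conceptually, interpolating experts means combining their PETL parameters into one set of weights. The sketch below shows one simple way this could look: a convex combination whose coefficients come from a softmax over per-expert logits. This is an assumption made for illustration, with hypothetical function names and toy parameter dictionaries; the actual parameterization and training of the interpolation weights are defined by the demo script and the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax, giving non-negative weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpolate_experts(experts, logits):
    """Weighted average of expert parameters.

    experts: list of dicts mapping a parameter name to a flat list of floats;
             all experts are assumed to share the same names and shapes.
    logits:  one unnormalized interpolation coefficient per expert.
    """
    weights = softmax(logits)
    merged = {}
    for name, values in experts[0].items():
        merged[name] = [
            sum(w * expert[name][i] for w, expert in zip(weights, experts))
            for i in range(len(values))
        ]
    return merged


if __name__ == "__main__":
    # Three toy "experts" standing in for RefCOCO, RefCOCO+ and RefCOCOg adapters.
    experts = [{"w": [0.0, 3.0]}, {"w": [3.0, 0.0]}, {"w": [3.0, 3.0]}]
    print(interpolate_experts(experts, [0.0, 0.0, 0.0]))
```

With equal logits the result is a plain average of the experts; raising one expert's logit shifts the merged parameters toward that expert.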

Evaluation

After completing the steps above, use OFA/run_scripts/refcoco/evaluate_refcoco.sh to evaluate the final checkpoint. Remember to update the checkpoint path in the script.

Usage:

cd OFA
bash ./run_scripts/refcoco/evaluate_refcoco.sh

We recommend organizing your workspace directory like this:

OFA/
├── checkpoints/
│   ├── ofa_base.pt
│   ├── ofa_large.pt
│   └── ...
├── criterions/
├── data/
├── dataset/
│   ├── caption_data/
│   ├── refcoco_data/
│   └── ...
├── fairseq/
├── models/
├── run_scripts/
├── tasks/
├── train.py
├── trainer.py
└── utils/
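
Before launching a script, it can help to check that the workspace matches the layout above. The helper below is a small hypothetical sketch, not part of this repo: it only checks that the top-level entries shown in the tree exist under a given root and reports any that are missing.

```python
from pathlib import Path

# Top-level entries taken from the recommended layout above.
EXPECTED_ENTRIES = [
    "checkpoints", "criterions", "data", "dataset", "fairseq",
    "models", "run_scripts", "tasks", "train.py", "trainer.py", "utils",
]


def missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected files or directories that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries("OFA")
    if missing:
        print("Missing from OFA/:", ", ".join(missing))
    else:
        print("Workspace layout looks complete.")
```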

Acknowledgement

The code is based on the official implementation of OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework.

License

This research paper makes references to some open-source projects. Credits are given to these projects. See License.txt for details.
