COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Jinqi Xiao¹,² · Shen Sang¹ · Tiancheng Zhi¹ · Jing Liu¹ · Qing Yan¹ · Yuqian Zhang² · Linjie Luo¹ · Bo Yuan²

¹ByteDance Inc.  ²Rutgers University

Paper PDF · Project Page

CVPR 2025

COAP (COrrelation-Aware Gradient Projection) is a memory-efficient training method that reduces computational overhead without sacrificing performance. Tested on vision, language, and multimodal tasks, COAP delivers faster training and better results than existing approaches—making it an ideal choice for scaling large models efficiently.


Figure: COAP performance. Comparison between COAP and other low-rank-based methods. The X-axis shows additional training time (lower is better). The Y-axis shows the change in quantitative metrics (e.g., FID, PPL) relative to the original optimizer (e.g., Adam, Adafactor), with higher values indicating better performance.


Figure: COAP memory. Profiling of GPU memory usage.


Installation

pip install -e .
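
The editable install above assumes a local clone of the repository; a minimal setup sketch (using the bytedance/coap repository path from this project) might look like:

git clone https://github.com/bytedance/coap.git
cd coap
pip install -e .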

Usage

Examples

We provide the three examples from our main paper (ControlNet-SDXL, DDPM, and LLAMA) for reproducibility. Please refer to examples for more results.

How to use COAP in your code

Here are the main parameters for COAP:

  • optimizer: The optimizer provided by COAP; options include coap_adamw, coap_adamw8bit, coap_adafactor, and coap_adafactor8bit.
  • rank: The rank of the projected matrix.
  • rank_ratio_matrix: The compression ratio for 2D weight matrices (overrides the rank parameter).
  • rank_ratio_cnn: The compression ratio for the 4D weight tensors of CNN layers.
  • update_interval: The interval (in steps) at which the projection matrix is updated.
  • reproject_factor: The re-projection factor.

from torch.optim import AdamW             # standard PyTorch AdamW, shown for comparison
from coap_torch import CoapAdamW, CoapAdafactor

# Standard AdamW
optimizer = AdamW(model.parameters(), lr=learning_rate)

# CoapAdamW: drop-in replacement with correlation-aware gradient projection
optimizer = CoapAdamW(params=model.parameters(),
                      lr=learning_rate,
                      rank_ratio_matrix=2,
                      rank_ratio_cnn=2,
                      update_interval=32,
                      reproject_factor=5)

# CoapAdafactor
optimizer = CoapAdafactor(params=model.parameters(),
                          lr=learning_rate,
                          rank_ratio_matrix=2,
                          rank_ratio_cnn=2,
                          update_interval=32,
                          reproject_factor=5)
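
Once constructed, the COAP optimizers are used like any other PyTorch optimizer. The following is a minimal training-loop sketch with a toy model and dataset chosen purely for illustration; only the CoapAdamW constructor arguments come from the snippet above, and everything else is a standard PyTorch workflow rather than part of the COAP API.

import torch
from torch.utils.data import DataLoader, TensorDataset
from coap_torch import CoapAdamW

# Toy model and data for illustration only; any nn.Module / DataLoader works.
model = torch.nn.Linear(128, 10)
dataset = TensorDataset(torch.randn(256, 128), torch.randint(0, 10, (256,)))
loader = DataLoader(dataset, batch_size=32)
loss_fn = torch.nn.CrossEntropyLoss()

optimizer = CoapAdamW(params=model.parameters(),
                      lr=1e-3,
                      rank_ratio_matrix=2,
                      update_interval=32,
                      reproject_factor=5)

for inputs, targets in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()  # the projected, memory-efficient update is applied here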

Please refer to the DDPM and ControlNet-SDXL examples for basic usage. A more advanced use case can be found in the LLAMA example.

BibTeX

If you find COAP useful for your research and applications, please cite COAP using this BibTeX:

@misc{xiao2025coapmemoryefficienttrainingcorrelationaware,
      title={COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection}, 
      author={Jinqi Xiao and Shen Sang and Tiancheng Zhi and Jing Liu and Qing Yan and Yuqian Zhang and Linjie Luo and Bo Yuan},
      year={2025},
      eprint={2412.00071},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2412.00071}, 
}

License

Apache 2.0 License. See LICENSE for details.
