Jinqi Xiao1,2 · Shen Sang1 · Tiancheng Zhi1 · Jing Liu1 · Qing Yan1 · Yuqian Zhang2 · Linjie Luo1 · Bo Yuan2
1ByteDance Inc. 2Rutgers University
COAP (COrrelation-Aware Gradient Projection) is a memory-efficient training method that reduces computational overhead without sacrificing performance. Evaluated on vision, language, and multimodal tasks, COAP trains faster and achieves better results than existing approaches, making it well suited to scaling large models efficiently.
Comparison between COAP and other low-rank-based methods. The X-axis shows additional training time (lower is better). The Y-axis shows the change in quantitative metrics (e.g., FID, PPL) relative to the original optimizer (e.g., Adam, Adafactor), with higher values indicating better performance.
Profiling the GPU memory usage.
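For a quick, hand-rolled version of such a profile, the sketch below measures peak GPU memory for a single training step with CoapAdamW. The toy Linear model and random batch are illustrative placeholders; torch.cuda.reset_peak_memory_stats and torch.cuda.max_memory_allocated are standard PyTorch APIs, and the CoapAdamW arguments mirror the usage example further down.

import torch
from coap_torch import CoapAdamW

# Toy model and batch; substitute your own training setup.
model = torch.nn.Linear(4096, 4096).cuda()
optimizer = CoapAdamW(params=model.parameters(),
                      lr=1e-4,
                      rank_ratio_matrix=2,
                      rank_ratio_cnn=2,
                      update_interval=32,
                      reproject_factor=5)

torch.cuda.reset_peak_memory_stats()
x = torch.randn(8, 4096, device="cuda")
loss = model(x).square().mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"Peak GPU memory: {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB")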
Install COAP from source (editable mode):
pip install -e .
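To confirm the installation succeeded, a quick import check of the coap_torch module (the module used in the usage example below) should run without errors:

python -c "from coap_torch import CoapAdamW, CoapAdafactor"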
We provide the three examples from our main paper (ControlNet-SDXL, DDPM, and LLAMA) for reproducibility. Please refer to examples for more results.
Here are the main parameters for COAP:
- optimizer: The optimizer provided by COAP, one of coap_adamw, coap_adamw8bit, coap_adafactor, or coap_adafactor8bit.
- rank: The rank of the projected matrix.
- rank_ratio_matrix: The compression ratio for 2D weight matrices (overrides the rank parameter).
- rank_ratio_cnn: The compression ratio for the 4D weight tensors of CNN layers.
- update_interval: The interval (in training steps) at which the projection matrix is updated.
- reproject_factor: The re-projection factor.
from torch.optim import AdamW
from coap_torch import CoapAdamW, CoapAdafactor

# Baseline AdamW optimizer
optimizer = AdamW(model.parameters(), lr=learning_rate)

# Drop-in replacement with CoapAdamW
optimizer = CoapAdamW(params=model.parameters(),
                      lr=learning_rate,
                      rank_ratio_matrix=2,   # compress 2D weight matrices 2x
                      rank_ratio_cnn=2,      # compress 4D CNN weight tensors 2x
                      update_interval=32,    # update the projection every 32 steps
                      reproject_factor=5)

# Drop-in replacement with CoapAdafactor
optimizer = CoapAdafactor(params=model.parameters(),
                          lr=learning_rate,
                          rank_ratio_matrix=2,
                          rank_ratio_cnn=2,
                          update_interval=32,
                          reproject_factor=5)
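COAP optimizers follow the standard PyTorch optimizer interface, so they drop into an ordinary training loop unchanged. The loop below is an illustrative sketch; model, dataloader, and loss_fn are placeholders for your own setup.

# Standard training loop; model, dataloader, and loss_fn are placeholders.
for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()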
Please refer to the DDPM and ControlNet-SDXL examples for basic usage. A more advanced use case can be found in the LLAMA example.
If you find COAP useful for your research and applications, please cite COAP using this BibTeX:
@misc{xiao2025coapmemoryefficienttrainingcorrelationaware,
title={COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection},
author={Jinqi Xiao and Shen Sang and Tiancheng Zhi and Jing Liu and Qing Yan and Yuqian Zhang and Linjie Luo and Bo Yuan},
year={2025},
eprint={2412.00071},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2412.00071},
}
Apache 2.0 License. See LICENSE for details.