Jinqi Xiao1,2 · Shen Sang1 · Tiancheng Zhi1 · Jing Liu1 · Qing Yan1 · Yuqian Zhang2 · Linjie Luo1 · Bo Yuan2
1ByteDance Inc. 2Rutgers University
COAP (COrrelation-Aware Gradient Projection) is a memory-efficient training method that reduces computational overhead without sacrificing performance. Evaluated on vision, language, and multimodal tasks, COAP trains faster and achieves better results than existing approaches, making it well suited to scaling large models efficiently.
Comparison between COAP and other low-rank-based methods. The X-axis shows additional training time (lower is better). The Y-axis shows the change in quantitative metrics (e.g., FID, PPL) relative to the original optimizer (e.g., Adam, Adafactor), with higher values indicating better performance.
Profiling the GPU memory usage.
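For a quick, hand-rolled version of such a profile, the sketch below measures peak GPU memory for a single training step with CoapAdamW. The toy Linear model and random batch are illustrative placeholders; torch.cuda.reset_peak_memory_stats and torch.cuda.max_memory_allocated are standard PyTorch APIs, and the CoapAdamW arguments mirror the usage example further down.

import torch
from coap_torch import CoapAdamW

# Toy model and batch; substitute your own training setup.
model = torch.nn.Linear(4096, 4096).cuda()
optimizer = CoapAdamW(params=model.parameters(),
                      lr=1e-4,
                      rank_ratio_matrix=2,
                      rank_ratio_cnn=2,
                      update_interval=32,
                      reproject_factor=5)

torch.cuda.reset_peak_memory_stats()
x = torch.randn(8, 4096, device="cuda")
loss = model(x).square().mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"Peak GPU memory: {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB")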
Install COAP from source (editable mode):
pip install -e .
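To confirm the installation succeeded, a quick import check of the coap_torch module (the module used in the usage example below) should run without errors:

python -c "from coap_torch import CoapAdamW, CoapAdafactor"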
We provide the three examples from our main paper (ControlNet-SDXL, DDPM, and LLAMA) for reproducibility. Please refer to examples for more results.
Here are the main parameters for COAP:
- optimizer: The optimizer provided by COAP, one of coap_adamw, coap_adamw8bit, coap_adafactor, or coap_adafactor8bit.
- rank: The rank of the projected matrix.
- rank_ratio_matrix: The compression ratio for 2D weight matrices (overrides the rank parameter).
- rank_ratio_cnn: The compression ratio for the 4D weight tensors of CNN layers.
- update_interval: The interval (in training steps) at which the projection matrix is updated.
- reproject_factor: The re-projection factor.
from torch.optim import AdamW
from coap_torch import CoapAdamW, CoapAdafactor

# Baseline AdamW optimizer
optimizer = AdamW(model.parameters(), lr=learning_rate)

# Drop-in replacement with CoapAdamW
optimizer = CoapAdamW(params=model.parameters(),
                      lr=learning_rate,
                      rank_ratio_matrix=2,   # compress 2D weight matrices 2x
                      rank_ratio_cnn=2,      # compress 4D CNN weight tensors 2x
                      update_interval=32,    # update the projection every 32 steps
                      reproject_factor=5)

# Drop-in replacement with CoapAdafactor
optimizer = CoapAdafactor(params=model.parameters(),
                          lr=learning_rate,
                          rank_ratio_matrix=2,
                          rank_ratio_cnn=2,
                          update_interval=32,
                          reproject_factor=5)
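COAP optimizers follow the standard PyTorch optimizer interface, so they drop into an ordinary training loop unchanged. The loop below is an illustrative sketch; model, dataloader, and loss_fn are placeholders for your own setup.

# Standard training loop; model, dataloader, and loss_fn are placeholders.
for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()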
Please refer to the DDPM and ControlNet-SDXL examples for basic usage. A more advanced use case can be found in the LLAMA example.
If you find COAP useful for your research and applications, please cite COAP using this BibTeX:
@misc{xiao2025coapmemoryefficienttrainingcorrelationaware,
title={COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection},
author={Jinqi Xiao and Shen Sang and Tiancheng Zhi and Jing Liu and Qing Yan and Yuqian Zhang and Linjie Luo and Bo Yuan},
year={2025},
eprint={2412.00071},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2412.00071},
}
Apache 2.0 License. See LICENSE for details.