CD-GraB aims to find distributed data permutations with provably better convergence guarantees than Distributed Random Reshuffling (D-RR). It builds on the gradient balancing frameworks introduced in the original GraB paper. The technical details can be found in our NeurIPS'23 paper. If you have any questions or suggestions about the paper or code, please contact Wentao Guo: [email protected].
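As a rough illustration of the gradient balancing idea that CD-GraB builds on, the sketch below greedily assigns each centered gradient a sign so that the running signed sum stays small, then places the positively signed examples at the front and the negatively signed ones at the back in reverse order. This is a simplified, single-process sketch and not the code in this repository: the function name balance_reorder is hypothetical, gradients are plain Python lists, the mean is computed exactly over the full pass for simplicity, and the coordination across workers that CD-GraB adds is omitted.

```python
import random


def balance_reorder(order, grads):
    """Greedy sign balancing over one pass, followed by reordering.

    order: list of example indices in the order they were visited.
    grads: dict mapping example index -> gradient (list of floats).
    Returns a new visiting order (a permutation of `order`).
    """
    n = len(order)
    dim = len(grads[order[0]])
    # Center the gradients so that they sum to zero (exact mean: an assumption
    # made here for simplicity).
    mean = [sum(grads[i][d] for i in order) / n for d in range(dim)]
    running = [0.0] * dim  # running signed sum of centered gradients
    front, back = [], []
    for i in order:
        c = [g - m for g, m in zip(grads[i], mean)]
        plus = sum((r + x) ** 2 for r, x in zip(running, c))
        minus = sum((r - x) ** 2 for r, x in zip(running, c))
        if plus <= minus:
            # Sign +1 keeps the running sum smaller: visit early next time.
            running = [r + x for r, x in zip(running, c)]
            front.append(i)
        else:
            # Sign -1 keeps the running sum smaller: visit late next time.
            running = [r - x for r, x in zip(running, c)]
            back.append(i)
    return front + back[::-1]


# Toy usage with random 3-dimensional "gradients" for 8 examples.
random.seed(0)
grads = {i: [random.gauss(0.0, 1.0) for _ in range(3)] for i in range(8)}
new_order = balance_reorder(list(range(8)), grads)
print(new_order)
```

The returned list is always a permutation of the input order, so it can be fed back in as the visiting order for the next pass.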
Python >= 3.9
PyTorch >= 2.0.0
CUDA >= 11.7 on Linux
torchopt
torchvision
functorch
transformers
GraB repository: https://github.com/EugeneLYC/GraB
For the experiment in main-LR-HMDA.py, please run the following command for CD-GraB
torchrun --nproc_per_node=4 --nnodes=1 --master_addr="localhost" --master_port=35500 main-LR-HMDA.py --sorter CD-GraB --seed 0 --lr 5e-3 --node_cnt 4
and the following command for D-RR
torchrun --nproc_per_node=4 --nnodes=1 --master_addr="localhost" --master_port=35500 main-LR-HMDA.py --sorter D-RR --seed 0 --lr 5e-3 --node_cnt 4
For the experiment in main-LSTM-Wiki2.py, please run the following command for CD-GraB
torchrun --nproc_per_node=4 --nnodes=1 --master_addr="localhost" --master_port=35500 main-LSTM-Wiki2.py --sorter CD-GraB --seed 0 --lr 5.0 --B 16 --node_cnt 4
and the following command for D-RR
torchrun --nproc_per_node=4 --nnodes=1 --master_addr="localhost" --master_port=35500 main-LSTM-Wiki2.py --sorter D-RR --seed 0 --lr 5.0 --B 16 --node_cnt 4
For the experiment in main-MLP-M4.py, please run the following command for CD-GraB
torchrun --nproc_per_node=32 --nnodes=1 --master_addr="localhost" --master_port=35500 main-MLP-M4.py --sorter CD-GraB --seed 0 --node_cnt 32 --backend gloo
and the following command for D-RR
torchrun --nproc_per_node=32 --nnodes=1 --master_addr="localhost" --master_port=35500 main-MLP-M4.py --sorter D-RR --seed 0 --B 32 --node_cnt 32 --backend gloo
For the experiment in main-GPT2-Wiki103.py, please run the following command for CD-GraB
python main-GPT2-Wiki103.py --sorter CD-GraB --seed 0
and the following command for D-RR
python main-GPT2-Wiki103.py --sorter D-RR --seed 0
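The torchrun commands above differ only in the script name, the sorter, the process count, and a few script-specific flags, so a small helper can generate them for back-to-back comparisons. The sketch below is only a convenience wrapper: the function name torchrun_command is hypothetical, it uses only flags that appear in the commands above, and it prints the commands instead of launching them.

```python
import shlex


def torchrun_command(script, sorter, nproc, extra=()):
    """Build a single-node torchrun command list like the ones shown above."""
    return [
        "torchrun",
        f"--nproc_per_node={nproc}",
        "--nnodes=1",
        "--master_addr=localhost",
        "--master_port=35500",
        script,
        "--sorter", sorter,
        "--seed", "0",
        *extra,                      # script-specific flags, e.g. --lr or --B
        "--node_cnt", str(nproc),    # matches --nproc_per_node in the commands above
    ]


commands = [
    torchrun_command("main-LR-HMDA.py", sorter, 4, ["--lr", "5e-3"])
    for sorter in ("CD-GraB", "D-RR")
]
for cmd in commands:
    # Print only; pass `cmd` to subprocess.run(cmd, check=True) to launch.
    print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids shell quoting issues if it is later handed to subprocess.run.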
- Wentao Guo (Co-first author), [email protected]
- A. Feder Cooper (Co-first author), [email protected]
- Khiem Pham (Co-first author), [email protected]
- Tiancheng Yuan, [email protected]
- Charlie F. Ruan, [email protected]
- Yucheng Lu, [email protected]
- Christopher De Sa, [email protected]
CD-GraB is released under the Apache 2.0 license; see the LICENSE file.
A. Feder Cooper is supported by Christopher De Sa's NSF CAREER grant, and in part by the Artificial Intelligence Policy and Practice initiative at Cornell University and the John D. and Catherine T. MacArthur Foundation. Yucheng Lu is supported by a Meta Ph.D. Fellowship. We also acknowledge a gift from SambaNova Systems. All content represents the opinions of the authors, which are not necessarily shared or endorsed by their respective employers and/or sponsors.
If you find CD-GraB helpful in your research, please consider citing us:
@inproceedings{
cooper2023cdgrab,
title={CD-GraB: Coordinating Distributed Example Orders for Provably Accelerated Training},
author={A. Feder Cooper and Wentao Guo and Khiem Pham and Tiancheng Yuan and Charlie F. Ruan and Yucheng Lu and Christopher De Sa},
booktitle={Advances in Neural Information Processing Systems},
year={2023},
url={https://arxiv.org/pdf/2302.00845.pdf}
}
@inproceedings{
lu2022grab,
title={GraB: Finding Provably Better Data Permutations than Random Reshuffling},
author={Yucheng Lu and Wentao Guo and Christopher De Sa},
booktitle={Advances in Neural Information Processing Systems},
editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
year={2022},
url={https://openreview.net/forum?id=nDemfqKHTpK}
}
@inproceedings{
lu2022a,
title={A General Analysis of Example-Selection for Stochastic Gradient Descent},
author={Yucheng Lu and Si Yi Meng and Christopher De Sa},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=7gWSJrP3opB}
}