[NeurIPS'23] CD-GraB: Coordinating Distributed Example Orders for Provably Accelerated Training

CD-GraB aims to find a distributed data permutation with provably better convergence guarantees than Distributed Random Reshuffling (D-RR), building on the gradient-balancing framework introduced in the original GraB paper. The technical details can be found in our NeurIPS'23 paper. Please contact Wentao Guo with any questions or suggestions about the paper or code: [email protected].
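The idea of coordinating example orders across workers can be illustrated in a single process. The sketch below assumes a pair-based balancing scheme: each worker takes the difference between the gradients of two consecutive examples, every worker's difference is signed against one shared running sum, and the sign decides which example of the pair moves toward the front of that worker's next-epoch order. The function names and the list-of-floats gradient representation are hypothetical. This is a simplified illustration under those assumptions, not the repository's implementation, which runs across processes launched with torchrun and will differ in detail.

```python
def _sq_norm(v):
    """Squared Euclidean norm of a list of floats."""
    return sum(x * x for x in v)


def _axpy(a, b, scale):
    """Return a + scale * b for equal-length lists."""
    return [x + scale * y for x, y in zip(a, b)]


def coordinated_pair_balance(worker_grads):
    """Compute next-epoch local orders for all workers with ONE shared running sum.

    worker_grads[w][j] is the gradient (list of floats) of the j-th example
    visited by worker w this epoch (hypothetical layout). Returned indices are
    these local positions j, not global dataset IDs.
    """
    num_workers = len(worker_grads)
    n = len(worker_grads[0])
    if n % 2 or any(len(g) != n for g in worker_grads):
        raise ValueError("each worker needs the same even number of examples")
    running = [0.0] * len(worker_grads[0][0])  # shared by every worker
    fronts = [[] for _ in range(num_workers)]
    backs = [[] for _ in range(num_workers)]
    for a in range(0, n, 2):  # examples a and a + 1 form one pair on each worker
        b = a + 1
        for w in range(num_workers):
            diff = _axpy(worker_grads[w][a], worker_grads[w][b], -1.0)
            plus = _axpy(running, diff, 1.0)
            minus = _axpy(running, diff, -1.0)
            # Keep whichever sign leaves the shared running sum smaller.
            if _sq_norm(plus) <= _sq_norm(minus):
                running = plus
                fronts[w].append(a)
                backs[w].append(b)
            else:
                running = minus
                fronts[w].append(b)
                backs[w].append(a)
    # New local order: "front" picks in visiting order, then "back" picks reversed.
    return [front + back[::-1] for front, back in zip(fronts, backs)]


# Two workers whose pairs have identical differences receive opposite signs.
print(coordinated_pair_balance([[[1.0], [0.0]], [[1.0], [0.0]]]))  # -> [[0, 1], [1, 0]]
```

Because `running` is shared, the sign chosen for one worker's pair influences the signs chosen for every other worker; giving each worker its own running sum would reduce this to independent per-worker balancing.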

Requirements

Python >= 3.9

PyTorch >= 2.0.0

CUDA >= 11.7 on Linux

torchopt

torchvision

functorch

transformers
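Before launching any experiment, it can help to confirm that the Python-side requirements are present. The following is a minimal, hypothetical checking script (not part of the repository): it verifies the Python version, checks that each listed module is importable, and compares the installed `torch` version against 2.0.0. It does not check the CUDA requirement.

```python
import re
import sys
from importlib import metadata, util

# Minimums taken from the Requirements list; CUDA is NOT checked here.
MIN_PYTHON = (3, 9)
MIN_TORCH = (2, 0, 0)
MODULES = ["torch", "torchopt", "torchvision", "functorch", "transformers"]


def parse_version(text):
    """Parse '2.1.0+cu118' -> (2, 1, 0), keeping only leading digits of each part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment():
    """Return a list of human-readable problems (empty if everything looks fine)."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version.split()[0]} is older than 3.9")
    for name in MODULES:
        if util.find_spec(name) is None:
            problems.append(f"module '{name}' is not importable")
    if util.find_spec("torch") is not None:
        try:
            found = parse_version(metadata.version("torch"))
        except metadata.PackageNotFoundError:
            problems.append("could not read the installed torch version")
        else:
            found = found + (0,) * (3 - len(found))  # pad '2.0' to (2, 0, 0)
            if found < MIN_TORCH:
                problems.append(f"torch {'.'.join(map(str, found))} is older than 2.0.0")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["All listed Python requirements were found."]:
        print(line)
```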

Experiments

All plots in the paper can be found in the notebooks directory.

GraB repository: https://github.com/EugeneLYC/GraB
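For comparison, the single-worker balancing step that GraB builds on can be sketched as follows, assuming the sign-based reordering described in the GraB paper: each gradient is centered by a mean from the previous epoch (passed in here as `stale_mean`), assigned the sign that keeps a running sum small, and examples with a positive sign are placed first in visiting order while those with a negative sign are appended in reverse. The name `grab_reorder` is hypothetical; see the GraB repository above for the authors' actual code.

```python
def grab_reorder(grads, stale_mean):
    """Reorder examples by greedy sign balancing of centered gradients.

    grads: per-example gradients (lists of floats) in this epoch's visiting order.
    stale_mean: average gradient from the previous epoch, used for centering.
    Returns the example order to use in the next epoch.
    """
    running = [0.0] * len(stale_mean)
    front, back = [], []
    for idx, g in enumerate(grads):
        centered = [gi - mi for gi, mi in zip(g, stale_mean)]
        plus = [r + c for r, c in zip(running, centered)]
        minus = [r - c for r, c in zip(running, centered)]
        # Pick the sign that keeps the running sum's squared norm smaller.
        if sum(x * x for x in plus) <= sum(x * x for x in minus):
            running = plus
            front.append(idx)
        else:
            running = minus
            back.append(idx)
    # Positive-sign examples keep their order; negative-sign ones are reversed.
    return front + back[::-1]


print(grab_reorder([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]], [0.0, 0.0]))
# -> [0, 2, 3, 1]
```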

Logistic regression on HMDA

Please run the following command for CD-GraB

torchrun --nproc_per_node=4 --nnodes=1 --master_addr="localhost" --master_port=35500 main-LR-HMDA.py --sorter CD-GraB --seed 0 --lr 5e-3 --node_cnt 4

and the following command for D-RR

torchrun --nproc_per_node=4 --nnodes=1 --master_addr="localhost" --master_port=35500 main-LR-HMDA.py --sorter D-RR --seed 0 --lr 5e-3 --node_cnt 4
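Each CD-GraB/D-RR pair of commands differs only in `--sorter`, and in every torchrun example `--node_cnt` equals `--nproc_per_node`. If you want to script both runs, a small helper like the hypothetical `build_command` below keeps those values in sync. It only uses the flags that appear in the commands in this README and knows nothing about other options the scripts may accept.

```python
import shlex


def build_command(script, sorter, nproc, extra_args=(), port=35500):
    """Return the torchrun argument list used by the commands in this README."""
    return [
        "torchrun",
        f"--nproc_per_node={nproc}",
        "--nnodes=1",
        "--master_addr=localhost",
        f"--master_port={port}",
        script,
        "--sorter", sorter,
        "--node_cnt", str(nproc),  # one worker per process, as in the commands above
        *extra_args,
    ]


if __name__ == "__main__":
    shared = ["--seed", "0", "--lr", "5e-3"]
    for sorter in ("CD-GraB", "D-RR"):
        # Print the shell form; argparse does not care about flag order.
        print(shlex.join(build_command("main-LR-HMDA.py", sorter, 4, shared)))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to launch the two runs one after another.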

LSTM on Wiki2

Please run the following command for CD-GraB

torchrun --nproc_per_node=4 --nnodes=1 --master_addr="localhost" --master_port=35500 main-LSTM-Wiki2.py --sorter CD-GraB --seed 0 --lr 5.0 --B 16 --node_cnt 4

and the following command for D-RR

torchrun --nproc_per_node=4 --nnodes=1 --master_addr="localhost" --master_port=35500 main-LSTM-Wiki2.py --sorter D-RR --seed 0 --lr 5.0 --B 16 --node_cnt 4

Autoregressive MLP on M4

Please run the following command for CD-GraB

torchrun --nproc_per_node=32 --nnodes=1 --master_addr="localhost" --master_port=35500 main-MLP-M4.py --sorter CD-GraB --seed 0 --node_cnt 32 --backend gloo

and the following command for D-RR

torchrun --nproc_per_node=32 --nnodes=1 --master_addr="localhost" --master_port=35500 main-MLP-M4.py --sorter D-RR --seed 0 --B 32 --node_cnt 32 --backend gloo

Simulated tiny GPT2 pretraining on WikiText-103

Please run the following command for CD-GraB

python main-GPT2-Wiki103.py --sorter CD-GraB --seed 0

and the following command for D-RR

python main-GPT2-Wiki103.py --sorter D-RR --seed 0
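Unlike the other experiments, the GPT-2 runs are launched with plain `python` rather than torchrun, so the workers are simulated inside one process. As a rough picture of what simulating the D-RR baseline in one process can involve, the sketch below gives each hypothetical worker a contiguous shard, reshuffles every shard independently at each epoch, and forms global steps by taking one local example from each worker. The sharding scheme and the function names are assumptions, not a description of what main-GPT2-Wiki103.py does.

```python
import random


def drr_epoch_orders(num_examples, num_workers, epoch, seed=0):
    """Per-worker example orders for one epoch of distributed random reshuffling."""
    shard = num_examples // num_workers  # assume an even split; leftovers are dropped
    orders = []
    for w in range(num_workers):
        local = list(range(w * shard, (w + 1) * shard))  # contiguous shard (assumed)
        rng = random.Random(f"{seed}-{w}-{epoch}")  # independent stream per worker/epoch
        rng.shuffle(local)
        orders.append(local)
    return orders


def simulated_global_steps(orders):
    """At step t, every simulated worker processes its t-th local example."""
    return [list(step) for step in zip(*orders)]


for step in simulated_global_steps(drr_epoch_orders(8, 2, epoch=0)):
    print(step)
```

In this picture, the `--sorter` flag would decide how each worker's local order is produced for the next epoch: a fresh random shuffle for D-RR, or a balancing-based order for CD-GraB.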

Authors

Wentao Guo (Co-first author), [email protected]
A. Feder Cooper (Co-first author), [email protected]
Khiem Pham (Co-first author), [email protected]
Tiancheng Yuan, [email protected]
Charlie F. Ruan, [email protected]
Yucheng Lu, [email protected]
Christopher De Sa, [email protected]

License

CD-GraB is released under the Apache-2.0 license; see the LICENSE file.

Acknowledgement

A. Feder Cooper is supported by Christopher De Sa's NSF CAREER grant, and in part by the Artificial Intelligence Policy and Practice initiative at Cornell University and the John D. and Catherine T. MacArthur Foundation. Yucheng Lu is supported by a Meta Ph.D. Fellowship. We also acknowledge a gift from SambaNova Systems. All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors.

Cite us

If you find CD-GraB helpful in your research, please consider citing us:

@inproceedings{
    cooper2023cdgrab,
    title={CD-GraB: Coordinating Distributed Example Orders for Provably Accelerated Training},
    author={A. Feder Cooper and Wentao Guo and Khiem Pham and Tiancheng Yuan and Charlie F. Ruan and Yucheng Lu and Christopher De Sa},
    booktitle={Advances in Neural Information Processing Systems},
    year={2023},
    url={https://arxiv.org/pdf/2302.00845.pdf}
}

@inproceedings{
    lu2022grab,
    title={GraB: Finding Provably Better Data Permutations than Random Reshuffling},
    author={Yucheng Lu and Wentao Guo and Christopher De Sa},
    booktitle={Advances in Neural Information Processing Systems},
    editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
    year={2022},
    url={https://openreview.net/forum?id=nDemfqKHTpK}
}

@inproceedings{
    lu2022a,
    title={A General Analysis of Example-Selection for Stochastic Gradient Descent},
    author={Yucheng Lu and Si Yi Meng and Christopher De Sa},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=7gWSJrP3opB}
}
