Enable DDP for Off-Policy training branch #1099

breakds · 2021-12-01T23:13:59Z

Motivation

Previously, DDP (distributed data parallel) was only enabled for the on-policy training branch (i.e. train_from_unroll()). However, most of the algorithms actually run on the off-policy training branch (i.e. train_from_replay_buffer()), which does not benefit from DDP yet.

Solution

After #1098, we are ready to make this work for non-composite off-policy branch training (e.g. standard PPO). According to this, we need to wrap the computation that involves all trainable parameters. Therefore, the main changes are:

Move the calls to train_step() (i.e. the collection_train_info...()) and the loss computation into a standalone method of Algorithm, called _compute_train_info_and_loss_info.
Wrap _compute_train_info_and_loss_info with @data_distributed.
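
The two steps above can be sketched as follows. This is a minimal, torch-free illustration of the decorator pattern, not ALF's actual implementation: `data_distributed_sketch`, `SyncWrapper`, and `ToyAlgorithm` are hypothetical names, and `SyncWrapper` only stands in for `torch.nn.parallel.DistributedDataParallel`, which in a real setup would synchronize gradients for everything computed inside the wrapped call.

```python
import functools


class SyncWrapper:
    """Hypothetical stand-in for DistributedDataParallel.

    It only counts the calls routed through it; real DDP would also
    register hooks that all-reduce gradients across processes.
    """

    def __init__(self, fn):
        self._fn = fn
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        return self._fn(*args, **kwargs)


def data_distributed_sketch(method):
    """Route ``method`` through a per-instance wrapper when enabled."""
    attr = "_wrapped_" + method.__name__

    @functools.wraps(method)
    def wrapped(self, *args, **kwargs):
        if not getattr(self, "distributed", False):
            # Single-process training: call the method directly.
            return method(self, *args, **kwargs)
        wrapper = getattr(self, attr, None)
        if wrapper is None:
            # Create the wrapper lazily, once per algorithm instance.
            wrapper = SyncWrapper(functools.partial(method, self))
            setattr(self, attr, wrapper)
        return wrapper(*args, **kwargs)

    return wrapped


class ToyAlgorithm:
    def __init__(self, distributed):
        self.distributed = distributed

    def train_step(self, x):
        return 2.0 * x  # placeholder for the per-step training computation

    @data_distributed_sketch
    def _compute_train_info_and_loss_info(self, experience):
        # Both the train_step() calls and the loss live inside the
        # wrapped method, so the wrapper sees the whole computation.
        train_info = [self.train_step(x) for x in experience]
        loss = sum(train_info) / len(train_info)
        return train_info, loss


alg = ToyAlgorithm(distributed=True)
train_info, loss = alg._compute_train_info_and_loss_info([1.0, 2.0, 3.0])
```

The point of putting both pieces in one method is that a single wrapped call then covers every trainable parameter that contributes to the loss.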

Also, the replay buffer gets special treatment so that it is excluded from DDP's synchronization.
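
One way to picture the exclusion: DDP can skip parameters and buffers that are listed by fully qualified name (in PyTorch this goes through a private hook, so treat the details as version-dependent). The task then reduces to collecting the names of buffers that live under a replay buffer. A minimal sketch, assuming buffer names are dotted paths in the style of `named_buffers()`; `buffers_to_ignore` and the example names are hypothetical, not ALF's code:

```python
def buffers_to_ignore(buffer_names, ignored_prefixes):
    """Return the set of buffer names under any of the ignored prefixes.

    A set removes duplicates and gives constant-time membership checks.
    """
    ignored = set()
    for name in buffer_names:
        for prefix in ignored_prefixes:
            # Match the prefix itself or anything nested below it.
            if name == prefix or name.startswith(prefix + "."):
                ignored.add(name)
                break
    return ignored


# Hypothetical names, in the dotted style of nn.Module.named_buffers().
names = [
    "_replay_buffer._buffer.observation",
    "_replay_buffer._current_pos",
    "_observation_normalizer._mean",
]
ignored = buffers_to_ignore(names, ["_replay_buffer"])
```

Here the two replay-buffer entries end up in `ignored`, while the normalizer statistics stay subject to synchronization.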

Testing

Tested with

python -m alf.bin.train --conf alf/examples/ppo_procgen/bossfight_conf.py --root_dir ~/tmp/alf_sessions/ppo_procgen/ddp --distributed multi-gpu

(Note that mini_batch_size is halved)

Training is still in progress, but the throughput has almost doubled (13000 vs 7000) on a machine with two 3080s.
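
For reference, the arithmetic behind the two notes above: with one process per GPU, each process consumes its own mini-batch, so halving mini_batch_size on two GPUs presumably keeps the total per update unchanged. A small sketch, where the value 256 is a hypothetical placeholder (the actual mini_batch_size is not stated here) and `per_process_mini_batch_size` is an illustrative helper:

```python
def per_process_mini_batch_size(total, world_size):
    """Split a total mini-batch evenly across processes."""
    if total % world_size != 0:
        raise ValueError("total must be divisible by world_size")
    return total // world_size


total_mini_batch_size = 256  # hypothetical value for illustration
world_size = 2               # two GPUs, one process each
per_process = per_process_mini_batch_size(total_mini_batch_size, world_size)

# Reported throughput: 13000 with DDP vs 7000 without.
speedup = 13000 / 7000
scaling_efficiency = speedup / world_size
```

With the reported numbers, the speedup is about 1.86x, i.e. roughly 93% of ideal linear scaling on two GPUs.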

Also verified on ppo_cart_pole as a small experiment; see here.

breakds · 2021-12-02T23:42:45Z

Addressed comments. PTAL.

* DDP for off-policy training branch
* More reliable way to ignore replay buffer's buffers
* Use set instead of list

breakds requested review from emailweixu and hnyu December 1, 2021 23:13

breakds mentioned this pull request Dec 1, 2021

Multi-GPU Training with DDP #1096

Open

15 tasks

emailweixu reviewed Dec 2, 2021

alf/utils/distributed.py (outdated, resolved)

Base automatically changed from PR/breakds/data_distributed_decorator to pytorch December 2, 2021 22:08

breakds force-pushed the PR/breakds/data_distrbuted_off_policy branch from eecfb67 to 58496fa Compare December 2, 2021 23:40

emailweixu reviewed Dec 3, 2021

alf/utils/distributed.py (outdated, resolved)

breakds added 3 commits December 3, 2021 15:39

DDP for off-policy training branch

463651b

More reliable way to ignore replay buffer's buffers

65ccb9f

Use set instead of list

e943e5c

breakds force-pushed the PR/breakds/data_distrbuted_off_policy branch from ef9064d to e943e5c Compare December 3, 2021 23:40

emailweixu approved these changes Dec 4, 2021

breakds merged commit ee1e020 into pytorch Dec 5, 2021

breakds deleted the PR/breakds/data_distrbuted_off_policy branch December 5, 2021 06:15

breakds mentioned this pull request Dec 5, 2021

Ignore unused parameters when using DDP + PPG #1117

Merged

pd-perry pushed a commit to pd-perry/alf that referenced this pull request Dec 11, 2021

Enable DDP for Off-Policy training branch (HorizonRobotics#1099)

13042a0

* DDP for off-policy training branch
* More reliable way to ignore replay buffer's buffers
* Use set instead of list
