
Enable DDP for Off-Policy training branch #1099

Merged
merged 3 commits into pytorch on Dec 5, 2021

Conversation

breakds (Contributor) commented on Dec 1, 2021

Motivation

Previously, DDP (distributed data parallel) was only enabled for the on-policy training branch (i.e. train_from_unroll()). However, most of the algorithms actually run on the off-policy training branch (i.e. train_from_replay_buffer()), which does not enjoy DDP yet.

Solution

After #1098 we are ready to make this work for non-composite off-policy branch training (e.g. standard PPO). Accordingly, we need to wrap the computation that involves all trainable parameters. Therefore, the change mainly consists of:

  1. Move the calls to train_step() (i.e. the collection_train_info...()) and the computation of the loss into a standalone method of Algorithm, called _compute_train_info_and_loss_info.
  2. Wrap _compute_train_info_and_loss_info with @data_distributed (see the sketch after this list).
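
A minimal illustrative sketch of these two steps follows; it is not the actual ALF code. The import paths, the Algorithm base class usage, and the _collect_train_info helper are assumptions made only for this example.

```python
# Hedged sketch of steps 1 and 2 above -- not the actual ALF implementation.
# Assumptions: @data_distributed lives in alf/utils/distributed.py (the file
# touched by this PR), Algorithm is ALF's base algorithm class, and
# _collect_train_info is a hypothetical helper standing in for the real
# "collection_train_info...()" calls.
from alf.algorithms.algorithm import Algorithm  # assumed import path
from alf.utils.distributed import data_distributed  # assumed import path


class MyOffPolicyAlgorithm(Algorithm):

    @data_distributed
    def _compute_train_info_and_loss_info(self, experience):
        # Everything that touches trainable parameters happens inside this
        # single method, so wrapping it with @data_distributed is enough for
        # DDP to synchronize the gradients of all trainable parameters.
        train_info = self._collect_train_info(experience)  # hypothetical helper running train_step()
        loss_info = self.calc_loss(train_info)
        return train_info, loss_info
```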

In addition, special treatment is applied so that the replay buffer does not participate in DDP's synchronization.
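
One common way to achieve this in stock PyTorch is its (private) "params and buffers to ignore" hook. The sketch below is only illustrative, not ALF's actual code, and the `_replay_buffer` attribute prefix is a hypothetical placeholder:

```python
# Hedged sketch (not ALF's actual code): exclude the replay buffer's
# parameters and buffers from DDP synchronization using PyTorch's private
# "params and buffers to ignore" hook. `_replay_buffer` is a hypothetical
# attribute name used only for this example.
from torch.nn.parallel import DistributedDataParallel as DDP


def exclude_replay_buffer_from_ddp(algorithm, prefix='_replay_buffer'):
    """Tell DDP to skip every parameter/buffer whose name starts with `prefix`."""
    ignored = {
        name
        for name, _ in list(algorithm.named_parameters()) + list(algorithm.named_buffers())
        if name.startswith(prefix)
    }
    # DDP consults this set when the module is wrapped, so these tensors are
    # neither broadcast at construction time nor synchronized during training.
    DDP._set_params_and_buffers_to_ignore_for_model(algorithm, ignored)
```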

Testing

Tested with

python -m alf.bin.train --conf alf/examples/ppo_procgen/bossfight_conf.py --root_dir ~/tmp/alf_sessions/ppo_procgen/ddp --distributed multi-gpu

(Note that mini_batch_size is halved)
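
A quick sanity check on the halving, assuming (as is common with DDP) that each of the two processes samples its own mini-batches, so the effective global batch size stays the same; the original value below is hypothetical:

```python
# Back-of-the-envelope check for the halved mini_batch_size (assumption:
# with two DDP processes, each samples its own mini-batches, so the
# effective global batch size is per_process_batch * world_size).
world_size = 2                     # two GPUs -> two DDP processes
single_gpu_mini_batch_size = 1024  # hypothetical value from the single-GPU config
per_process = single_gpu_mini_batch_size // world_size
assert per_process * world_size == single_gpu_mini_batch_size  # global batch unchanged
```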

Training is still in progress, but the throughput has almost doubled (13000 vs. 7000) on a machine with two RTX 3080s.

Also verified on ppo_cart_pole as a small experiment, see here.

breakds mentioned this pull request on Dec 1, 2021
Review comments on alf/utils/distributed.py (outdated, resolved)
Base automatically changed from PR/breakds/data_distributed_decorator to pytorch December 2, 2021 22:08
breakds force-pushed the PR/breakds/data_distrbuted_off_policy branch from eecfb67 to 58496fa on December 2, 2021, 23:40
breakds (Contributor, Author) commented on Dec 2, 2021

Addressed comments. PTAL.

Review comments on alf/utils/distributed.py (outdated, resolved)
breakds force-pushed the PR/breakds/data_distrbuted_off_policy branch from ef9064d to e943e5c on December 3, 2021, 23:40
breakds merged commit ee1e020 into pytorch on Dec 5, 2021
breakds deleted the PR/breakds/data_distrbuted_off_policy branch on December 5, 2021, 06:15
pd-perry pushed a commit to pd-perry/alf that referenced this pull request on Dec 11, 2021:
* DDP for off-policy training branch

* More reliable way to ignore replay buffer's buffers

* Use set instead of list