Implement @data_distributed Decorator #1098

Merged
breakds merged 3 commits into pytorch from PR/breakds/data_distributed_decorator on Dec 2, 2021

Conversation

breakds (Contributor) commented Dec 1, 2021

Motivation

Part of the effort towards #1096.

Originally we only "DDP"-fied the unroll of RLAlgorithm, but we later found that such DDP wrapping can be made more general. Specifically, we want a generally applicable, simple, and transparent utility for this. Since it is almost always a method of a torch Module that we want to wrap, a decorator for module methods is desirable.

Solution

Implemented the @data_distributed decorator that can be applied to methods of a Module (such as RLAlgorithm). It converts the method to be DDP-capable: a DDP-capable method automatically runs with a DDP wrapper if the module it belongs to has DDP activated. A module is DDP-activated if its _ddp_activated_rank is set to a non-negative value. For Algorithm and its derived classes, this can be done by calling activate_ddp(rank).

Also removed _UnrollPerformer and used @data_distributed to wrap unroll instead. The code is much cleaner; a usage sketch is shown below.

NOTE: this still only enables DDP for on-policy training. Off-policy training will be addressed in a follow-up, which will also build on this PR.
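
For illustration, here is a minimal usage sketch based on the description above. It assumes the decorator is importable from alf/utils/distributed.py as added in this PR; MyAlgorithm and its method bodies are made up, and the exact ALF signatures may differ.

import torch.nn as nn

from alf.utils.distributed import data_distributed  # module added in this PR


class MyAlgorithm(nn.Module):
    """Toy stand-in for an Algorithm-like module (not actual ALF code)."""

    def __init__(self):
        super().__init__()
        # Contract used by the decorator: the owning module exposes
        # _ddp_activated_rank; a negative value means DDP is off.
        self._ddp_activated_rank = -1

    def activate_ddp(self, rank: int):
        # Mirrors the activate_ddp(rank) described above: a non-negative
        # rank turns DDP on for the decorated methods.
        self._ddp_activated_rank = rank

    @data_distributed
    def unroll(self, unroll_length):
        # Once activate_ddp() has been called, this method runs under a DDP
        # wrapper automatically; otherwise it runs as a plain method.
        ...


algorithm = MyAlgorithm()
algorithm.activate_ddp(rank=0)  # e.g. inside the per-process training worker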

Testing

Ran ac_breakout with DDP turned both on and off.

[Training curve plot: ddp_ac_breakout]

Note that the training curves with and without DDP almost coincide.

breakds requested review from emailweixu and hnyu and removed the review request for emailweixu on December 1, 2021 03:06
breakds added the enhancement (New feature or request) label on Dec 1, 2021
breakds mentioned this pull request on Dec 1, 2021
Review threads on alf/utils/distributed.py (resolved)
breakds (Contributor, Author) commented Dec 1, 2021

PR updated based on the comments.

Initial implemented of data_distributed
breakds force-pushed the PR/breakds/data_distributed_decorator branch from 06b1c97 to fc1c5c6 on December 1, 2021 22:57
breakds force-pushed the PR/breakds/data_distributed_decorator branch from fc1c5c6 to 3f71b8d on December 1, 2021 22:58
Review threads on alf/algorithms/algorithm.py and alf/utils/distributed.py (resolved)


On alf/utils/distributed.py:

def data_distributed(method):
    """This decorator makes a target method of a module capable of being data ...
Collaborator:

It seems that this decorator can only apply to methods of Algorithm? If so, the docstring should clarify this.

Contributor (Author):

It is not just for Algorithm; it actually works for all nn.Module-derived classes. One of the prerequisites is that the class needs to have self._ddp_activated_rank, which is stated in the docstring as a contract.
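
To make this contract concrete, below is a minimal, hypothetical sketch of how a decorator of this kind can honor self._ddp_activated_rank. It is not the actual implementation in alf/utils/distributed.py; the helper _MethodPerformer and the cache attribute _ddp_wrapped_method are illustrative names only, and the sketch assumes the torch.distributed process group has already been initialized with one GPU per process.

import functools

import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class _MethodPerformer(nn.Module):
    """Illustrative helper that exposes a bound method as ``forward`` so that
    DistributedDataParallel can wrap it."""

    def __init__(self, module: nn.Module, method):
        super().__init__()
        self._module = module
        self._method = method

    def forward(self, *args, **kwargs):
        return self._method(self._module, *args, **kwargs)


def data_distributed_sketch(method):
    """Sketch of a @data_distributed-style decorator (not the ALF version)."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        rank = getattr(self, '_ddp_activated_rank', -1)
        if rank < 0:
            # DDP not activated: behave exactly like the undecorated method.
            return method(self, *args, **kwargs)
        # Lazily build the DDP wrapper once and cache it on the module, so the
        # decorated method transparently gains gradient synchronization.
        if getattr(self, '_ddp_wrapped_method', None) is None:
            self._ddp_wrapped_method = DDP(
                _MethodPerformer(self, method), device_ids=[rank])
        return self._ddp_wrapped_method(*args, **kwargs)

    return wrapper
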

breakds merged commit de2c6a1 into pytorch on Dec 2, 2021
breakds deleted the PR/breakds/data_distributed_decorator branch on December 2, 2021 22:08
pd-perry pushed a commit to pd-perry/alf that referenced this pull request Dec 11, 2021
* Implement @data_distributed

Initial implemented of data_distributed

* Update the documentation according to the comments.

* Address comments
Labels: enhancement (New feature or request)
Projects: ALF improvement (Awaiting triage)
Linked issues: none
Participants: 3