
make_parallel for network containers #985

Merged · 3 commits · Aug 26, 2021

Conversation

emailweixu (Contributor):

This change supports creating a parallel network from a network container. This is achieved by implementing `make_parallel` for the various layers and transforming all the modules in the container.

With this change, it becomes trivial to implement `make_parallel` for all kinds of networks, as long as they are implemented using network containers.
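
As a rough illustration of the approach (a minimal sketch, not the actual ALF implementation; this `Sequential` is a simplified stand-in for ALF's container classes):

```python
import torch.nn as nn

# Sketch only: a container becomes parallel by replacing each of its
# member modules with that module's parallel version.
class Sequential(nn.Module):
    def __init__(self, *modules):
        super().__init__()
        self._nets = nn.ModuleList(modules)

    def forward(self, x):
        for net in self._nets:
            x = net(x)
        return x

    def make_parallel(self, n: int) -> "Sequential":
        # Assumes every member module implements make_parallel(n),
        # which is what this PR adds for the various layers.
        return Sequential(*(net.make_parallel(n) for net in self._nets))
```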

@le-horizon (Contributor) left a comment:

Thanks for the nice improvement Wei.

Looks good as is, just small comments.

Thanks,
Le

In alf/layers.py:
@@ -171,7 +178,8 @@ def __init__(self,
                 stack_size,
                 pooling_size=1,
                 dtype=torch.float32,
-                mode='skip'):
+                mode='skip',
+                name='TemporalPool'):
A reviewer (Contributor) commented:

Looking at the TemporalPool example above, it seems that if `pooling_size == 2`, we always ignore every other timestep (half of them).
Have we considered pooling every timestep, instead of pooling once every `pooling_size` timesteps?

emailweixu (Contributor, Author) replied:

Is mode "avg" what you want?
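
For concreteness, a toy illustration of the two behaviors being discussed (plain PyTorch, not ALF code; the semantics are inferred from this thread):

```python
import torch

x = torch.arange(8.0)  # one value per timestep, pooling_size p = 2
p = 2

# 'skip': keep one timestep out of every p, so half the steps are
# ignored when p == 2.
skip = x[p - 1::p]              # tensor([1., 3., 5., 7.])

# 'avg': every timestep contributes to the pooled output.
avg = x.reshape(-1, p).mean(1)  # tensor([0.5000, 2.5000, 4.5000, 6.5000])
```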


A parallel network has ``n`` copies of a network with the same structure but different, independently initialized parameters. The parallel network can process a batch of data with shape [batch_size, n, ...] using ``n``
A reviewer (Collaborator) commented:

Should we automatically convert data with shape [batch_size, ...] to [batch_size, n, ...]?

emailweixu (Contributor, Author) replied:

Since a network can have sub-networks, doing this check for all of them could be wasteful. So the intent is that the user does the conversion using `make_parallel_input`.
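
For example, a replication helper along the lines of the `make_parallel_input` mentioned above might look like this (a sketch; the actual ALF signature and nest handling may differ):

```python
import torch

def make_parallel_input(x: torch.Tensor, n: int) -> torch.Tensor:
    # Replicate [batch_size, ...] to [batch_size, n, ...] so the same
    # input is fed to each of the n network copies (expand avoids
    # materializing n copies in memory).
    return x.unsqueeze(1).expand(x.shape[0], n, *x.shape[1:])

x = torch.randn(32, 16)            # [batch_size, input_size]
x_par = make_parallel_input(x, 4)  # [32, 4, 16]
```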

@@ -104,6 +105,10 @@ def _combine_flat(self, tensors):
         else:
             return torch.cat(tensors, dim=self._dim)

+    def make_parallel(self, n):
+        dim = self._dim if self._dim < 0 else self._dim + 1
@hnyu (Collaborator) commented on Aug 25, 2021:

Should add comments saying that `self._dim` excludes the batch dim and the parallel dim. The current implementation seems not to ignore the batch dim.

emailweixu (Contributor, Author) replied:

Changed the behavior of `NestConcat` so that `dim` does not include the batch dim. Comments added.
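
To illustrate the bookkeeping (a sketch in plain PyTorch, not the ALF code): if the user-facing `dim` excludes the batch dim, the parallel version must shift non-negative dims by one more to skip the replica dim, while negative dims already count from the end:

```python
import torch

B, n, d = 32, 4, 8

# Non-parallel: user-facing dim=0 excludes the batch dim, so it maps
# to tensor axis 1 when concatenating [B, d] tensors.
xs = [torch.randn(B, d), torch.randn(B, d)]
assert torch.cat(xs, dim=0 + 1).shape == (B, 2 * d)

# Parallel: tensors gain a replica axis, [B, n, d], so the same
# user-facing dim must be shifted by one more -- hence the
# `self._dim + 1` for dim >= 0 in the diff above.
xs_par = [torch.randn(B, n, d), torch.randn(B, n, d)]
assert torch.cat(xs_par, dim=0 + 1 + 1).shape == (B, n, 2 * d)

# A negative dim (e.g. -1) counts from the end and needs no adjustment.
assert torch.cat(xs_par, dim=-1).shape == (B, n, 2 * d)
```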

@Haichao-Zhang (Contributor) left a comment:

The comments are very helpful. Just some more minor points.

@emailweixu merged commit b8cb37a into pytorch on Aug 26, 2021.
@hnyu deleted the PR_parallel_container branch on Aug 26, 2021 at 17:17.
@emailweixu linked an issue on Aug 30, 2021 that may be closed by this pull request: "Implement make_parallel for network containers".
pd-perry pushed a commit to pd-perry/alf that referenced this pull request on Dec 11, 2021:

* make_parallel for network containers
* Address comments
* Address further comments