
Implementing encoding networks using Sequential #989

Merged
5 commits merged on Aug 27, 2021

Conversation

emailweixu (Contributor)

Since Sequential is very flexible, most networks can be implemented using it. And it automatically comes with support for make_parallel.

This PR only converts the encoding networks to use containers. Future PRs will change the other networks.
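A rough sketch of the general idea, using plain torch.nn rather than ALF's actual container API (the helper names and the naive make_parallel below are illustrative assumptions, not the code in this PR): an encoding network is just a sequential stack of layers, and parallelizing it amounts to replicating that stack with independent parameters.

```python
import copy
import torch
import torch.nn as nn

# An "encoding network" expressed as a plain sequential stack of FC layers.
def make_encoder(input_dim, hidden_dims=(64, 64)):
    layers, last = [], input_dim
    for h in hidden_dims:
        layers += [nn.Linear(last, h), nn.ReLU()]
        last = h
    return nn.Sequential(*layers)

# Naive parallelization: replicate the whole sequential stack, each replica
# getting its own freshly copied parameters. ALF's make_parallel is assumed
# to build a more efficient batched version, but the replication semantics
# are the part relevant to the discussion below.
def make_parallel(net, replicas):
    return nn.ModuleList([copy.deepcopy(net) for _ in range(replicas)])

encoder = make_encoder(input_dim=10)
parallel = make_parallel(encoder, replicas=3)
x = torch.randn(8, 10)
out = torch.stack([m(x) for m in parallel], dim=1)  # [batch, replicas, 64]
```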

spec = input_tensor_spec
nets = []

if input_preprocessors:
emailweixu (Contributor Author):

There is a slight change in the handling of input_preprocessors. Previously, in PreprocessorNetwork, each input preprocessor would be copied first. The new version does not copy. @Haichao-Zhang do you remember why the copy was needed?

Haichao-Zhang (Contributor):

@emailweixu Yes, the reason for this is to make sure that when we create parallel networks with pre-processors (which could have learnable parameters), the pre-processors will also be copied, unless explicitly specified not to copy.
This makes the expected behavior more natural, and fixed some previously encountered errors caused by a mismatch between the intended behavior and the actual behavior.

Some old related issues and PRs:
issue: #552
PR: #560
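A minimal sketch of the copy-vs-share distinction being discussed, using plain torch.nn with made-up module sizes (none of this is ALF's API): sharing one preprocessor instance across replicas keeps a single set of preprocessor parameters, while copying gives every replica its own.

```python
import copy
import torch.nn as nn

preprocessor = nn.Linear(4, 8)   # a learnable input preprocessor
encoder = nn.Linear(8, 16)
replicas = 3

# Shared: every replica references the same preprocessor module, so its
# parameters are trained jointly and counted only once.
shared = nn.ModuleList(
    [nn.Sequential(preprocessor, copy.deepcopy(encoder)) for _ in range(replicas)])

# Copied: each replica gets an independent copy of the preprocessor.
copied = nn.ModuleList(
    [nn.Sequential(copy.deepcopy(preprocessor), copy.deepcopy(encoder))
     for _ in range(replicas)])

# .parameters() de-duplicates shared tensors, so the copied version has
# (replicas - 1) extra sets of preprocessor parameters.
n_shared = sum(p.numel() for p in shared.parameters())
n_copied = sum(p.numel() for p in copied.parameters())
assert n_copied - n_shared == (replicas - 1) * sum(
    p.numel() for p in preprocessor.parameters())
```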

emailweixu (Contributor Author):

The make_parallel of the container will copy all the components, so there shouldn't be any problem with this change if the original motivation was to ensure make_parallel works correctly.

@@ -142,7 +142,7 @@ def _train():
inputs=None, loss_func=_neglogprob, batch_size=batch_size)
generator.update_with_gradient(alg_step.info)

for i in range(2000):
for i in range(2100):
emailweixu (Contributor Author):

This is because the input preprocessor is not copied, which causes different parameter initialization. See the comment for the encoding network.
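A toy illustration of why this shifts the parameter initialization (not the test's actual code): whether the preprocessor is deep-copied or constructed anew changes how many random numbers are consumed before later layers are initialized, so under the same seed the rest of the network starts from different weights and may need more iterations to converge.

```python
import copy
import torch
import torch.nn as nn

def build(copy_preprocessor):
    torch.manual_seed(0)
    pre = nn.Linear(4, 4)
    # deepcopy reuses pre's already-drawn weights and consumes no RNG state;
    # constructing a new Linear draws fresh numbers, shifting what the
    # encoder layer created afterwards gets initialized with.
    pre = copy.deepcopy(pre) if copy_preprocessor else nn.Linear(4, 4)
    enc = nn.Linear(4, 4)
    return enc

print(torch.allclose(build(True).weight, build(False).weight))  # False
```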

@@ -112,6 +113,7 @@ def test_continuous_actor_distribution(self, lstm_hidden_size):
conv_layer_params=self._conv_layer_params,
continuous_projection_net_ctor=functools.partial(
NormalProjectionNetwork, scale_distribution=True))
logging.info("---- %s" % str(actor_dist_net.state_spec))
Contributor:

can remove this line

Contributor:

can remove this line

This line is not removed yet.

last_kernel_initializer=None,
last_use_fc_bn=False,
name="ParallelEncodingNetwork"):
"""Parallel feed-forward network with FC layers which allows the last layer
Contributor:

feed-forward network with FC layers -> encoding network

emailweixu (Contributor Author):

fixed

@@ -66,6 +67,7 @@ def test_value_distribution(self, lstm_hidden_size):
conv_layer_params=conv_layer_params), None
],
preprocessing_combiner=NestConcat())
logging.info("----%s" % str(value_net.state_spec))
Contributor:

can remove this line

emailweixu (Contributor Author):

removed.

alf/networks/encoding_networks.py (outdated comment, resolved)
@@ -187,7 +183,7 @@ def test_encoding_network_input_preprocessor(self):
self.assertEqual(output.size()[1], 1)

@parameterized.parameters((True, ), (False, ))
def test_encoding_network_nested_input(self, lstm):
def test_encoding_network_nested_input(self, lstm=False):
Contributor:

why do we need to set lstm=False here?

emailweixu (Contributor Author):

It was for debugging. Restored.


TODO: remove ``allow_non_parallel_input``. This means to make parallel network
Contributor:

The option allow_non_parallel_input seems useful and we can keep it. Does it make sense to set the default value of allow_non_parallel_input to True, so that the input will always be handled correctly by default?

emailweixu (Contributor Author):

A network can be a component of a bigger network (container). It does not need the ability to handle non-parallel input in those situations.

@@ -112,6 +113,7 @@ def test_continuous_actor_distribution(self, lstm_hidden_size):
conv_layer_params=self._conv_layer_params,
continuous_projection_net_ctor=functools.partial(
NormalProjectionNetwork, scale_distribution=True))
logging.info("---- %s" % str(actor_dist_net.state_spec))
Contributor:

can remove this line

This line is not removed yet.

len(p_net_w_preprocessor.state_dict()),
len(p_net_wo_preprocessor.state_dict()) +
replicas * len(input_preprocessors.state_dict()))
if lstm:
Contributor:

Why do we need this if lstm condition here and also in L587?

emailweixu (Contributor Author):

This test relies on the fact that p_net_w_preprocessor is naively parallelized, which is no longer the case.
Changed the test to not rely on this.

Haichao-Zhang (Contributor) left a comment:

Looks great! Just one final comment.

input_preprocessors.state_dict()))
# the number of parameters of a parallel network with a shared
# input_preprocessor should be equal to that of the parallel network
# with non-shared input processor - replicas * the number of parameters
Contributor:

In the comment, it should be (replicas-1) * ..., same as the code.
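Spelled out (with hypothetical numbers, not the test's actual values), the relation the comment should state is:

```python
# With P = number of state_dict entries contributed by one input_preprocessor:
#   len(state_dict(shared)) == len(state_dict(non_shared)) - (replicas - 1) * P
# i.e. the shared version keeps one preprocessor instead of `replicas` of them.
replicas, P = 3, 2                     # e.g. one Linear layer: weight + bias
non_shared_entries = 10                # hypothetical count for the non-shared net
shared_entries = non_shared_entries - (replicas - 1) * P
assert shared_entries == 6
```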

@emailweixu emailweixu merged commit 11eb659 into pytorch Aug 27, 2021
@emailweixu emailweixu linked an issue Aug 30, 2021 that may be closed by this pull request
@hnyu hnyu deleted the PR_new_encoding_net branch September 6, 2021 15:32
pd-perry pushed a commit to pd-perry/alf that referenced this pull request Dec 11, 2021
* Implementing encoding networks using Sequential

Since Sequential is very flexible, most networks can be implemented using it. And it automatically comes with support for make_parallel.

* Fix checkpoint_utils_test

* Address review comments

* Address more comments

* Fix comment
Development

Successfully merging this pull request may close these issues.

Implementing various networks using containers
2 participants