
Implementing encoding networks using Sequential #989

Merged
5 commits merged on Aug 27, 2021

Conversation

emailweixu (Contributor)

Since Sequential is very flexible, most networks can be implemented using it. And it automatically comes with support for make_parallel.

This PR only converts the encoding networks to use containers. Future PRs will change the other networks.
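A rough sketch of the general idea, using plain torch.nn rather than ALF's actual container API (the helper names and the naive make_parallel below are illustrative assumptions, not the code in this PR): an encoding network is just a sequential stack of layers, and parallelizing it amounts to replicating that stack with independent parameters.

```python
import copy
import torch
import torch.nn as nn

# An "encoding network" expressed as a plain sequential stack of FC layers.
def make_encoder(input_dim, hidden_dims=(64, 64)):
    layers, last = [], input_dim
    for h in hidden_dims:
        layers += [nn.Linear(last, h), nn.ReLU()]
        last = h
    return nn.Sequential(*layers)

# Naive parallelization: replicate the whole sequential stack, each replica
# getting its own freshly copied parameters. ALF's make_parallel is assumed
# to build a more efficient batched version, but the replication semantics
# are the part relevant to the discussion below.
def make_parallel(net, replicas):
    return nn.ModuleList([copy.deepcopy(net) for _ in range(replicas)])

encoder = make_encoder(input_dim=10)
parallel = make_parallel(encoder, replicas=3)
x = torch.randn(8, 10)
out = torch.stack([m(x) for m in parallel], dim=1)  # [batch, replicas, 64]
```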

spec = input_tensor_spec
nets = []

if input_preprocessors:
emailweixu (Contributor Author):

There is a slight change in the handling of input_preprocessors. Previously, in PreprocessorNetwork, each input preprocessor would be copied first. The new version does not copy. @Haichao-Zhang do you remember why the copy was needed?

Haichao-Zhang (Contributor):

@emailweixu Yes, the reason for this is to make sure that when we create parallel networks with pre-processors (which could have learnable parameters), the pre-processors will also be copied, unless explicitly specified not to copy.
This makes the expected behavior more natural, and fixed some previously encountered errors caused by a mismatch between the intended behavior and the actual behavior.

Some old related issues and PRs:
issue: #552
PR: #560
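A minimal sketch of the copy-vs-share distinction being discussed, using plain torch.nn with made-up module sizes (none of this is ALF's API): sharing one preprocessor instance across replicas keeps a single set of preprocessor parameters, while copying gives every replica its own.

```python
import copy
import torch.nn as nn

preprocessor = nn.Linear(4, 8)   # a learnable input preprocessor
encoder = nn.Linear(8, 16)
replicas = 3

# Shared: every replica references the same preprocessor module, so its
# parameters are trained jointly and counted only once.
shared = nn.ModuleList(
    [nn.Sequential(preprocessor, copy.deepcopy(encoder)) for _ in range(replicas)])

# Copied: each replica gets an independent copy of the preprocessor.
copied = nn.ModuleList(
    [nn.Sequential(copy.deepcopy(preprocessor), copy.deepcopy(encoder))
     for _ in range(replicas)])

# .parameters() de-duplicates shared tensors, so the copied version has
# (replicas - 1) extra sets of preprocessor parameters.
n_shared = sum(p.numel() for p in shared.parameters())
n_copied = sum(p.numel() for p in copied.parameters())
assert n_copied - n_shared == (replicas - 1) * sum(
    p.numel() for p in preprocessor.parameters())
```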

emailweixu (Contributor Author):

The make_parallel of the container will copy all the components, so there shouldn't be any problem with this change if the original motivation was to ensure make_parallel works correctly.

@@ -142,7 +142,7 @@ def _train():
inputs=None, loss_func=_neglogprob, batch_size=batch_size)
generator.update_with_gradient(alg_step.info)

for i in range(2000):
for i in range(2100):
emailweixu (Contributor Author):

This is because the input preprocessor is not copied, which causes different parameter initialization. See the comment for the encoding network.
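A toy illustration of why this shifts the parameter initialization (not the test's actual code): whether the preprocessor is deep-copied or constructed anew changes how many random numbers are consumed before later layers are initialized, so under the same seed the rest of the network starts from different weights and may need more iterations to converge.

```python
import copy
import torch
import torch.nn as nn

def build(copy_preprocessor):
    torch.manual_seed(0)
    pre = nn.Linear(4, 4)
    # deepcopy reuses pre's already-drawn weights and consumes no RNG state;
    # constructing a new Linear draws fresh numbers, shifting what the
    # encoder layer created afterwards gets initialized with.
    pre = copy.deepcopy(pre) if copy_preprocessor else nn.Linear(4, 4)
    enc = nn.Linear(4, 4)
    return enc

print(torch.allclose(build(True).weight, build(False).weight))  # False
```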

@@ -112,6 +113,7 @@ def test_continuous_actor_distribution(self, lstm_hidden_size):
conv_layer_params=self._conv_layer_params,
continuous_projection_net_ctor=functools.partial(
NormalProjectionNetwork, scale_distribution=True))
logging.info("---- %s" % str(actor_dist_net.state_spec))
Contributor:

can remove this line

Contributor:

can remove this line

This line is not removed yet.

last_kernel_initializer=None,
last_use_fc_bn=False,
name="ParallelEncodingNetwork"):
"""Parallel feed-forward network with FC layers which allows the last layer
Contributor:

feed-forward network with FC layers -> encoding network

emailweixu (Contributor Author):

fixed

@@ -66,6 +67,7 @@ def test_value_distribution(self, lstm_hidden_size):
conv_layer_params=conv_layer_params), None
],
preprocessing_combiner=NestConcat())
logging.info("----%s" % str(value_net.state_spec))
Contributor:

can remove this line

emailweixu (Contributor Author):

removed.

alf/networks/encoding_networks.py (outdated comment, resolved)
@@ -187,7 +183,7 @@ def test_encoding_network_input_preprocessor(self):
self.assertEqual(output.size()[1], 1)

@parameterized.parameters((True, ), (False, ))
def test_encoding_network_nested_input(self, lstm):
def test_encoding_network_nested_input(self, lstm=False):
Contributor:

why do we need to set lstm=False here?

emailweixu (Contributor Author):

It was for debugging. Restored.


TODO: remove ``allow_non_parallel_input``. This means to make parallel network
Contributor:

The option allow_non_parallel_input seems useful and we can keep it. Does it make sense to set the default value of allow_non_parallel_input to True, so that the input will always be handled correctly by default?

emailweixu (Contributor Author):

A network can be a component of a bigger network (container). It does not need the ability to handle non-parallel input in those situations.

@@ -112,6 +113,7 @@ def test_continuous_actor_distribution(self, lstm_hidden_size):
conv_layer_params=self._conv_layer_params,
continuous_projection_net_ctor=functools.partial(
NormalProjectionNetwork, scale_distribution=True))
logging.info("---- %s" % str(actor_dist_net.state_spec))
Contributor:

can remove this line

This line is not removed yet.

len(p_net_w_preprocessor.state_dict()),
len(p_net_wo_preprocessor.state_dict()) +
replicas * len(input_preprocessors.state_dict()))
if lstm:
Contributor:

Why do we need this if lstm condition here and also in L587?

emailweixu (Contributor Author):

This test relies on the fact that p_net_w_preprocessor is naively parallelized, which is no longer the case.
Changed the test to not rely on this.

Haichao-Zhang (Contributor) left a comment:

Looks great! Just one final comment.

input_preprocessors.state_dict()))
# the number of parameters of a parallel network with a shared
# input_preprocessor should be equal to that of the parallel network
# with non-shared input processor - replicas * the number of parameters
Contributor:

In the comment, it should be (replicas-1) * ..., same as the code.
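Spelled out (with hypothetical numbers, not the test's actual values), the relation the comment should state is:

```python
# With P = number of state_dict entries contributed by one input_preprocessor:
#   len(state_dict(shared)) == len(state_dict(non_shared)) - (replicas - 1) * P
# i.e. the shared version keeps one preprocessor instead of `replicas` of them.
replicas, P = 3, 2                     # e.g. one Linear layer: weight + bias
non_shared_entries = 10                # hypothetical count for the non-shared net
shared_entries = non_shared_entries - (replicas - 1) * P
assert shared_entries == 6
```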

@emailweixu emailweixu merged commit 11eb659 into pytorch Aug 27, 2021
@emailweixu emailweixu linked an issue Aug 30, 2021 that may be closed by this pull request
@hnyu hnyu deleted the PR_new_encoding_net branch September 6, 2021 15:32
pd-perry pushed a commit to pd-perry/alf that referenced this pull request Dec 11, 2021
* Implementing encoding networks using Sequential

Since Sequential is very flexible, most networks can be implemented using it. And it automatically comes with support for make_parallel.

* Fix checkpoint_utils_test

* Address review comments

* Address more comments

* Fix comment
Development

Successfully merging this pull request may close these issues.

Implementing various networks using containers
2 participants