advanced/rpc_ddp_tutorial translation (#382)

* advanced/rpc_ddp_tutorial-translation

Translate rpc_ddp_tutorial.rst and main.py

* Update rpc_ddp_tutorial.rst

Typo correction from line 133

* Update rpc_ddp_tutorial.rst

Modified the translation based on reviews.

* Update main.py

Incomplete: still considering how to handle the 'optimizer' part

* Update rpc_ddp_tutorial.rst

Incomplete

* Update main.py

Fix translation errors.

* Update rpc_ddp_tutorial.rst

Fix translation errors

* Update main.py

Fix a translation error

* Update rpc_ddp_tutorial.rst

Fix a translation error
dajeongPark-dev authored Jan 10, 2022
1 parent 0c32b3d commit efe11eb
Showing 2 changed files with 144 additions and 161 deletions.
232 changes: 108 additions & 124 deletions advanced_source/rpc_ddp_tutorial.rst
@@ -1,160 +1,144 @@
Combining Distributed DataParallel with Distributed RPC Framework
=================================================================
**Authors**: `Pritam Damania <https://github.com/pritamdamania87>`_ and `Yi Wang <https://github.com/SciPioneer>`_

**Translation**: `Park Dajeong <https://github.com/dajeongPark-dev>`_


This tutorial uses a simple example to demonstrate how you can combine
`DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__ (DDP)
with the `Distributed RPC framework <https://pytorch.org/docs/master/rpc.html>`__
to combine distributed data parallelism with distributed model parallelism to
train a simple model. Source code of the example can be found `here <https://github.com/pytorch/examples/tree/master/distributed/rpc/ddp_rpc>`__.

Previous tutorials,
`Getting Started With Distributed Data Parallel <https://tutorials.pytorch.kr/intermediate/ddp_tutorial.html>`__
and `Getting Started with Distributed RPC Framework <https://tutorials.pytorch.kr/intermediate/rpc_tutorial.html>`__,
described how to perform distributed data parallel and distributed model
parallel training respectively. However, there are several training paradigms
where you might want to combine these two techniques. For example:

1) If we have a model with a sparse part (large embedding table) and a dense
   part (FC layers), we might want to put the embedding table on a parameter
   server and replicate the FC layer across multiple trainers using `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__.
   The `Distributed RPC framework <https://pytorch.org/docs/master/rpc.html>`__
   can be used to perform embedding lookups on the parameter server.
2) Enable hybrid parallelism as described in the `PipeDream <https://arxiv.org/abs/1806.03377>`__ paper.
   We can use the `Distributed RPC framework <https://pytorch.org/docs/master/rpc.html>`__
   to pipeline stages of the model across multiple workers and replicate each
   stage (if needed) using `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__.

|
In this tutorial we will cover case 1 mentioned above. We have a total of 4
workers in our setup as follows:


1) 1 Master, which is responsible for creating an embedding table
   (nn.EmbeddingBag) on the parameter server. The master also drives the
   training loop on the two trainers.
2) 1 Parameter Server, which basically holds the embedding table in memory and
   responds to RPCs from the Master and Trainers.
3) 2 Trainers, which store an FC layer (nn.Linear) which is replicated amongst
   themselves using `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__.
   The trainers are also responsible for executing the forward pass, backward
   pass and optimizer step.

|
The entire training process is executed as follows:

1) The master creates a `RemoteModule <https://pytorch.org/docs/master/rpc.html#remotemodule>`__
   that holds an embedding table on the Parameter Server.
2) The master then kicks off the training loop on the trainers and passes the
   remote module to the trainers.
3) The trainers create a ``HybridModel`` which first performs an embedding lookup
   using the remote module provided by the master and then executes the
   FC layer which is wrapped inside DDP.
4) The trainer executes the forward pass of the model and uses the loss to
   execute the backward pass using `Distributed Autograd <https://pytorch.org/docs/master/rpc.html#distributed-autograd-framework>`__.
5) As part of the backward pass, the gradients for the FC layer are computed
   first and synced to all trainers via allreduce in DDP.
6) Next, Distributed Autograd propagates the gradients to the parameter server,
   where the gradients for the embedding table are updated.
7) Finally, the `Distributed Optimizer <https://pytorch.org/docs/master/rpc.html#module-torch.distributed.optim>`__ is used to update all the parameters.


.. attention::

  You should always use `Distributed Autograd <https://pytorch.org/docs/master/rpc.html#distributed-autograd-framework>`__
  for the backward pass if you're combining DDP and RPC.


Now, let's go through each part in detail. First, we need to set up all of our
workers before we can perform any training. We create 4 processes such that
ranks 0 and 1 are our trainers, rank 2 is the master and rank 3 is the
parameter server.

We initialize the RPC framework on all 4 workers using the TCP init_method.
Once RPC initialization is done, the master creates a remote module that holds an `EmbeddingBag <https://pytorch.org/docs/master/generated/torch.nn.EmbeddingBag.html>`__
layer on the Parameter Server using `RemoteModule <https://pytorch.org/docs/master/rpc.html#torch.distributed.nn.api.remote_module.RemoteModule>`__.
The master then loops through each trainer and kicks off the training loop by
calling ``_run_trainer`` on each trainer using `rpc_async <https://pytorch.org/docs/master/rpc.html#torch.distributed.rpc.rpc_async>`__.
Finally, the master waits for all training to finish before exiting.

The trainers first initialize a ``ProcessGroup`` for DDP with world_size=2
(for two trainers) using `init_process_group <https://pytorch.org/docs/stable/distributed.html#torch.distributed.init_process_group>`__.
Next, they initialize the RPC framework using the TCP init_method. Note that
the ports are different in RPC initialization and ProcessGroup initialization.
This is to avoid port conflicts between initialization of both frameworks.
Once the initialization is done, the trainers just wait for the ``_run_trainer``
RPC from the master.

The parameter server just initializes the RPC framework and waits for RPCs from
the trainers and master.

.. literalinclude:: ../advanced_source/rpc_ddp_tutorial/main.py
  :language: py
  :start-after: BEGIN run_worker
  :end-before: END run_worker

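The ``main.py`` pulled in by the directive above is not part of this diff, so the following is only a rough sketch of the worker setup just described. It is not the tutorial's exact code: the embedding sizes, port numbers, trainer names, and the ``_run_trainer`` stub are placeholder assumptions.

.. code:: python

    import torch
    import torch.distributed as dist
    import torch.distributed.rpc as rpc
    import torch.multiprocessing as mp
    from torch.distributed.nn.api.remote_module import RemoteModule

    NUM_EMBEDDINGS, EMBEDDING_DIM = 100, 16  # placeholder sizes


    def _run_trainer(remote_emb_module, rank):
        # Runs on each trainer; see the later sketches for a possible body.
        pass


    def run_worker(rank, world_size):
        # RPC and the DDP ProcessGroup use different ports to avoid conflicts.
        rpc_backend_options = rpc.TensorPipeRpcBackendOptions(
            init_method="tcp://localhost:29501")

        if rank == 2:
            # Master: create the embedding table on the parameter server ("ps")
            # and drive the training loop on both trainers.
            rpc.init_rpc("master", rank=rank, world_size=world_size,
                         rpc_backend_options=rpc_backend_options)
            remote_emb_module = RemoteModule(
                "ps/cpu", torch.nn.EmbeddingBag,
                args=(NUM_EMBEDDINGS, EMBEDDING_DIM), kwargs={"mode": "sum"})
            futs = [rpc.rpc_async(f"trainer{r}", _run_trainer,
                                  args=(remote_emb_module, r))
                    for r in (0, 1)]
            for fut in futs:
                fut.wait()
        elif rank <= 1:
            # Trainers: the ProcessGroup for DDP spans only the two trainers.
            dist.init_process_group(backend="gloo", rank=rank, world_size=2,
                                    init_method="tcp://localhost:29500")
            rpc.init_rpc(f"trainer{rank}", rank=rank, world_size=world_size,
                         rpc_backend_options=rpc_backend_options)
            # Nothing else to do: trainers wait for the _run_trainer RPC.
        else:
            # Parameter server: only serve RPCs from master and trainers.
            rpc.init_rpc("ps", rank=rank, world_size=world_size,
                         rpc_backend_options=rpc_backend_options)

        rpc.shutdown()  # block until all distributed work is done


    if __name__ == "__main__":
        mp.spawn(run_worker, args=(4,), nprocs=4, join=True)
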
Before we discuss details of the Trainer, let's introduce the ``HybridModel`` that
the trainer uses. As described below, the ``HybridModel`` is initialized using a
remote module that holds an embedding table (``remote_emb_module``) on the parameter server and the ``device``
to use for DDP. The initialization of the model wraps an
`nn.Linear <https://pytorch.org/docs/master/generated/torch.nn.Linear.html>`__
layer inside DDP to replicate and synchronize this layer across all trainers.

The forward method of the model is pretty straightforward. It performs an
embedding lookup on the parameter server using RemoteModule's ``forward``
and passes its output onto the FC layer.

.. literalinclude:: ../advanced_source/rpc_ddp_tutorial/main.py
  :language: py
  :start-after: BEGIN hybrid_model
  :end-before: END hybrid_model

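As with ``run_worker``, the ``hybrid_model`` block is only referenced here, so the class below is an illustrative sketch of the idea rather than the actual code; the feature dimensions (16 in, 8 out) are arbitrary and ``device`` is assumed to be a ``torch.device``.

.. code:: python

    import torch
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP


    class HybridModel(nn.Module):
        """Embedding lookups run remotely on the parameter server;
        the FC layer is local and kept in sync across trainers by DDP."""

        def __init__(self, remote_emb_module, device):
            super().__init__()
            self.remote_emb_module = remote_emb_module  # RemoteModule on the PS
            self.device = device
            self.fc = DDP(nn.Linear(16, 8).to(device),
                          device_ids=[device] if device.type == "cuda" else None)

        def forward(self, indices, offsets):
            # RemoteModule.forward() executes on the parameter server and
            # returns the looked-up embeddings to this trainer.
            emb_lookup = self.remote_emb_module.forward(indices, offsets)
            return self.fc(emb_lookup.to(self.device))
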
Next, let's look at the setup on the Trainer. The trainer first creates the
``HybridModel`` described above using a remote module that holds the embedding table on the
parameter server and its own rank.

Now, we need to retrieve a list of RRefs to all the parameters that we would
like to optimize with `DistributedOptimizer <https://pytorch.org/docs/master/rpc.html#module-torch.distributed.optim>`__.
To retrieve the parameters for the embedding table from the parameter server,
we can call RemoteModule's `remote_parameters <https://pytorch.org/docs/master/rpc.html#torch.distributed.nn.api.remote_module.RemoteModule.remote_parameters>`__,
which basically walks through all the parameters for the embedding table and returns
a list of RRefs. The trainer calls this method on the parameter server via RPC
to receive a list of RRefs to the desired parameters. Since the
DistributedOptimizer always takes a list of RRefs to parameters that need to
be optimized, we need to create RRefs even for the local parameters for our
FC layers. This is done by walking ``model.fc.parameters()``, creating an RRef for
each parameter and appending it to the list returned from ``remote_parameters()``.
Note that we cannot use ``model.parameters()``,
because it will recursively call ``model.remote_emb_module.parameters()``,
which is not supported by ``RemoteModule``.

Finally, we create our DistributedOptimizer using all the RRefs and define a
CrossEntropyLoss function.
๋‹ค์Œ์œผ๋กœ ํŠธ๋ ˆ์ด๋„ˆ์˜ ์„ค์ •์„ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.
ํŠธ๋ ˆ์ด๋„ˆ๋Š” ๋จผ์ € ๋งค๊ฐœ๋ณ€์ˆ˜ ์„œ๋ฒ„์˜ ์ž„๋ฒ ๋”ฉ ํ…Œ์ด๋ธ”๊ณผ ์ž์ฒด ์ˆœ์œ„๋ฅผ ๋ณด์œ ํ•˜๋Š” ์›๊ฒฉ ๋ชจ๋“ˆ์„ ์‚ฌ์šฉํ•˜์—ฌ
์œ„์—์„œ ์„ค๋ช…ํ•œ ``HybridModel``์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
์ด์ œ `๋ถ„์‚ฐ ์˜ตํ‹ฐ๋งˆ์ด์ €(DistributedOptimizer) <https://pytorch.org/docs/master/rpc.html#module-torch.distributed.optim>`__๋กœ
์ตœ์ ํ™”ํ•˜๋ ค๋Š” ๋ชจ๋“  ๋งค๊ฐœ๋ณ€์ˆ˜์— ๋Œ€ํ•œ RRef ๋ชฉ๋ก์„ ๊ฒ€์ƒ‰ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
๋งค๊ฐœ๋ณ€์ˆ˜ ์„œ๋ฒ„์—์„œ ์ž„๋ฒ ๋”ฉ ํ…Œ์ด๋ธ”์˜ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ๊ฒ€์ƒ‰ํ•˜๊ธฐ ์œ„ํ•ด
RemoteModule์˜ `remote_parameters <https://pytorch.org/docs/master/rpc.html#torch.distributed.nn.api.remote_module.RemoteModule.remote_parameters>`__๋ฅผ ํ˜ธ์ถœํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๊ทธ๋ฆฌ๊ณ  ์ด๊ฒƒ์€ ๊ธฐ๋ณธ์ ์œผ๋กœ ์ž„๋ฒ ๋”ฉ ํ…Œ์ด๋ธ”์˜ ๋ชจ๋“  ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์‚ดํŽด๋ณด๊ณ  RRef ๋ชฉ๋ก์„ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค.
ํŠธ๋ ˆ์ด๋„ˆ๋Š” RPC๋ฅผ ํ†ตํ•ด ๋งค๊ฐœ๋ณ€์ˆ˜ ์„œ๋ฒ„์—์„œ ์ด ๋ฉ”์„œ๋“œ๋ฅผ ํ˜ธ์ถœํ•˜์—ฌ ์›ํ•˜๋Š” ๋งค๊ฐœ๋ณ€์ˆ˜์— ๋Œ€ํ•œ RRef ๋ชฉ๋ก์„ ์ˆ˜์‹ ํ•ฉ๋‹ˆ๋‹ค.
DistributedOptimizer๋Š” ํ•ญ์ƒ ์ตœ์ ํ™”ํ•ด์•ผ ํ•˜๋Š” ๋งค๊ฐœ๋ณ€์ˆ˜์— ๋Œ€ํ•œ RRef ๋ชฉ๋ก์„ ๊ฐ€์ ธ์˜ค๊ธฐ ๋•Œ๋ฌธ์— FC ๋ ˆ์ด์–ด์˜ ์ „์—ญ ๋งค๊ฐœ๋ณ€์ˆ˜์— ๋Œ€ํ•ด์„œ๋„ RRef๋ฅผ ์ƒ์„ฑํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
์ด๊ฒƒ์€ ``model.fc.parameters()``๋ฅผ ํƒ์ƒ‰ํ•˜๊ณ  ๊ฐ ๋งค๊ฐœ๋ณ€์ˆ˜์— ๋Œ€ํ•œ RRef๋ฅผ ์ƒ์„ฑํ•˜๊ณ 
``remote_parameters()``์—์„œ ๋ฐ˜ํ™˜๋œ ๋ชฉ๋ก์— ์ถ”๊ฐ€ํ•จ์œผ๋กœ์จ ์ˆ˜ํ–‰๋ฉ๋‹ˆ๋‹ค.
์ฐธ๊ณ ๋กœ ``model.parameters()``๋Š” ์‚ฌ์šฉํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค. ``RemoteModule``์—์„œ ์ง€์›ํ•˜์ง€ ์•Š๋Š” ``model.remote_emb_module.parameters()``๋ฅผ ์žฌ๊ท€์ ์œผ๋กœ ํ˜ธ์ถœํ•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.
๋งˆ์ง€๋ง‰์œผ๋กœ ๋ชจ๋“  RRef๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ DistributedOptimizer๋ฅผ ๋งŒ๋“ค๊ณ  CrossEntropyLoss ํ•จ์ˆ˜๋ฅผ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.

.. literalinclude:: ../advanced_source/rpc_ddp_tutorial/main.py
  :language: py
  :start-after: BEGIN setup_trainer
  :end-before: END setup_trainer

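Again, as a sketch of the ``setup_trainer`` logic rather than the real thing: the SGD optimizer and the 0.05 learning rate below are assumptions, and ``HybridModel`` refers to the sketch shown earlier.

.. code:: python

    import torch
    import torch.optim as optim
    from torch.distributed.optim import DistributedOptimizer
    from torch.distributed.rpc import RRef


    def _run_trainer(remote_emb_module, rank):
        device = torch.device("cpu")  # or torch.device(f"cuda:{rank}")
        model = HybridModel(remote_emb_module, device)

        # RRefs to the embedding table parameters, fetched from the PS via RPC.
        model_parameter_rrefs = model.remote_emb_module.remote_parameters()

        # model.parameters() cannot be used, so wrap the local FC parameters
        # in RRefs explicitly and append them to the list.
        for param in model.fc.parameters():
            model_parameter_rrefs.append(RRef(param))

        opt = DistributedOptimizer(optim.SGD, model_parameter_rrefs, lr=0.05)
        criterion = torch.nn.CrossEntropyLoss()
        # ... the training loop (see the next sketch) would follow here.
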
Now we're ready to introduce the main training loop that is run on each trainer.
``get_next_batch`` is just a helper function to generate random inputs and
targets for training. We run the training loop for multiple epochs and for each
batch:

1) Set up a `Distributed Autograd Context <https://pytorch.org/docs/master/rpc.html#torch.distributed.autograd.context>`__
   for Distributed Autograd.
2) Run the forward pass of the model and retrieve its output.
3) Compute the loss based on our outputs and targets using the loss function.
4) Use Distributed Autograd to execute a distributed backward pass using the loss.
5) Finally, run a Distributed Optimizer step to optimize all the parameters.

.. literalinclude:: ../advanced_source/rpc_ddp_tutorial/main.py
  :language: py
  :start-after: BEGIN run_trainer
  :end-before: END run_trainer

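To make the five steps above concrete, a minimal loop body could look like the following; ``get_next_batch`` here is a made-up helper that fabricates random ``EmbeddingBag`` inputs and targets matching the sizes assumed in the earlier sketches.

.. code:: python

    import torch
    import torch.distributed.autograd as dist_autograd


    def get_next_batch(rank):
        # Hypothetical helper: random indices/offsets/targets for training.
        for _ in range(10):
            indices = torch.randint(0, 100, (32,))  # NUM_EMBEDDINGS = 100
            offsets = torch.arange(0, 32, 4)        # 8 bags of 4 indices each
            target = torch.randint(0, 8, (8,))      # 8 output classes
            yield indices, offsets, target


    def train_loop(model, opt, criterion, rank, epochs=10):
        for epoch in range(epochs):
            for indices, offsets, target in get_next_batch(rank):
                # 1) Each iteration gets its own distributed autograd context.
                with dist_autograd.context() as context_id:
                    # 2) + 3) Forward pass and loss.
                    output = model(indices, offsets)
                    loss = criterion(output, target)
                    # 4) Distributed backward pass; gradients live in the
                    #    per-iteration context, so no zero_grad() is needed.
                    dist_autograd.backward(context_id, [loss])
                    # 5) Distributed optimizer step updates both the FC layer
                    #    and the remote embedding table parameters.
                    opt.step(context_id)
            print(f"rank {rank}: finished epoch {epoch}")
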
Source code for the entire example can be found `here <https://github.com/pytorch/examples/tree/master/distributed/rpc/ddp_rpc>`__.