
Commit f76fe8d

zhanyuanucb and zhanyuan.zhang authored
Added explanation on the result (#887)
Co-authored-by: zhanyuan.zhang <[email protected]>
1 parent 97d4524 commit f76fe8d


docs/gallery/tutorials/pipeshard_parallelism.py

Lines changed: 17 additions & 0 deletions
@@ -244,3 +244,20 @@ def loss_func(params):
                    atol=5e-3)

alpa.shutdown()

################################################################################
# Interpret the Results
# ---------------------
# **Some basic concepts**
#
# - Cluster mesh and submeshes
#
#   - A cluster mesh is a computer cluster that contains GPUs. An ``N×M``
#     cluster mesh means the cluster has ``N`` physical machines and each
#     machine has ``M`` GPUs.
#   - Submeshes are obtained by slicing the cluster mesh. For example, given
#     an ``N×M`` cluster mesh, a submesh ``(1, M)`` means using all GPUs in
#     one physical machine.
#   - For more details on how Alpa uses submeshes to solve *inter-operator
#     parallelism*, see **Section 5: Inter-Operator Parallelism** in the
#     `Alpa paper <https://arxiv.org/pdf/2201.12023.pdf>`_.
#
# - Device mesh and logical mesh
#
#   - A device mesh is a 2-dimensional logical view of a set of physical
#     devices.
#   - For a set of physical devices, there can be multiple logical views. For
#     example, given 2 nodes and 8 GPUs per node (i.e., 16 devices in total),
#     we can view them as a 2×8, 1×16, 4×4, 8×2, or 16×1 device mesh. The
#     sketch after this list enumerates these views programmatically.
#   - The mapping between physical devices and the logical device mesh view
#     is optimized by the inter-op pass.
#   - Hence, you can see ``Result mesh_shapes`` and the corresponding
#     ``Result logical_mesh_shapes`` in the optimization output.
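
################################################################################
# To make these shapes concrete, here is a minimal, self-contained sketch in
# plain Python (an illustration, not Alpa's internal code). It enumerates the
# submesh shapes considered for an ``N×M`` cluster mesh, following the shape
# restriction described in Section 5 of the Alpa paper, and then all
# 2-dimensional logical views of 16 physical devices.

# Submesh shapes for an N×M cluster mesh: either a slice of one machine with
# a power-of-two number of GPUs, or a group of whole machines.
N, M = 2, 8
one_host_submeshes = [(1, 2 ** k) for k in range(M.bit_length()) if 2 ** k <= M]
multi_host_submeshes = [(n, M) for n in range(2, N + 1)]
print(one_host_submeshes + multi_host_submeshes)
# [(1, 1), (1, 2), (1, 4), (1, 8), (2, 8)]

# Logical mesh views of 16 devices: every factorization 16 = rows * cols.
num_devices = N * M
logical_views = [(rows, num_devices // rows)
                 for rows in range(1, num_devices + 1)
                 if num_devices % rows == 0]
print(logical_views)  # [(1, 16), (2, 8), (4, 4), (8, 2), (16, 1)]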

################################################################################
# With the basic concepts in mind, you can now better understand the
# ``ModuleProfileResult``:
#
# - ``ModuleProfileResult``: ``result[(i, j, s, c), m]`` means this stage
#   contains forward layers ``i, i+1, ..., j`` and the corresponding backward
#   layers, and runs on the ``s``-th submesh with the ``c``-th auto-sharding
#   config for that submesh. ``m = 0`` means the result is for the forward
#   pass, and ``m = 1`` for the backward pass.
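
################################################################################
# As a reading aid, the helper below decodes one such key into plain English.
# The key layout follows the description above; the sample values are
# hypothetical and not taken from a real run.

def describe_profile_key(key):
    """Decode a ``ModuleProfileResult`` key of the form ``((i, j, s, c), m)``."""
    (i, j, s, c), m = key
    stage = f"forward layers {i}..{j} (plus their corresponding backward layers)"
    mesh = f"submesh #{s}, auto-sharding config #{c}"
    direction = "forward" if m == 0 else "backward"
    return f"{stage} on {mesh}, {direction} pass"

# Hypothetical key: a stage covering layers 0..3, profiled on submesh 1 with
# auto-sharding config 2, forward pass.
print(describe_profile_key(((0, 3, 1, 2), 0)))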
