Commit

org: moving dyad to be separate module, and organizing into supplementary section

Signed-off-by: vsoch <[email protected]>
vsoch committed Jul 5, 2024
1 parent 770e1a4 commit 5a94458
Showing 12 changed files with 21 additions and 32 deletions.
5 changes: 2 additions & 3 deletions 2024-RADIUSS-AWS/JupyterNotebook/docker/jupyter-launcher.yaml
Original file line number Diff line number Diff line change
@@ -13,14 +13,13 @@
- title: Dyad Notebook Tutorial
description: This is a tutorial for using Dyad
type: jupyterlab-commands
icon: flux-icon.png
source:
- label: Dyad Tutorial
id: 'filebrowser:open-path'
args:
path: dyad_dlio.ipynb
path: supplementary/dyad/dyad_dlio.ipynb
icon: ./flux-icon.png
catalog: Notebook
catalog: Console

- title: Flux Framework Portal
description: Flux Framework portal for projects, releases, and publication.
36 changes: 13 additions & 23 deletions 2024-RADIUSS-AWS/JupyterNotebook/tutorial/01_flux_tutorial.ipynb
Original file line number Diff line number Diff line change
@@ -38,7 +38,7 @@
"\n",
"And if you have some extra time and interest, we have supplementary chapters to teach you about advanced (often experimental, or under development) features:\n",
"\n",
"* [Supplementary Chapter 1: Using DYAD to accelerate distributed Deep Learning (DL) training](./dyad_dlio.ipynb)\n",
"* [Supplementary Chapter 1: Using DYAD to accelerate distributed Deep Learning (DL) training](./supplementary/dyad/dyad_dlio.ipynb)\n",
"\n",
"Let's get started! To provide some brief, added background on Flux and a bit more motivation for our tutorial, \"Shift+Enter\" the cell below to watch our YouTube video!"
]
@@ -173,7 +173,9 @@
"tags": []
},
"source": [
"Did you know you can also get help for a specific command? For example, let's run, e.g. `flux help jobs` to get information on a sub-command:"
"<div class=\"alert alert-block alert-info\">\n",
"<span style=\"font-weight:600\">Tip:</span> Did you know you can also get help for a specific command? For example, run, `flux help jobs` to get information on a sub-command.\n",
"</div>"
]
},
{
@@ -1170,7 +1172,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 1,
"id": "2735b1ca-e761-46be-b509-a86b771628fc",
"metadata": {},
"outputs": [
@@ -1180,26 +1182,14 @@
"text": [
"flux-job: task(s) exited with exit code 1\n",
"flux-job: task(s) exited with exit code 1\n",
"7db0bdd6f967\n",
"usage: flux-ion-resource.py [-h] [-v]\n",
" {match,update,info,stats,stats-cancel,cancel,find,status,set-status,set-property,get-property,ns-info,params}\n",
" ...\n",
"flux-ion-resource.py: error: argument {match,update,info,stats,stats-cancel,cancel,find,status,set-status,set-property,get-property,ns-info,params}: invalid choice: 'stat' (choose from 'match', 'update', 'info', 'stats', 'stats-cancel', 'cancel', 'find', 'status', 'set-status', 'set-property', 'get-property', 'ns-info', 'params')\n",
"awk: line 1: syntax error at or near *\n",
"awk: line 1: syntax error at or near :\n",
"e8cfed35f636\n",
"flux-tree-helper: ERROR: Expecting value: line 1 column 160 (char 159)\n",
"Jul 05 04:27:34.087814 UTC broker.err[0]: rc2.0: flux tree -N1 -c1 --leaf --prefix=tree.1.1 --njobs=1 -- hostname Exited (rc=1) 0.5s\n",
"usage: flux-ion-resource.py [-h] [-v]\n",
" {match,update,info,stats,stats-cancel,cancel,find,status,set-status,set-property,get-property,ns-info,params}\n",
" ...\n",
"flux-ion-resource.py: error: argument {match,update,info,stats,stats-cancel,cancel,find,status,set-status,set-property,get-property,ns-info,params}: invalid choice: 'stat' (choose from 'match', 'update', 'info', 'stats', 'stats-cancel', 'cancel', 'find', 'status', 'set-status', 'set-property', 'get-property', 'ns-info', 'params')\n",
"awk: line 1: syntax error at or near *\n",
"flux-tree-helper: ERROR: Expecting value: line 1 column 157 (char 156)\n",
"Jul 05 04:27:35.284158 UTC broker.err[0]: rc2.0: flux tree -N1 -c2 --topology=2 --queue-policy=fcfs --prefix=tree.1 --njobs=2 -- hostname Exited (rc=1) 2.3s\n",
"usage: flux-ion-resource.py [-h] [-v]\n",
" {match,update,info,stats,stats-cancel,cancel,find,status,set-status,set-property,get-property,ns-info,params}\n",
" ...\n",
"flux-ion-resource.py: error: argument {match,update,info,stats,stats-cancel,cancel,find,status,set-status,set-property,get-property,ns-info,params}: invalid choice: 'stat' (choose from 'match', 'update', 'info', 'stats', 'stats-cancel', 'cancel', 'find', 'status', 'set-status', 'set-property', 'get-property', 'ns-info', 'params')\n",
"awk: line 1: syntax error at or near *\n",
"Jul 05 05:20:32.333883 UTC broker.err[0]: rc2.0: flux tree -N1 -c1 --leaf --prefix=tree.1.1 --njobs=1 -- hostname Exited (rc=1) 0.6s\n",
"awk: line 1: syntax error at or near :\n",
"flux-tree-helper: ERROR: Expecting value: line 1 column 156 (char 155)\n",
"Jul 05 05:20:33.523886 UTC broker.err[0]: rc2.0: flux tree -N1 -c2 --topology=2 --queue-policy=fcfs --prefix=tree.1 --njobs=2 -- hostname Exited (rc=1) 2.4s\n",
"awk: line 1: syntax error at or near :\n",
"flux-tree-helper: ERROR: Expecting value: line 1 column 155 (char 154)\n",
"cat: ./tree.out: No such file or directory\n"
]
@@ -1718,7 +1708,7 @@
"We recommend not running `flux top` in the notebook as it is not designed to display output from a command that runs continuously.\n",
"\n",
"## Flux pstree \n",
"In analogy to `top`, Flux provides `flux pstree`. Try it out in the JupyterLab terminal or here in the notebook.\n",
"In analogy to `top`, Flux provides `flux pstree`. Try it out in the <button data-commandLinker-command=\"terminal:open\" data-name=\"flux\" href=\"#\">JupyterLab terminal</button> or here in the notebook.\n",
"\n",
"## Flux proxy\n",
"\n",
Original file line number Diff line number Diff line change
@@ -32,7 +32,7 @@
"* PMIx (for OpenMPI)\n",
"* An in-memory content store (useful for preloading data into pods on cloud)\n",
"\n",
"When Flux starts, it launches one or more brokers across the resources it manages. By default, Flux will launch one broker per node, but this can be configured (e.g., with the `--test-size` flag to `flux start` shown in [Module 1](./01_flux_tutorial.ipynb)). After launching the brokers, Flux will designate one broker as the \"leader\" and the rest as \"followers\". The leader serves as entrypoint into the Flux instance, and it serves as the starting point for most Flux commands. The distribution of brokers and the \"leader-follower\" designations are shown in the following figure:\n",
"When Flux starts, it launches one or more brokers across the resources it manages. By default, Flux will launch one broker per node, but this can be configured (e.g., with the `--test-size` flag to `flux start` shown in [Chapter 1](./01_flux_tutorial.ipynb)). After launching the brokers, Flux will designate one broker as the \"leader\" and the rest as \"followers\". The leader serves as entrypoint into the Flux instance, and it serves as the starting point for most Flux commands. The distribution of brokers and the \"leader-follower\" designations are shown in the following figure:\n",
"\n",
"<figure>\n",
"<img src=\"img/flux-instance-pre-tbon.png\">\n",
@@ -155,7 +155,7 @@
"source": [
"### flux kvs\n",
"\n",
"One of the core services built into Flux is the key-value store (KVS). It is used in many other services, including most of Flux's resource management services, the `flux archive` service below, and DYAD (which we will explore in [Module 4](./04_dyad_dlio.ipynb)). These services use the KVS to persistently store information and retrieve it later (potentially after a restart of Flux).\n",
"One of the core services built into Flux is the key-value store (KVS). It is used in many other services, including most of Flux's resource management services, the `flux archive` service below, and DYAD (which we will explore in [Supplementary Chapter 1](./supplementary/dyad/dyad_dlio.ipynb)). These services use the KVS to persistently store information and retrieve it later (potentially after a restart of Flux).\n",
"\n",
"The `flux kvs` command provides a utility to list and manipulate values of the KVS. As an example of using `flux kvs`, let's use the command to examine information saved by the `resource` service."
]
@@ -264,7 +264,7 @@
"source": [
"Finally, note that `flux archive` was named `flux filemap` in earlier versions of Flux.\n",
"\n",
"`flux kvs` and `flux archive` are two useful, but simple examples of Flux services. Flux also supports more complex services, including services for runtime data movement, such as DYAD (covered in [Module 4](./04_dyad_dlio.ipynb))."
"`flux kvs` and `flux archive` are two useful, but simple examples of Flux services. Flux also supports more complex services, including services for runtime data movement, such as DYAD (covered in [Supplementary Chapter 1](./supplementary/dyad/dyad_dlio.ipynb))."
]
},
{
@@ -278,7 +278,7 @@
"2. How to start and stop services in Flux\n",
"3. Two useful services for users of Flux (i.e., `flux kvs` and `flux archive`)\n",
"\n",
"To finish the tutorial, open [Chapter 3](./03_flux_tutorial.ipynb)."
"To finish the tutorial, open [Chapter 3](./03_flux_tutorial_conclusions.ipynb)."
]
}
],
Original file line number Diff line number Diff line change
@@ -20,7 +20,7 @@
"* Explained how to manage services with Flux\n",
"* Showed examples of Flux services\n",
"\n",
"If you are ready for advanced content, you can do the [DYAD and DLIO tutorial](./dyad_dlio.ipynb) and learn about:\n",
"If you are ready for advanced content, you can do the [DYAD and DLIO tutorial](./supplementary/dyad/dyad_dlio.ipynb) and learn about:\n",
"* Describing the design of DYAD, a Flux service for runtime data movement\n",
"* Introducing distributed Deep Learning (DL) training\n",
"* Introducing Argonne National Laboratory's Deep Learning I/O (DLIO) benchmark\n",
Original file line number Diff line number Diff line change
@@ -7,7 +7,7 @@
"tags": []
},
"source": [
"# Module 3: Using DYAD to accelerate distributed Deep Learning (DL) training\n",
"# Using DYAD to accelerate distributed Deep Learning (DL) training\n",
"\n",
"Now that we have seen how Flux enables the management and deployment of services, let's look at an example of using DYAD, an advanced Flux service for runtime data movement, in a real world application. Specifically, we will show how DYAD speeds up distributed Deep Learning (DL) training. In this module, we cover these topics:\n",
"1. Design of DYAD\n",