Add configmap to copy example notebooks and scripts

Signed-off-by: Brandon Tuttle <[email protected]>
NVIDIA · Dec 18, 2024 · 1594654 · 1594654
1 parent 888bc74
commit 1594654
Show file tree

Hide file tree

Showing 13 changed files with 1,555 additions and 14 deletions.
diff --git a/examples/custom-values.yaml b/examples/custom-values.yaml
@@ -1,14 +1,14 @@
+image:
+  repository: nvcr.io/nvidia/nemo
+  tag: 24.12-rc0
+
 service:
   type: LoadBalancer
   port: 8888
 
-resources:
-  limits:
-    cpu: 1
-    memory: 2Gi
-  requests:
-    cpu: 500m
-    memory: 1Gi
+# resources:
+#   limits:
+#     nvidia.com/gpu: 1
 
 persistence:
   enabled: true
@@ -24,5 +24,5 @@ command:
   entrypoint: ["/bin/bash", "-c"]
   args:
     [
-      "jupyter lab --ServerApp.ip=0.0.0.0 --ServerApp.port=8888 --ServerApp.allow_root=True --ServerApp.token='' --ServerApp.password='' --no-browser",
+      "apt update && apt install graphviz -y && jupyter lab --ServerApp.ip=0.0.0.0 --ServerApp.port=8888 --ServerApp.allow_root=True --ServerApp.token='' --ServerApp.password='' --no-browser",
     ]
diff --git a/examples/k8s-hello-world/README.md b/examples/k8s-hello-world/README.md
@@ -0,0 +1,8 @@
+# Example Usage
+
+```sh
+helm upgrade --install nemo-run \
+    --namespace nemo-run \
+    --create-namespace k8s-hello-world \
+    -f custom-values.yaml
+```
diff --git a/examples/k8s-hello-world/files/hello_experiments.ipynb b/examples/k8s-hello-world/files/hello_experiments.ipynb
@@ -0,0 +1,281 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Hello NeMo-Run Experiments!\n",
+    "\n",
+    "This is the second part of our hello world tutorial series for NeMo-Run. Please make sure that you have gone through the [first part](hello_world.ipynb) beforehand, since this tutorial builds heavily on it.\n",
+    "\n",
+    "A key component of NeMo-Run is `run.Experiment`. For an introduction to `run.Experiment`, refer to its docstring, which is also posted below:\n",
+    "\n",
+    "`run.Experiment` is a context manager to launch and manage multiple runs using pure Python. It offers researchers with a simple and flexible way to create and manage their ML experiments. Building on the core components of NeMo-Run, `run.Experiment` can be used as an umbrella under which users can launch different configured functions across multiple remote clusters.\n",
+    "\n",
+    "The `run.Experiment` context manager takes care of storing the run metadata, launching it on the specified cluster, and syncing the logs and artifacts. Additionally, `run.Experiment` also provides management tools to easily inspect and reproduce past experiments.\n",
+    "Some of the use cases that it enables are listed below:\n",
+    "\n",
+    "1. Check the status and logs of a past experiment.\n",
+    "2. Reproduce a past experiment and rerun it.\n",
+    "3. Reconstruct a past experiment and relaunch it after some changes.\n",
+    "4. Compare different runs of the same experiment.\n",
+    "\n",
+    "This API allows users to programmatically define their experiments entirely in Python. To illustrate the flexibility it provides, here are some use cases that can be supported by `run.Experiment` with just a few lines of code.\n",
+    "\n",
+    "1. Launch a benchmarking run on different GPUs at the same time in parallel.\n",
+    "2. Launch a sequential data processing pipeline on a CPU heavy cluster.\n",
+    "3. Launch hyperparameter grid search runs on a single cluster in parallel.\n",
+    "4. Launch hyperparameter search runs distributed across all available clusters.\n",
+    "\n",
+    "The docstring also includes some code examples. In this tutorial, we build on `add_object` from the previous tutorial to define a simple experiment and show its capabilities.\n",
+    "\n",
+    "Let's get into it.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.\n",
+    "# SPDX-License-Identifier: Apache-2.0\n",
+    "#\n",
+    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+    "# you may not use this file except in compliance with the License.\n",
+    "# You may obtain a copy of the License at\n",
+    "#\n",
+    "# http://www.apache.org/licenses/LICENSE-2.0\n",
+    "#\n",
+    "# Unless required by applicable law or agreed to in writing, software\n",
+    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+    "# See the License for the specific language governing permissions and\n",
+    "# limitations under the License.\n",
+    "\n",
+    "# Set up and imports\n",
+    "import nemo_run as run\n",
+    "from simple.add import SomeObject, add_object, commonly_used_object"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Configure the Python Functions\n",
+    "\n",
+    "First, let's configure the functions we want to run in our experiments. You can configure multiple functions under an experiment. Here, we will configure two functions, which will be partials of `add_object` but with different parameters."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fn_1 = run.Partial(\n",
+    "    add_object,\n",
+    "    obj_1=commonly_used_object(),\n",
+    "    obj_2=run.Config(SomeObject, value_1=10, value_2=20, value_3=30),\n",
+    ")\n",
+    "fn_1"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fn_2 = run.Partial(\n",
+    "    add_object,\n",
+    "    # You can also pass in the argument directly instead of as a Config.\n",
+    "    # However, this will run any code inside the `__init__` or `__post_init__` methods of the classes (if its a class).\n",
+    "    obj_1=SomeObject(value_1=1000, value_2=1000, value_3=1000),\n",
+    "    obj_2=run.Config(SomeObject, value_1=10, value_2=20, value_3=30),\n",
+    ")\n",
+    "\n",
+    "fn_2"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Define and Run the Experiment\n",
+    "\n",
+    "Now, let's say we want to run these two configured functions together and manage them under an experiment. We can do so with just a few lines of code shown below. Try running it: it will launch the two tasks sequentially and wait for them to complete.\n",
+    "Notice that we set `sequential=True`, this is because parallel execution mode is not supported on the local executor as of now. This is intentional as launching parallel processes on your local workstation can quickly eat up your limited resources.\n",
+    "However, our `SlurmExecutor` supports parallel mode, (and is set to `True` by default). This will allow you to run both of your configured functions in parallel. An example is shown below:\n",
+    "\n",
+    "```python\n",
+    "with run.Experiment(\"add_object\", executor=run.LocalExecutor()) as exp:\n",
+    "    exp.add(fn_1, tail_logs=True)\n",
+    "    exp.add(fn_2, tail_logs=True)\n",
+    "    exp.run()\n",
+    "```\n",
+    "\n",
+    "Additionally, you can also launch the functions on separate executors as shown below:\n",
+    "\n",
+    "```python\n",
+    "with run.Experiment(\"add_object\", executor=run.LocalExecutor()) as exp:\n",
+    "    exp.add(fn_1, tail_logs=True)\n",
+    "\n",
+    "    exp.add(fn_2, executor=your_slurm_executor(), tail_logs=True)\n",
+    "    exp.run()\n",
+    "```\n",
+    "\n",
+    "The executor and configured functions are cloned in `exp.add` so you can mutate them as needed. This allows you to overwrite some parameters quickly. See the example below:\n",
+    "```python\n",
+    "with run.Experiment(\"add_object\", executor=run.LocalExecutor()) as exp:\n",
+    "    exp.add(fn_1, tail_logs=True)\n",
+    "\n",
+    "    fn_1.obj_1.value_1 = 0\n",
+    "    exp.add(fn_1, executor=your_slurm_executor(), tail_logs=True)\n",
+    "    exp.run()\n",
+    "```\n",
+    "\n",
+    ">📝 Currently, we only support sequential and parallel execution in an experiment. Directed Acyclic Graph (DAG) based execution is not yet supported.\n",
+    ">📝 To run the tasks in an experiment in parallel, all executors should support parallel mode as of now. We will relax this restriction soon.\n",
+    "\n",
+    ">📝 By default, the experiment metadata is stored in your home folder `~` inside the `.nemo_run` folder. However, you can also store it in a separate dir by setting the `NEMORUN_HOME` environment variable.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with run.Experiment(\"add_object\", executor=run.LocalExecutor()) as exp:\n",
+    "    exp.add(fn_1, tail_logs=True)\n",
+    "    exp.add(fn_2, tail_logs=True)\n",
+    "    exp.run()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Inspect the Experiment\n",
+    "\n",
+    "Additionally, you can also reconstruct and inspect an old experiment. There are a few utilities which allow you to list and inspect an experiment run. \n",
+    "Run the cells below to see the current management capabilities."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# List all runs of an experiment\n",
+    "# The last suffix is the timestamp and results are sorted in ascending order of timestamps\n",
+    "run.Experiment.catalog(\"add_object\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Reconstruct an experiment and inspect its status, logs, etc\n",
+    "# if id is None, it will take the latest run.\n",
+    "# if id is provided, it will use that particular run.\n",
+    "# status and logs can be used outside the context manager too\n",
+    "with run.Experiment.from_title(\"add_object\") as exp:\n",
+    "    exp.status()\n",
+    "    exp.logs(job_id=\"simple.add.add_object\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create a new run of an old experiment\n",
+    "exp = run.Experiment.from_title(\"add_object\")\n",
+    "with exp.reset():\n",
+    "    exp.tasks[0].obj_1 = exp.tasks[1].obj_1.clone()\n",
+    "    exp.run(sequential=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For more information on how to inspect and reproduce experiments, please refer to the [inspect experiment tutorial](../experiments/inspect-experiment.ipynb)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Visualize the experiment configuration\n",
+    "exp"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Visualize tasks within the experiment\n",
+    "exp.tasks"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Diff two experiments\n",
+    "old_exp = run.Experiment.from_id(run.Experiment.catalog(\"add_object\")[-2])\n",
+    "exp.diff(old_exp, trim=False)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "exp.tasks[0].diff(old_exp.tasks[0], trim=False)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/examples/k8s-hello-world/files/hello_scripts.py b/examples/k8s-hello-world/files/hello_scripts.py
@@ -0,0 +1,42 @@
+# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from simple.add import SomeObject, add_object, commonly_used_object
+
+import nemo_run as run
+
+# This script defines an experiment that invokes three tasks in parallel, two scripts and a run.Partial.
+# The example demonstrates how you can use scripts and run.Partial.
+if __name__ == "__main__":
+    script = run.Script("./scripts/echo.sh")
+    inline_script = run.Script(
+        inline="""
+env
+echo "Hello 1"
+echo "Hello 2"
+"""
+    )
+    fn = run.Partial(
+        add_object,
+        obj_1=commonly_used_object(),
+        obj_2=run.Config(SomeObject, value_1=10, value_2=20, value_3=30),
+    )
+    executor = run.LocalExecutor()
+
+    with run.Experiment("experiment_with_scripts", executor=executor, log_level="WARN") as exp:
+        exp.add(script, tail_logs=True)
+        exp.add(inline_script, tail_logs=True)
+        exp.add(fn, tail_logs=True)
+        exp.run(detach=False)