Nested Workspaces template (#1322)
* code changes

Signed-off-by: yes <[email protected]>

* code changes

Signed-off-by: yes <[email protected]>

* code changes

Signed-off-by: yes <[email protected]>

* code changes

Signed-off-by: yes <[email protected]>

* code changes

Signed-off-by: yes <[email protected]>

* code changes

Signed-off-by: yes <[email protected]>

* code changes

Signed-off-by: yes <[email protected]>

* code changes

Signed-off-by: yes <[email protected]>

* code changes

Signed-off-by: yes <[email protected]>

* code change

Signed-off-by: yes <[email protected]>

* code change

Signed-off-by: yes <[email protected]>

* model name change

Signed-off-by: yes <[email protected]>

* model name change

Signed-off-by: yes <[email protected]>

* model name change

Signed-off-by: yes <[email protected]>

* code changes

Signed-off-by: yes <[email protected]>

---------

Signed-off-by: yes <[email protected]>
tanwarsh authored Feb 2, 2025
1 parent 1f43b93 commit 9409788
Showing 153 changed files with 106 additions and 93 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/straggler-handling.yml
@@ -35,4 +35,4 @@ jobs:
pip install .
- name: Test Straggler Handling Interface
run: |
python -m tests.github.test_hello_federation --template torch_cnn_mnist_straggler_check --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3
python -m tests.github.test_hello_federation --template torch/mnist_straggler_check --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3
22 changes: 11 additions & 11 deletions .github/workflows/task_runner_basic_e2e.yml
@@ -25,8 +25,8 @@ on:
type: choice
options:
- all
- torch_cnn_mnist
- keras_cnn_mnist
- torch/mnist
- keras/mnist
python_version:
description: "Python version"
required: false
@@ -85,21 +85,21 @@ jobs:
id: input_selection
run: |
# ---------------------------------------------------------------
# Models like XGBoost (xgb_higgs) and torch_cnn_histology require runners with higher memory and CPU to run.
# Models like XGBoost (xgb_higgs) and torch/histology require runners with higher memory and CPU to run.
# Thus these models are excluded from the matrix for now.
# Default combination if no input is provided (i.e. 'all' is selected).
# * TLS - models [torch_cnn_mnist, keras_cnn_mnist] and python versions [3.10, 3.11, 3.12]
# * Non-TLS - models [torch_cnn_mnist] and python version [3.10]
# * No client auth - models [keras_cnn_mnist] and python version [3.10]
# * Memory logs - models [torch_cnn_mnist] and python version [3.10]
# * TLS - models [torch/mnist, keras/mnist] and python versions [3.10, 3.11, 3.12]
# * Non-TLS - models [torch/mnist] and python version [3.10]
# * No client auth - models [keras/mnist] and python version [3.10]
# * Memory logs - models [torch/mnist] and python version [3.10]
# ---------------------------------------------------------------
echo "jobs_to_run=${{ env.JOBS_TO_RUN }}" >> "$GITHUB_OUTPUT"
if [ "${{ env.MODEL_NAME }}" == "all" ]; then
echo "models_for_tls=[\"torch_cnn_mnist\", \"keras_cnn_mnist\"]" >> "$GITHUB_OUTPUT"
echo "models_for_non_tls=[\"torch_cnn_mnist\"]" >> "$GITHUB_OUTPUT"
echo "models_for_no_client_auth=[\"keras_cnn_mnist\"]" >> "$GITHUB_OUTPUT"
echo "models_for_memory_logs=[\"torch_cnn_mnist\"]" >> "$GITHUB_OUTPUT"
echo "models_for_tls=[\"torch/mnist\", \"keras/mnist\"]" >> "$GITHUB_OUTPUT"
echo "models_for_non_tls=[\"torch/mnist\"]" >> "$GITHUB_OUTPUT"
echo "models_for_no_client_auth=[\"keras/mnist\"]" >> "$GITHUB_OUTPUT"
echo "models_for_memory_logs=[\"torch/mnist\"]" >> "$GITHUB_OUTPUT"
else
echo "models_for_tls=[\"${{env.MODEL_NAME}}\"]" >> "$GITHUB_OUTPUT"
echo "models_for_non_tls=[\"${{env.MODEL_NAME}}\"]" >> "$GITHUB_OUTPUT"
8 changes: 4 additions & 4 deletions .github/workflows/task_runner_dockerized_ws_e2e.yml
@@ -32,7 +32,7 @@ jobs:
timeout-minutes: 15
strategy:
matrix:
model_name: ["keras_cnn_mnist"]
model_name: ["keras/mnist"]
python_version: ["3.10", "3.11", "3.12"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -73,7 +73,7 @@ jobs:
timeout-minutes: 15
strategy:
matrix:
model_name: ["keras_cnn_mnist"]
model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -114,7 +114,7 @@ jobs:
timeout-minutes: 15
strategy:
matrix:
model_name: ["keras_cnn_mnist"]
model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -155,7 +155,7 @@ jobs:
timeout-minutes: 15
strategy:
matrix:
model_name: ["keras_cnn_mnist"]
model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

6 changes: 3 additions & 3 deletions .github/workflows/task_runner_fedeval_dws_e2e.yml
@@ -59,7 +59,7 @@ jobs:
if: needs.input_selection.outputs.selected_jobs == 'tls' || needs.input_selection.outputs.selected_jobs == 'all'
strategy:
matrix:
model_name: ["keras_cnn_mnist"]
model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -102,7 +102,7 @@ jobs:
if: needs.input_selection.outputs.selected_jobs == 'non_tls' || needs.input_selection.outputs.selected_jobs == 'all'
strategy:
matrix:
model_name: ["keras_cnn_mnist"]
model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -145,7 +145,7 @@ jobs:
if: needs.input_selection.outputs.selected_jobs == 'no_client_auth' || needs.input_selection.outputs.selected_jobs == 'all'
strategy:
matrix:
model_name: ["keras_cnn_mnist"]
model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

12 changes: 6 additions & 6 deletions .github/workflows/task_runner_fedeval_e2e.yml
@@ -34,9 +34,9 @@ jobs:
timeout-minutes: 30
strategy:
matrix:
# Models like XGBoost (xgb_higgs) and torch_cnn_histology require runners with higher memory and CPU to run.
# Models like XGBoost (xgb_higgs) and torch/histology require runners with higher memory and CPU to run.
# Thus these models are excluded from the matrix for now.
model_name: ["torch_cnn_mnist", "keras_cnn_mnist"]
model_name: ["torch.mnist", "keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -77,9 +77,9 @@ jobs:
timeout-minutes: 30
strategy:
matrix:
# Testing this scenario only for torch_cnn_mnist model and python 3.10
# Testing this scenario only for torch/mnist model and python 3.10
# If required, this can be extended to other models and python versions
model_name: ["torch_cnn_mnist"]
model_name: ["torch/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

@@ -120,9 +120,9 @@ jobs:
timeout-minutes: 30
strategy:
matrix:
# Testing this scenario for keras_cnn_mnist model and python 3.10
# Testing this scenario for keras/mnist model and python 3.10
# If required, this can be extended to other models and python versions
model_name: ["keras_cnn_mnist"]
model_name: ["keras/mnist"]
python_version: ["3.10"]
fail-fast: false # do not immediately fail if one of the combinations fail

2 changes: 1 addition & 1 deletion .github/workflows/taskrunner.yml
@@ -32,4 +32,4 @@ jobs:
pip install .
- name: Task Runner API
run: |
python -m tests.github.test_hello_federation --template torch_cnn_mnist --fed_workspace aggregator --col1 collaborator1 --col2 collaborator2 --rounds-to-train 3 --save-model output_model
python -m tests.github.test_hello_federation --template torch/mnist --fed_workspace aggregator --col1 collaborator1 --col2 collaborator2 --rounds-to-train 3 --save-model output_model
2 changes: 1 addition & 1 deletion .github/workflows/taskrunner_eden_pipeline.yml
@@ -31,4 +31,4 @@ jobs:
pip install .
- name: Test TaskRunner API with Eden Compression
run: |
python -m tests.github.test_hello_federation --template torch_cnn_mnist_eden_compression --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3
python -m tests.github.test_hello_federation --template torch/mnist_eden_compression --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3
2 changes: 1 addition & 1 deletion .github/workflows/tr_docker_gramine_direct.yml
@@ -27,7 +27,7 @@ jobs:
- name: Create workspace image
run: |
fx workspace create --prefix example_workspace --template keras_cnn_mnist
fx workspace create --prefix example_workspace --template keras/mnist
cd example_workspace
fx plan initialize -a localhost
2 changes: 1 addition & 1 deletion .github/workflows/tr_docker_native.yml
@@ -27,7 +27,7 @@ jobs:
- name: Create workspace image
run: |
fx workspace create --prefix example_workspace --template keras_cnn_mnist
fx workspace create --prefix example_workspace --template keras/mnist
cd example_workspace
fx plan initialize -a localhost
fx workspace dockerize --save --revision https://github.com/${GITHUB_REPOSITORY}.git@${{ github.event.pull_request.head.sha }}
2 changes: 1 addition & 1 deletion .github/workflows/ubuntu.yml
@@ -53,4 +53,4 @@ jobs:
pip install .
- name: Test TaskRunner API
run: |
python -m tests.github.test_hello_federation --template keras_cnn_mnist --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3 --save-model output_model
python -m tests.github.test_hello_federation --template keras/mnist --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3 --save-model output_model
2 changes: 1 addition & 1 deletion .github/workflows/windows.yml
@@ -52,4 +52,4 @@ jobs:
pip install .
- name: Test TaskRunner API
run: |
python -m tests.github.test_hello_federation --template keras_cnn_mnist --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3 --save-model output_model
python -m tests.github.test_hello_federation --template keras/mnist --fed_workspace aggregator --col1 col1 --col2 col2 --rounds-to-train 3 --save-model output_model
19 changes: 8 additions & 11 deletions Jenkinsfile
@@ -1,18 +1,15 @@
def snykData = [
'openfl-docker': 'openfl-docker/Dockerfile.base',
'openfl': 'setup.py',
'openfl-workspace_tf_2dunet': 'openfl-workspace/tf_2dunet/requirements.txt',
'openfl-workspace_torch_cnn_mnist_straggler_check': 'openfl-workspace/torch_cnn_mnist_straggler_check/requirements.txt',
'openfl-workspace_keras_2dunet': 'openfl-workspace/keras/2dunet/requirements.txt',
'openfl-workspace_torch_cnn_mnist_straggler_check': 'openfl-workspace/torch/mnist_straggler_check/requirements.txt',
// CN-14619 snyk test CLI does not support -f in requirements.txt file
// 'openfl-workspace_torch_cnn_histology': 'openfl-workspace/torch_cnn_histology/requirements.txt',
'openfl-workspace_torch_cnn_histology_src': 'openfl-workspace/torch_cnn_histology/src/requirements.txt',
'openfl-workspace_keras_nlp': 'openfl-workspace/keras_nlp/requirements.txt',
'openfl-workspace_torch_cnn_mnist': 'openfl-workspace/torch_cnn_mnist/requirements.txt',
'openfl-workspace_torch_unet_kvasir': 'openfl-workspace/torch_unet_kvasir/requirements.txt',
'openfl-workspace_tf_cnn_histology': 'openfl-workspace/tf_cnn_histology/requirements.txt',
'openfl-workspace_tf_3dunet_brats': 'openfl-workspace/tf_3dunet_brats/requirements.txt',
'openfl-workspace_keras_cnn_with_compression': 'openfl-workspace/keras_cnn_with_compression/requirements.txt',
'openfl-workspace_keras_cnn_mnist': 'openfl-workspace/keras_cnn_mnist/requirements.txt',
// 'openfl-workspace_keras/histology': 'openfl-workspace/torch/histology/requirements.txt',
'openfl-workspace_keras/histology_src': 'openfl-workspace/torch/histology/src/requirements.txt',
'openfl-workspace_keras/nlp': 'openfl-workspace/keras/nlp/requirements.txt',
'openfl-workspace_torch_cnn_mnist': 'openfl-workspace/torch/mnist/requirements.txt',
'openfl-workspace_torch_unet_kvasir': 'openfl-workspace/torch/unet_kvasir/requirements.txt',
'openfl-workspace_keras_cnn_mnist': 'openfl-workspace/keras/mnist/requirements.txt',
'openfl-tutorials_interactive_api_pytorch_medmnist_2d_envoy': 'openfl-tutorials/interactive_api/PyTorch_MedMNIST_2D/envoy/requirements.txt',
'openfl-tutorials_interactive_api_pytorch_dogscats_vit_workspace': 'openfl-tutorials/interactive_api/PyTorch_DogsCats_ViT/workspace/requirements.txt',
'openfl-tutorials_interactive_api_pytorch_histology_envoy': 'openfl-tutorials/interactive_api/PyTorch_Histology/envoy/requirements.txt',
8 changes: 4 additions & 4 deletions docs/about/features_index/fed_eval.rst
@@ -26,7 +26,7 @@ Example Using the Task Runner API (Aggregator-based Workflow)

The following steps demonstrate practical end-to-end (e2e) usage of FedEval

*N.B*: We will be using torch_cnn_mnist plan itself for both training and with some minor changes for evaluation as well
*N.B*: We will be using torch/mnist plan itself for both training and with some minor changes for evaluation as well

*Prerequisites*: Please ensure that OpenFL version==1.7 is installed, or choose to install the latest from source.
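For reference, a minimal sketch of satisfying this prerequisite — the PyPI package name ``openfl`` and the repository URL below are assumptions for illustration, not taken from this page:

.. code-block:: shell

pip install openfl==1.7
# or, to install the latest from source:
# pip install git+https://github.com/securefederatedai/openfl.git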

@@ -48,13 +48,13 @@ With OpenFL version==1.7 aggregator start command is enhanced to have an optional
--help Show this message and exit.
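As an illustration only, once a workspace is prepared the enhanced command might be invoked as below; the option name ``--task_group`` and the value ``evaluation`` are assumptions here (the full help output is truncated above), so confirm them with ``fx aggregator start --help``:

.. code-block:: shell

# Hypothetical invocation of an evaluation-only run; verify the option name first
fx aggregator start --task_group evaluation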
1. **Setup**
We will use the `torch_cnn_mnist` workspace for training
We will use the `torch/mnist` workspace for training

Let's first configure a workspace with all necessary certificates

.. code-block:: shell
fx workspace create --prefix ./cnn_train_eval --template torch_cnn_mnist
fx workspace create --prefix ./cnn_train_eval --template torch/mnist
cd cnn_train_eval
fx workspace certify
fx aggregator generate-cert-request
@@ -416,7 +416,7 @@ The updated plan post initialization with edits to make it ready for evaluation
metrics:
- loss
We have done following changes to the initialized torch_cnn_mnist plan in the new workspace:
We have made the following changes to the initialized torch/mnist plan in the new workspace:
- Set the rounds_to_train to 1 as evaluation needs just one round of federation run across the collaborators
- Removed all other training related tasks from assigner settings except "aggregated_model_validation"
Now let's replace the ``init.pbuf`` with the previously saved ``trained_model.pbuf``
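A minimal sketch of that replacement is shown below; the paths are assumptions for illustration (``save/init.pbuf`` as the initial-state file inside the evaluation workspace, and an arbitrary location for the previously trained model):

.. code-block:: shell

# Hypothetical paths: adjust to where the trained model protobuf was saved
cp /path/to/trained_model.pbuf save/init.pbuf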
14 changes: 7 additions & 7 deletions docs/about/features_index/taskrunner.rst
@@ -88,7 +88,7 @@ Each YAML top-level section contains the following subsections:

The following is an example of a **plan.yaml**:

.. literalinclude:: ../../../openfl-workspace/torch_cnn_mnist/plan/plan.yaml
.. literalinclude:: ../../../openfl-workspace/torch/mnist/plan/plan.yaml
:language: yaml


Expand Down Expand Up @@ -150,22 +150,22 @@ STEP 1: Create a Workspace
$ fx
2. This example uses the :code:`keras_cnn_mnist` template.
2. This example uses the :code:`keras/mnist` template.

Set the environment variables to use the :code:`keras_cnn_mnist` as the template and :code:`${HOME}/my_federation` as the path to the workspace directory.
Set the environment variables to use the :code:`keras/mnist` as the template and :code:`${HOME}/my_federation` as the path to the workspace directory.

.. code-block:: shell
$ export WORKSPACE_TEMPLATE=keras_cnn_mnist
$ export WORKSPACE_TEMPLATE=keras/mnist
$ export WORKSPACE_PATH=${HOME}/my_federation
3. Choose a workspace template; templates are end-to-end federated learning training demonstrations. The following is a sample of the available templates:

- :code:`keras_cnn_mnist`: a workspace with a simple `Keras <http://keras.io/>`__ CNN model that will download the `MNIST <http://yann.lecun.com/exdb/mnist/>`_ dataset and train in a federation.
- :code:`keras/mnist`: a workspace with a simple `Keras <http://keras.io/>`__ CNN model that will download the `MNIST <http://yann.lecun.com/exdb/mnist/>`_ dataset and train in a federation.
- :code:`tf_2dunet`: a workspace with a simple `TensorFlow <http://tensorflow.org>`__ CNN model that will use the `BraTS <https://www.med.upenn.edu/sbia/brats2017/data.html>`_ dataset and train in a federation.
- :code:`tf_cnn_histology`: a workspace with a simple `TensorFlow <http://tensorflow.org>`__ CNN model that will download the `Colorectal Histology <https://zenodo.org/record/53169#.XGZemKwzbmG>`_ dataset and train in a federation.
- :code:`torch_cnn_histology`: a workspace with a simple `PyTorch <http://pytorch.org/>`__ CNN model that will download the `Colorectal Histology <https://zenodo.org/record/53169#.XGZemKwzbmG>`_ dataset and train in a federation.
- :code:`torch_cnn_mnist`: a workspace with a simple `PyTorch <http://pytorch.org>`__ CNN model that will download the `MNIST <http://yann.lecun.com/exdb/mnist/>`_ dataset and train in a federation.
- :code:`keras/histology`: a workspace with a simple `PyTorch <http://pytorch.org/>`__ CNN model that will download the `Colorectal Histology <https://zenodo.org/record/53169#.XGZemKwzbmG>`_ dataset and train in a federation.
- :code:`torch/mnist`: a workspace with a simple `PyTorch <http://pytorch.org>`__ CNN model that will download the `MNIST <http://yann.lecun.com/exdb/mnist/>`_ dataset and train in a federation.

See the complete list of available templates.
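For illustration, the chosen template and path can then be used to create the workspace — a short sketch reusing the environment variables exported above and the same ``fx workspace create`` flags that appear elsewhere in this change:

.. code-block:: shell

$ fx workspace create --prefix ${WORKSPACE_PATH} --template ${WORKSPACE_TEMPLATE}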

@@ -83,7 +83,7 @@ For logging through Tensorboard, enable the parameter :code:`write_logs : true`
settings :
write_logs : true
Follow the steps below to write your custom callback function instead. As an example, a full implementation can be found at `Federated_Pytorch_MNIST_Tutorial.ipynb <https://github.com/intel/openfl/blob/develop/openfl-tutorials/Federated_Pytorch_MNIST_Tutorial.ipynb>`_ and in the **torch_cnn_mnist** workspace.
Follow the steps below to write your custom callback function instead. As an example, a full implementation can be found at `Federated_Pytorch_MNIST_Tutorial.ipynb <https://github.com/intel/openfl/blob/develop/openfl-tutorials/Federated_Pytorch_MNIST_Tutorial.ipynb>`_ and in the **torch/mnist** workspace.

1. Define the callback function, as you did in the Python API, in the **src** directory of your workspace.

@@ -29,7 +29,7 @@ The following are the straggler handling algorithms supported in OpenFL:
Demonstration of adding the straggler handling interface
=========================================================

The example template, **torch_cnn_mnist_straggler_check**, uses the ``PercentagePolicy``. To gain a better understanding of how experiments perform, you can modify the **percent_collaborators_needed** or **minimum_reporting** parameter in the template **plan.yaml** or even choose **CutoffTimePolicy** function instead:
The example template, **torch/mnist_straggler_check**, uses the ``PercentagePolicy``. To gain a better understanding of how experiments perform, you can modify the **percent_collaborators_needed** or **minimum_reporting** parameter in the template **plan.yaml**, or even choose the **CutoffTimePolicy** function instead:

.. code-block:: yaml