Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workspaces for different backends of Keras #1242

Draft
wants to merge 17 commits into
base: develop
Choose a base branch
from

Conversation

tanwarsh
Copy link
Collaborator

@tanwarsh tanwarsh commented Jan 6, 2025

The PR includes the following changes:

  1. Restructured Keras workspaces.
  2. Added NLP workspaces for PyTorch, JAX, and TensorFlow backend with Keras.
  3. Modified template name logic to search for templates in nested directories in openfl-workspaces.

@tanwarsh tanwarsh marked this pull request as draft January 6, 2025 05:33
Signed-off-by: yes <[email protected]>
@tanwarsh tanwarsh marked this pull request as ready for review January 6, 2025 09:10
@tanwarsh tanwarsh requested a review from ishaileshpant January 7, 2025 06:58
Copy link
Collaborator

@kta-intel kta-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting this started @tanwarsh !

My biggest comment here (and I'd like to hear your thoughts on this) is that the JAX/torch workspaces don't seem to showcase JAX or torch. OpenFL offers the ability to define custom training and evaluation loop via train_ and validate_ methods of the runner. Keras is able to support custom training loops written in JAX and torch that can potentially preserve a lot of each framework's native API in a similar manner. See JAX / Torch.

My personal opinion, I think it would be more compelling to showcase OpenFL's framework flexibility in this manner. WDYT?

@@ -0,0 +1,2 @@
keras==3.6.0
jax==0.4.38
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like JAX is ever actually explicitly used throughout the experiment

@@ -0,0 +1,2 @@
keras==3.6.0
torch==2.5.1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment here - it doesn't seem like torch is ever explicitly used in the experiment

@@ -17,6 +19,14 @@
from openfl.utilities import Metric, TensorKey, change_tags
from openfl.utilities.split import split_tensor_dict_for_holdouts

# Set the KERAS_BACKEND environment variable based on the available deep learning framework
Copy link
Collaborator

@kta-intel kta-intel Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be better to add a "backend" argument to the keras runner (that could maybe be configured in the plan.yaml). Rationale being that a user may have an env that contains multiple frameworks. Idk if that's a super common scenario, but the runner would end up setting the backend to the first one it encounters rather than the intended backend resulting in an explicit error or some unintended consequences

@tanwarsh
Copy link
Collaborator Author

tanwarsh commented Jan 8, 2025

Thanks for getting this started @tanwarsh !

My biggest comment here (and I'd like to hear your thoughts on this) is that the JAX/torch workspaces don't seem to showcase JAX or torch. OpenFL offers the ability to define custom training and evaluation loop via train_ and validate_ methods of the runner. Keras is able to support custom training loops written in JAX and torch that can potentially preserve a lot of each framework's native API in a similar manner. See JAX / Torch.

My personal opinion, I think it would be more compelling to showcase OpenFL's framework flexibility in this manner. WDYT?

I agree with you @kta-intel. The changes in this PR only enable different backends for Keras, but the workspaces do not showcase JAX, Torch, or TensorFlow. It would definitely be compelling to showcase JAX and Torch with Keras. I will work on this.

@rahulga1
Copy link
Collaborator

rahulga1 commented Jan 8, 2025

Thanks for getting this started @tanwarsh !
My biggest comment here (and I'd like to hear your thoughts on this) is that the JAX/torch workspaces don't seem to showcase JAX or torch. OpenFL offers the ability to define custom training and evaluation loop via train_ and validate_ methods of the runner. Keras is able to support custom training loops written in JAX and torch that can potentially preserve a lot of each framework's native API in a similar manner. See JAX / Torch.
My personal opinion, I think it would be more compelling to showcase OpenFL's framework flexibility in this manner. WDYT?

I agree with you @kta-intel. The changes in this PR only enable different backends for Keras, but the workspaces do not showcase JAX, Torch, or TensorFlow. It would definitely be compelling to showcase JAX and Torch with Keras. I will work on this.

But will it make sense, to merge this which creates the building block and create a separate PR to bring actual example?

Copy link
Collaborator

@MasterSkepticista MasterSkepticista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Echoing @kta-intel's comment, three separate tutorials with different backends do not justify the extra lines of code we have to maintain in this repo.

My suggestions, in no particular order:

  1. Group tutorials by use-case under keras: keras/{mnist,nlp}. Backends should be switchable by installing necessary packages (jax, tf or torch). This would go as a multi-backend CI test for OpenFL.
  2. Group tutorials by framework; {jax,torch}. These will target custom training loops in pure DL framework code, utilizing their respective taskrunner classes. This can be a separate PR.

@tanwarsh tanwarsh marked this pull request as draft January 16, 2025 05:16
@payalcha
Copy link
Contributor

payalcha commented Jan 20, 2025

@tanwarsh These changes seems to impact task_runner_e2e workflow tests. I think changing tests.end_to_end.utils.constants.py enum will work. But please run the E2E test separately to check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants