Testing new model pipelining #47

Open
3 of 4 tasks
rchan26 opened this issue Jun 7, 2023 · 4 comments
@rchan26
Member

rchan26 commented Jun 7, 2023

Look into developing a demo notebook for a PyTorch (or another framework, e.g. JAX) implementation to illustrate the flexibility of the pipeline.

  • Play around with implementing PyTorch equivalents to IceNetDataSet in the icenet library
    • train.py and predict.py in the models/ part of the library use IceNetDataSet as an interface to the TFRecord datasets or an underlying data loader. These are built around the TensorFlow pipeline, so they could be re-architected down the line.
    • These data loaders, plus an IceNetDataSet that suitably implements the PyTorch interface, would suffice
    • Then develop a demo notebook for alternate model usage, showing how IceNetDataSet and the loaders can be used with PyTorch
  • Consider implementation details for abstracting out the train / predict functionality for backend ML provider
  • Draft library implementation
  • Demonstrate with CLI alterations in the pipeline
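As a rough illustration of the "abstracting out the train / predict functionality" task, the sketch below shows one way a CLI could target a backend-neutral interface. Everything here is hypothetical: `ModelBackend`, `register_backend`, `get_backend` and `MeanBackend` are invented names, not part of the icenet library, and the toy backend only stands in for a real TensorFlow or PyTorch implementation.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterable, List, Type


class ModelBackend(ABC):
    """Hypothetical interface a train/predict entrypoint could depend on."""

    @abstractmethod
    def train(self, batches: Iterable[Any], epochs: int) -> Dict[str, List[float]]:
        """Fit the model and return per-epoch metrics."""

    @abstractmethod
    def predict(self, batches: Iterable[Any]) -> List[Any]:
        """Return one prediction per input batch."""


# Registry mapping a backend name (e.g. from a CLI flag) to its class.
_BACKENDS: Dict[str, Type[ModelBackend]] = {}


def register_backend(name: str) -> Callable[[Type[ModelBackend]], Type[ModelBackend]]:
    """Class decorator that records a backend under the given name."""
    def decorator(cls: Type[ModelBackend]) -> Type[ModelBackend]:
        _BACKENDS[name] = cls
        return cls
    return decorator


def get_backend(name: str) -> ModelBackend:
    """Instantiate a registered backend, failing clearly on unknown names."""
    if name not in _BACKENDS:
        raise ValueError(f"Unknown backend {name!r}; available: {sorted(_BACKENDS)}")
    return _BACKENDS[name]()


@register_backend("dummy")
class MeanBackend(ModelBackend):
    """Toy backend (assumption for illustration): predicts the training mean."""

    def __init__(self) -> None:
        self.mean = 0.0

    def train(self, batches, epochs):
        batches = list(batches)  # materialise so every epoch sees all batches
        history = {"loss": []}
        for _ in range(epochs):
            values = [v for batch in batches for v in batch]
            self.mean = sum(values) / len(values)
            mse = sum((v - self.mean) ** 2 for v in values) / len(values)
            history["loss"].append(mse)
        return history

    def predict(self, batches):
        return [self.mean for _ in batches]
```

With something along these lines, a train script would only call `get_backend(name).train(...)`, and a PyTorch implementation could be registered alongside the TensorFlow one without changing the CLI.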

PyTorch experimentation is a really good idea for flexing the pipeline. Other options like JAX are nice-to-haves but not urgent (PyTorch is the best test, as many people already want to use it).

It occurs to me that the pipeline interface in the long run should be able to largely detach from implementation specifics (other than in the templates for ensembling / CLI usage or ENVS setup)

A big job but definitely doable. Reach out as and when you need to @rchan26

@rchan26 rchan26 self-assigned this Jun 7, 2023
@JimCircadian
Member

JimCircadian commented Jul 5, 2023

Dev catchup notes:

Can we have a reference implementation that includes a shell script in icenet-pipeline demonstrating a run, and an icenet-notebooks implementation that demonstrates programmatic refitting of a PyTorch model? 😉

Also noted that the quickest win for integrating a PyTorch model is to use the data loader directly (no harm in getting it via IceNetDataSet.get_data_loader), but the IceNetDataSet / SplittingMixin.get_split_datasets class hierarchy should now account for a variety of pipeline implementations. This just requires an initial refactor, followed by an alternate implementation supporting the PyTorch use case.
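A minimal sketch of the "use the dataloader directly" idea: an adapter that exposes per-date samples through `__len__` and `__getitem__`, which is the protocol PyTorch documents for map-style datasets. The names `MapStyleAdapter`, `sample_fn` and `fake_sample` are hypothetical, and `sample_fn` merely stands in for whatever the object returned by IceNetDataSet.get_data_loader offers for producing a single sample; its real interface is not assumed here.

```python
from typing import Any, Callable, List, Sequence, Tuple


class MapStyleAdapter:
    """Hypothetical adapter: index -> date -> (inputs, targets) sample.

    It deliberately avoids importing torch. In a real implementation,
    subclassing torch.utils.data.Dataset would be the conventional route.
    """

    def __init__(
        self,
        dates: Sequence[Any],
        sample_fn: Callable[[Any], Tuple[Any, Any]],
    ) -> None:
        self._dates: List[Any] = list(dates)
        self._sample_fn = sample_fn  # assumption: returns one (x, y) pair per date

    def __len__(self) -> int:
        return len(self._dates)

    def __getitem__(self, idx: int) -> Tuple[Any, Any]:
        # Delegate sample generation to the underlying loader-like callable.
        return self._sample_fn(self._dates[idx])


def fake_sample(date: str) -> Tuple[List[float], List[float]]:
    """Stand-in for a real loader: derives dummy values from the date string."""
    day = float(date.split("-")[2])
    return [day, day + 1.0], [day + 2.0]


dataset = MapStyleAdapter(["2020-01-01", "2020-01-02"], fake_sample)
```

An object like `dataset` could then be handed to `torch.utils.data.DataLoader` for batching and shuffling, leaving the sample generation logic where it already lives.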

The reference implementation then replaces the existing usage within model/train.py and model/predict.py so that they use a PyTorch model.

@JimCircadian
Member

@rchan26 have moved this to @bnubald, hope you don't mind! :)

@bnubald

bnubald commented Jan 1, 2024

Ongoing work can be found here while playing around.

Task 1 listed by Ryan should currently be covered by the notebooks in the above repo, although that work is still in its infancy.

Also, related issue #54

@bnubald

bnubald commented Sep 6, 2024

Updated the task list above. The following is the reference library implementation with a PyTorch backend:

https://github.com/icenet-ai/icenet-gan

Integration with icenet-pipeline is enabled by updating the predict and train YAML templates to point to new script files under the template directory, which run the above library's train and predict CLI entrypoints.

Projects
Status: In progress