cromwell_for_ML

cromwell_for_ML uses Cromwell, WDL and Neptune to train machine learning models at scale and to log the results for easy visualization and comparison. Possible use cases include:

hyperparameter optimization
code development when multiple experiments are required.

In practice the solution boils down to running the command:

./submit_neptune_ml.sh neptune_ml.wdl WDL_parameters.json --ml ML_parameters.json

where:

submit_neptune_ml.sh is a wrapper around cromshell
neptune_ml.wdl is a WDL file which specifies the following operations:
a. start a VM
b. check out the correct version of the code from the GitHub repository
c. launch the training of the ML model
d. shut down the VM
WDL_parameters.json contains a few parameters, such as the name of the git repository and the commit to use
ML_parameters.json is a file with all the parameters necessary to specify the ML model (learning_rate, etc.)

In many cases, users only need to change the values in WDL_parameters.json and ML_parameters.json to make the code run.
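For the hyperparameter optimization use case, one possible approach is to generate one ML parameter file per combination of values and submit each with the command above. The following is a minimal sketch, not part of the repository: the helper names, the `sweep` directory, and the `batch_size` parameter are hypothetical, and only `learning_rate` and the command line come from this README.

```python
import itertools
import json
from pathlib import Path


def expand_grid(base_params, grid):
    """Return one parameter dict per combination of the values in `grid`."""
    names = sorted(grid)  # fixed order so the result is reproducible
    combos = itertools.product(*(grid[name] for name in names))
    return [{**base_params, **dict(zip(names, values))} for values in combos]


def write_sweep(param_sets, out_dir="sweep"):
    """Write each parameter dict to its own JSON file and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, params in enumerate(param_sets):
        path = out / f"ML_parameters_{i}.json"
        path.write_text(json.dumps(params, indent=2))
        paths.append(path)
    return paths


def submit_command(ml_params_path):
    """Build the submit command shown in this README for one parameter file."""
    return [
        "./submit_neptune_ml.sh",
        "neptune_ml.wdl",
        "WDL_parameters.json",
        "--ml",
        str(ml_params_path),
    ]
```

Each returned command could then be passed to `subprocess.run` from the SUBMIT directory, one run per parameter file.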

Setup

To use this repository you need to install both cromshell and Neptune.

Neptune

Visit the website https://neptune.ai/ and sign up for a free account (the sign-up button is in the top-right)
Run the jupyter notebook AUXILIARY_FILES/TEST/test.ipynb
If the notebook executes successfully, then Neptune is installed and working properly

Cromshell/Cromwell

Install cromshell (follow the instructions here: https://github.com/broadinstitute/cromshell)
If working remotely, connect to the Cisco NON Split-Tunnel VPN
Modify the file AUXILIARY_FILES/TEST/test.json to reflect your NEPTUNE_API_TOKEN and your NEPTUNE_PROJECT (use the same values you used in AUXILIARY_FILES/TEST/test_neptune.ipynb)
Run the commands:

cd cromwell_for_ML/AUXILIARY_FILES/TEST
cromshell submit test.wdl test.json
cromshell list -u -c

You should see a list of all the runs submitted by cromshell. The last line should be the run you just submitted.
Repeat the command cromshell list -u -c until you see that the job has completed. At that point, log into the Neptune website https://neptune.ai/ to see the results.
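Instead of repeating the command by hand, the wait can be scripted. The sketch below polls a status function until the run reaches a terminal Cromwell state. It rests on an assumption: that `cromshell status` prints a JSON object with a "status" field, which may differ between cromshell versions. The function names and polling intervals are hypothetical.

```python
import json
import subprocess
import time

# Cromwell workflow states after which the status no longer changes.
TERMINAL_STATES = {"Succeeded", "Failed", "Aborted"}


def cromshell_status():
    """Return the status of the last run.

    ASSUMPTION: `cromshell status` writes a JSON object containing a
    "status" field to stdout. Adjust the parsing if your version differs.
    """
    result = subprocess.run(
        ["cromshell", "status"], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)["status"]


def wait_until_done(get_status, poll_seconds=60, max_polls=120, sleep=time.sleep):
    """Call `get_status` until it returns a terminal state, then return it."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"run still not finished after {max_polls} polls")
```

Calling `wait_until_done(cromshell_status)` would block until the last submitted run finishes. The status function is passed in as an argument so the loop can be exercised without a Cromwell server.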

Cromshell and Neptune together

We are now going to use cromshell and Neptune to train a non-trivial ML model and log the results. The conceptual overview is:

Cromshell will start a Google Virtual Machine (VM) and localize all relevant files from Google buckets to the VM
On the VM we will check out a GitHub repo and run python main.py, which uses all the localized files to train an ML model
Neptune will log the metrics
Cromshell turns off the VM

Preparation (one-time):

modify the first line of the file SUBMIT/ML_parameters.json to reflect your_neptune_username
modify the file SUBMIT/LOCALIZED_FILES/credentials.json by writing your own NEPTUNE_API_TOKEN
copy the files SUBMIT/LOCALIZED_FILES/data_train.pt, SUBMIT/LOCALIZED_FILES/data_test.pt and SUBMIT/LOCALIZED_FILES/credentials.json to your own Google bucket, i.e.:

gsutil -m cp SUBMIT/LOCALIZED_FILES/data_train.pt gs://my_bucket/data_train.pt
gsutil -m cp SUBMIT/LOCALIZED_FILES/data_test.pt gs://my_bucket/data_test.pt
gsutil -m cp SUBMIT/LOCALIZED_FILES/credentials.json gs://my_bucket/credentials.json

modify the file SUBMIT/WDL_parameters.json to reflect the location where you copied the files data_train.pt, data_test.pt and credentials.json
modify the first line of the file SUBMIT/submit_neptune_ml.sh to set your own Google bucket as the DEFAULT_BUCKET
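The copy step and the bucket locations that go into WDL_parameters.json have to agree, so it can help to derive both from one place. The sketch below builds the gsutil commands shown above together with the resulting gs:// locations. The function name is hypothetical, and `my_bucket` is the placeholder bucket name used in this README.

```python
LOCALIZED_FILES = ("data_train.pt", "data_test.pt", "credentials.json")


def bucket_uploads(bucket, local_dir="SUBMIT/LOCALIZED_FILES", files=LOCALIZED_FILES):
    """Return (gsutil command, destination URI) pairs for each file."""
    bucket = bucket.rstrip("/")
    if not bucket.startswith("gs://"):
        bucket = "gs://" + bucket
    plan = []
    for name in files:
        source = f"{local_dir}/{name}"
        destination = f"{bucket}/{name}"
        plan.append((["gsutil", "-m", "cp", source, destination], destination))
    return plan
```

Each command can be run with `subprocess.run(command, check=True)`, and each destination is the location to enter for that file in WDL_parameters.json.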

Now we can finally train an ML model on the cloud and track all metrics using Neptune.

cd cromwell_for_ML/SUBMIT
./submit_neptune_ml.sh neptune_ml.wdl WDL_parameters.json --ml ML_parameters.json
cromshell list -u -c

The last row should list the run you just submitted, with its status shown as "Running" rather than "Succeeded".

Log into the Neptune website and watch your results stream in.

Congrats, you have trained your first ML model using cromshell and Neptune!

How to use cromwell_for_ML to train YOUR model

At the end of the day, you are going to run the command:

./submit_neptune_ml.sh neptune_ml.wdl WDL_parameters.json --ml ML_parameters.json

If you use the default file names, as in the line above, you can simply type the command:

./submit_neptune_ml.sh

The file neptune_ml.wdl describes all operations which will happen on the VM. Namely:

localization of files
checking out the correct version of the code
running the python code

You can freely modify this code. For example, you might want to localize fewer files or run a different python command. Changes to neptune_ml.wdl might require changes to WDL_parameters.json. Run the command:

./submit_neptune_ml.sh neptune_ml.wdl -t

to see a template for the file WDL_parameters.json

The WDL_parameters.json contains:

the name of the git repository and commit you want to checkout
the locations of all files you want to localize from Google buckets to the VM. Among these files you always need credentials.json (containing the NEPTUNE_API_TOKEN). You might or might not need the data_train.pt and data_test.pt files.
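A quick check of WDL_parameters.json before submitting can catch a forgotten bucket path. The sketch below makes no assumption about key names (the real ones come from the template printed by the -t option) and only inspects the values. The function name and the rule that file locations end in .pt or .json are illustrative assumptions.

```python
def check_wdl_inputs(inputs):
    """Return a list of problems found in a WDL inputs dictionary."""
    problems = []
    values = [v for v in inputs.values() if isinstance(v, str)]
    uris = [v for v in values if v.startswith("gs://")]
    # credentials.json is always required and must live in a bucket.
    if not any(uri.endswith("/credentials.json") for uri in uris):
        problems.append("no gs:// location for credentials.json")
    # ASSUMPTION: values ending in .pt or .json are files to localize.
    for value in values:
        if value.endswith((".pt", ".json")) and not value.startswith("gs://"):
            problems.append(f"{value!r} is not a gs:// location")
    return problems
```

An empty list means that every file-like value points to a bucket and that credentials.json is among them.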

The ML_parameters.json contains all the parameters for training your ML model. It will automatically appear on the VM. It is up to you to make sure that your code reads and makes good use of the file ML_parameters.json. It is also your responsibility to make sure that your python/pytorch code makes calls to the Neptune API to log the quantities of interest. Examples of how to use these calls can be found in:

AUXILIARY_FILES/TEMPLATE/template.ipynb
main.py
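As a rough illustration of the contract described above, the sketch below reads ML_parameters.json and passes every metric through a logging callback. It is not the code in main.py: the function names, the `epochs` parameter, the default values, and the stand-in "training" arithmetic are all hypothetical. In a real run the callback would wrap whatever logging call your installed Neptune client provides.

```python
import json
from pathlib import Path

# Hypothetical defaults, used only for keys missing from the file.
DEFAULTS = {"learning_rate": 1e-3, "epochs": 3}


def parse_ml_parameters(text):
    """Merge the JSON text of ML_parameters.json over the defaults."""
    params = dict(DEFAULTS)
    params.update(json.loads(text))
    return params


def load_ml_parameters(path="ML_parameters.json"):
    """Read the parameter file that is placed on the VM."""
    return parse_ml_parameters(Path(path).read_text())


def train(params, log_metric):
    """Stand-in training loop that reports one loss value per epoch."""
    loss = 1.0
    for _ in range(int(params["epochs"])):
        loss *= 1.0 - params["learning_rate"]  # placeholder for a real update
        log_metric("train_loss", loss)
    return loss
```

Keeping the logger as an argument means the same training function can be tested locally with a plain list and run on the VM with a Neptune-backed callback.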

Useful commands:

./submit_neptune_ml.sh neptune_ml.wdl WDL_parameters.json --ml ML_parameters.json --> submit a run using cromshell
./submit_neptune_ml.sh --> submit a run using cromshell and the default file names
./submit_neptune_ml.sh neptune_ml.wdl -t --> see the template for the file WDL_parameters.json corresponding to the current version of the WDL file neptune_ml.wdl
cromshell list -c -u --> check the status of the submitted runs
cromshell metadata --> retrieve the metadata of the last run, in particular the location of all log files
cromshell status --> retrieve the status of the last run
