MLOps Best Practices
At Task-TS we aim to set broadly applicable standards for versioning, tracking, deploying, monitoring, and re-training models. We also want to show how software engineering and DevOps best practices can be combined with cutting-edge ML research.
Reproducible Results
For experiments to be reproducible, three major requirements need to be met:
- Data Versioning
  a. Store all data daily to GCS and Dataverse.
  b. Maintain historical, versioned snapshots of the data.
  c. Allow direct loading into flow from a historical snapshot.
- Experiment tracking and CD
- Code/Config versioning
  a. Store all experiment code on GitHub and tag each experiment with a commit hash.
  b. Store full notebooks on GitHub.
  c. Store the configuration files themselves.
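As a rough illustration of how these three requirements can be tied together, the sketch below builds one record per experiment that pins the commit hash, the configuration, and the data snapshot it used. All names here (`experiment_record`, `snapshot_uri`, the bucket and dataset names, and the `gs://bucket/dataset/YYYY-MM-DD/` layout) are assumptions made for the example, not the project's actual API or storage layout.

```python
import hashlib
import json
import subprocess
from datetime import date


def current_commit_hash() -> str:
    """Return the HEAD commit hash, or "unknown" outside a git checkout."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or this directory is not a repository.
        return "unknown"


def config_fingerprint(config: dict) -> str:
    """Hash a config deterministically so identical configs always match."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def snapshot_uri(bucket: str, dataset: str, day: date) -> str:
    """Build the location of one daily snapshot (hypothetical layout)."""
    return f"gs://{bucket}/{dataset}/{day.isoformat()}/"


def experiment_record(config: dict, bucket: str, dataset: str, day: date) -> dict:
    """Collect everything needed to re-run an experiment later."""
    return {
        "commit": current_commit_hash(),
        "config_sha256": config_fingerprint(config),
        "config": config,  # the full config is stored, not only its hash
        "data_snapshot": snapshot_uri(bucket, dataset, day),
    }


if __name__ == "__main__":
    record = experiment_record(
        {"model": "lstm", "lr": 0.01},
        bucket="example-bucket",  # hypothetical bucket name
        dataset="example_dataset",  # hypothetical dataset name
        day=date(2020, 1, 15),
    )
    print(json.dumps(record, indent=2))
```

Storing the full configuration next to its hash keeps the record self-describing, while the hash gives a cheap way to check whether two runs used the same settings.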
Extendibility Best Practices
- Utilize swappable configuration files. Everything should be configurable as a parameter without having to change base code.
- Allow ensembles of models to be defined through JSON configuration files.
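One possible way to realize both points is a small registry that maps names in a JSON file to model factories, so that swapping or ensembling models only requires editing the configuration. The sketch below is a minimal example under that assumption: the registry, the toy "models", and the JSON schema (`ensemble`, `name`, `params`, `weight`) are all hypothetical and stand in for whatever the project actually uses.

```python
import json
from typing import Callable, Dict, List

# Hypothetical registry: config name -> factory returning a forecast function.
MODEL_REGISTRY: Dict[str, Callable[..., Callable[[List[float]], float]]] = {}


def register(name: str):
    """Decorator that makes a model factory selectable from a config file."""
    def decorator(factory):
        MODEL_REGISTRY[name] = factory
        return factory
    return decorator


@register("last_value")
def make_last_value():
    # Toy model: predict that the next value equals the most recent one.
    return lambda history: history[-1]


@register("moving_average")
def make_moving_average(window: int = 3):
    # Toy model: predict the mean of the last `window` observations.
    def predict(history: List[float]) -> float:
        recent = history[-window:]
        return sum(recent) / len(recent)
    return predict


def build_ensemble(config_text: str) -> Callable[[List[float]], float]:
    """Build a weighted-average ensemble from a JSON configuration string."""
    config = json.loads(config_text)
    members = [
        (MODEL_REGISTRY[m["name"]](**m.get("params", {})), m.get("weight", 1.0))
        for m in config["ensemble"]
    ]
    total_weight = sum(weight for _, weight in members)

    def predict(history: List[float]) -> float:
        weighted = sum(model(history) * weight for model, weight in members)
        return weighted / total_weight

    return predict


# Hypothetical config; changing models or weights requires no code change.
EXAMPLE_CONFIG = """
{
  "ensemble": [
    {"name": "last_value", "weight": 1.0},
    {"name": "moving_average", "params": {"window": 2}, "weight": 1.0}
  ]
}
"""

if __name__ == "__main__":
    forecast = build_ensemble(EXAMPLE_CONFIG)
    print(forecast([1.0, 2.0, 4.0]))
```

Adding a new model then means registering one more factory, and every parameter it exposes becomes configurable from the JSON file without touching the base code.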