Proper way to model an ML testing pipeline #27589
mielkec-gene asked this question in Q&A · Unanswered
Our use case: we train new models daily, which are archived on a separate platform, and we want to download and test each one on dozens of downstream tasks. Some of these tasks are simple, but many require horizontal scaling across a dozen or so machines.
I'm relatively new to Dagster, so the right way of modeling this has been somewhat unclear. I did eventually figure out that dynamic partitions were likely the best way to represent new models as they arrive, instead of defining each one as a new asset and checking in a new code version.
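Concretely, that means something like the following (a minimal sketch; the partition set name, the `model` asset, and `download_model` are placeholders for our real definitions):

```python
from dagster import DynamicPartitionsDefinition, asset

# One partition per trained model; keys get added at runtime as models arrive.
models_partitions = DynamicPartitionsDefinition(name="models")

@asset(partitions_def=models_partitions)
def model(context) -> None:
    model_id = context.partition_key
    # download_model(model_id)  # hypothetical helper for the external archive
    ...
```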
I needed a job that could programmatically create a new partition and then materialize the model asset associated with it. After dabbling for a day, I finally got the following pattern working:
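In simplified form (the real ops also handle downloading and storing the model; the op names are mine, and `register_model_partition` assumes the `models` partitions definition above):

```python
from dagster import OpExecutionContext, graph, op

@op
def register_model_partition(context: OpExecutionContext, model_id: str) -> str:
    # Add the new model as a partition key at run time, so the
    # partitioned `model` asset above can be materialized for it.
    context.instance.add_dynamic_partitions("models", [model_id])
    return model_id

@op
def materialize_model(context: OpExecutionContext, model_id: str) -> None:
    # Download the archived model and write it out; the downstream
    # validation tasks fan out from this point.
    ...

@graph
def validate_new_model(model_id: str):
    materialize_model(register_model_partition(model_id))
```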
There must be a simpler way than this, right? I went down the asset-factory route for a time, but it seems those are meant to be executed only when the code location loads, not during pipeline execution. I'm using a @graph here because it appears to let me use the launchpad to launch a new validation job from the UI with the model_id passed as the input to the entire pipeline, even though it's really only needed for the model asset itself.
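For completeness, this is roughly how the graph gets exposed as a launchable job, plus the equivalent in-process launch I use for local testing (a sketch; the job name and model ID are made up):

```python
# Expose the graph as a job; model_id then appears as a top-level
# input that can be set from the launchpad in the UI.
validation_job = validate_new_model.to_job(name="validation_job")

# Equivalent in-process launch for local testing:
result = validate_new_model.execute_in_process(
    input_values={"model_id": "model-2025-06-01"},
)
```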
I'm assuming I've created some antipatterns here, but I just can't find a simpler way to accomplish dynamic asset creation.