-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Qualification tool can error out from a divide by zero #638
Conversation
383b9fd
to
cc52398
Compare
… zero Signed-off-by: Kuhu Shukla <[email protected]>
cc52398
to
0bcacf9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @kuhushukla !
@@ -574,7 +574,7 @@ def get_costs_for_single_app(df_row, estimator: SavingsEstimator) -> pd.Series: | |||
df_row['Estimated GPU Duration']) | |||
cpu_cost = (100 - self.ctxt.get_ctxt('cpu_discount')) / 100 * raw_cpu_cost | |||
gpu_cost = (100 - self.ctxt.get_ctxt('gpu_discount')) / 100 * raw_gpu_cost | |||
est_savings = 100.0 - ((100.0 * gpu_cost) / cpu_cost) | |||
est_savings = 100.0 - ((100.0 * gpu_cost) / cpu_cost) if cpu_cost > 0 else 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we additionally add a note/comment in the output saying that CPU cost calculation was 0 as that likely indicates an issue with provided cluster shape from the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes sense. Let me add that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kuhushukla
I think we need to look for what caused the CPU_COST to be zro in first place.
Then, put up a change that will disable cost-savings calculations when the CPU-cost was not available.
I have intuition that this happened because of using a CPU cluster that cannot be mapped to a GPU. See issue #637 and Pr #707
Somewhere in the qualification.py
we should call self.ctxt.set_ctxt('enableSavingsCalculations', false)
when the CPU to GPU cluster fails. For some reason, we do not detect that there is no mapping and we need to fix that for good.
I need to investigate deeper by reproducing the error.
Closing as we could not verify the root reason |
Fixes #637 , needs a unit test.