Consolidation with spot by default is not appropriate #1605

Open
leoryu opened this issue Aug 28, 2024 · 2 comments · May be fixed by #1649
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@leoryu

leoryu commented Aug 28, 2024

Description

Observed Behavior:

From the consolidation code:

	// We are consolidating a node from OD -> [OD,Spot] but have filtered the instance types by cost based on the
	// assumption, that the spot variant will launch. We also need to add a requirement to the node to ensure that if
	// spot capacity is insufficient we don't replace the node with a more expensive on-demand node. Instead the launch
	// should fail and we'll just leave the node alone.
	ctReq := results.NewNodeClaims[0].Requirements.Get(v1.CapacityTypeLabelKey)
	if ctReq.Has(v1.CapacityTypeSpot) && ctReq.Has(v1.CapacityTypeOnDemand) {
		results.NewNodeClaims[0].Requirements.Add(scheduling.NewRequirement(v1.CapacityTypeLabelKey, corev1.NodeSelectorOpIn, v1.CapacityTypeSpot))
	}
	return Command{
		candidates:   candidates,
		replacements: results.NewNodeClaims,
	}, results, nil
}

Karpenter restricts the replacement NodeClaim's CapacityType to spot whenever its requirements allow both on-demand (OD) and spot.
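The narrowing described above can be modeled in a few lines of Go. This is a minimal sketch under stated assumptions: `capacityTypeReq` and `narrowForConsolidation` are hypothetical stand-ins, not Karpenter's `scheduling.Requirements` API; they only model the set of values allowed for the `karpenter.sh/capacity-type` label.

```go
package main

import "fmt"

// capacityTypeReq is a hypothetical, simplified stand-in for a NodeClaim
// requirement on the karpenter.sh/capacity-type label: the set of allowed values.
type capacityTypeReq map[string]bool

const (
	spot     = "spot"
	onDemand = "on-demand"
)

// has reports whether the requirement allows the given capacity type.
func (r capacityTypeReq) has(ct string) bool { return r[ct] }

// narrowForConsolidation mirrors the quoted check: if both spot and on-demand
// are allowed, the replacement is restricted to spot only; otherwise the
// requirement is returned unchanged.
func narrowForConsolidation(r capacityTypeReq) capacityTypeReq {
	if r.has(spot) && r.has(onDemand) {
		return capacityTypeReq{spot: true}
	}
	return r
}

func main() {
	flexible := capacityTypeReq{spot: true, onDemand: true}
	narrowed := narrowForConsolidation(flexible)
	fmt.Println(narrowed.has(spot), narrowed.has(onDemand)) // true false

	odOnly := capacityTypeReq{onDemand: true}
	fmt.Println(narrowForConsolidation(odOnly).has(onDemand)) // true
}
```

A requirement that already allows only one capacity type passes through untouched; only the [OD, spot] case is rewritten to spot.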

This logic makes Karpenter always create a spot machine, even when the cheapest machine is OD rather than spot.

In the worst case, when no spot machine is available, Karpenter reports an error:

logger":"controller","message":"failed launching nodeclaim","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim": ...

Because the created NodeClaim requires 'spot', consolidation does not succeed even when a cheaper OD machine is available.

Expected Behavior:

What I expect from consolidation:

Do not modify the CapacityType; if my requirements leave the capacity type open, just choose the cheapest machine.
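A simplified sketch of the difference between the current and the expected behavior follows. It is not Karpenter's scheduler: `offering` and `cheapestReplacement` are hypothetical, and the prices are illustrative numbers, not real pricing data. It keeps the guard from the quoted code comment (never replace the node with a more expensive one) by only accepting offerings cheaper than the node being consolidated.

```go
package main

import (
	"errors"
	"fmt"
)

// offering is a hypothetical, simplified view of one launchable option;
// it is not a Karpenter type.
type offering struct {
	instanceType string
	capacityType string  // "spot" or "on-demand"
	price        float64 // illustrative hourly price
	available    bool    // whether capacity can currently be obtained
}

// cheapestReplacement returns the cheapest available offering whose capacity
// type is allowed and whose price is below that of the node being consolidated.
func cheapestReplacement(offerings []offering, allowed map[string]bool, currentPrice float64) (offering, error) {
	var best offering
	found := false
	for _, o := range offerings {
		if !o.available || !allowed[o.capacityType] || o.price >= currentPrice {
			continue
		}
		if !found || o.price < best.price {
			best, found = o, true
		}
	}
	if !found {
		return offering{}, errors.New("no cheaper available offering satisfies the requirements")
	}
	return best, nil
}

func main() {
	// Illustrative numbers only: spot is listed but currently has no capacity.
	offerings := []offering{
		{"type-a", "spot", 0.03, false},
		{"type-a", "on-demand", 0.08, true},
	}
	const currentPrice = 0.20 // price of the on-demand node being consolidated

	// Current behavior: the requirement has been narrowed to spot only.
	_, err := cheapestReplacement(offerings, map[string]bool{"spot": true}, currentPrice)
	fmt.Println("spot only:", err)

	// Expected behavior: keep both capacity types and take the cheapest one.
	o, err := cheapestReplacement(offerings, map[string]bool{"spot": true, "on-demand": true}, currentPrice)
	if err == nil {
		fmt.Println("spot or on-demand:", o.instanceType, o.capacityType)
	}
}
```

With the spot-only requirement the launch fails and the node is left alone; with both capacity types allowed, the cheaper on-demand offering is chosen while the cheaper-than-current check still prevents a more expensive replacement.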

leoryu added the kind/bug label Aug 28, 2024
k8s-ci-robot added the needs-triage label Aug 28, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

leoryu changed the title from "Consolidation with spot by default is appropriate" to "Consolidation with spot by default is not appropriate" Aug 28, 2024
@leoryu
Author

leoryu commented Sep 6, 2024

@njtran Hi, I found that this code was committed by you two years ago. Could you explain why the NodeClaim is always spot during consolidation? Since spot machines may not be available in the real world, I think Karpenter should choose the cheapest option, even if that machine is on-demand.

leoryu linked a pull request Sep 10, 2024 that will close this issue

2 participants