Spot instances ephemeral disk size limit is not taken into consideration #518

Open
krukowskid opened this issue Oct 12, 2024 · 0 comments
Labels: area/spot (issues or PRs related to spot), area/storage (issues or PRs related to storage), needs-triage (indicates an issue or PR lacks a `triage/foo` label and requires one)

Version

Karpenter Version: v0.5.4

Kubernetes Version: v1.29.8

Expected Behavior

Karpenter should create the spot instance with an additional disk, or use a smaller OS disk size to fit within the VM size's limits.
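
The two options above can be sketched as a small selection rule. This is only an illustration of the requested behavior, not Karpenter's actual code: `OSDiskPlan`, `plan_os_disk`, and `allow_shrink` are hypothetical names, and reading "additional disk" as a managed (non-ephemeral) OS disk is an assumption. The 128 GB default and the 100 GB limit for Standard_D4ds_v4 are the values reported in this issue.

```python
from dataclasses import dataclass

# Default OS disk size reported in this issue (not read from Karpenter's source).
DEFAULT_OS_DISK_GB = 128


@dataclass(frozen=True)
class OSDiskPlan:
    """Hypothetical result type: the disk size to request and its placement."""
    size_gb: int
    ephemeral: bool


def plan_os_disk(max_ephemeral_gb: int,
                 requested_gb: int = DEFAULT_OS_DISK_GB,
                 allow_shrink: bool = True) -> OSDiskPlan:
    """Pick an OS disk configuration that the VM size can host.

    max_ephemeral_gb is the largest ephemeral OS disk the VM size allows
    (assumed to be known to the caller, e.g. from SKU data).
    """
    # The requested size fits: keep the ephemeral disk as requested.
    if requested_gb <= max_ephemeral_gb:
        return OSDiskPlan(requested_gb, ephemeral=True)
    # Option 1: shrink the ephemeral disk to the VM size's limit.
    if allow_shrink and max_ephemeral_gb > 0:
        return OSDiskPlan(max_ephemeral_gb, ephemeral=True)
    # Option 2: keep the requested size but give up on ephemeral placement.
    return OSDiskPlan(requested_gb, ephemeral=False)
```

With the 100 GB limit from the error below, `plan_os_disk(100)` would request a 100 GB ephemeral disk, while `plan_os_disk(100, allow_shrink=False)` would keep 128 GB on a non-ephemeral disk. Either result avoids sending a request that Azure rejects.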

Actual Behavior

It seems that Karpenter tries to create the default 128 GB ephemeral OS disk for the spot instance and ends up in a create loop.

Steps to Reproduce the Problem

  1. Create a node pool with capacity-type spot and an instance type whose maximum allowed ephemeral disk size is less than 128 GB, such as Standard_D4ds_v4
  2. Deploy a workload so that Karpenter provisions a new VM

Resource Specs and Logs

{
  "level": "ERROR",
  "time": "2024-10-12T14:39:39.784Z",
  "logger": "controller",
  "message": "Reconciler error",
  "commit": "846ef96",
  "controller": "nodeclaim.lifecycle",
  "controllerGroup": "karpenter.sh",
  "controllerKind": "NodeClaim",
  "NodeClaim": {
    "name": "spot-v7m6r"
  },
  "namespace": "",
  "name": "spot-v7m6r",
  "reconcileID": "8c31e447-0fda-4f74-8b96-b8acc11435e2",
  "error": "launching nodeclaim, creating instance, virtualMachine.BeginCreateOrUpdate for VM \"aks-spot-v7m6r\" failed: PUT https://management.azure.com/subscriptions/2dcb8032-974b-414f-ae22-f2f4aa2ffea6/resourceGroups/MC_akskarpenter_karpenter_swedencentral/providers/Microsoft.Compute/virtualMachines/aks-spot-v7m6r\n--------------------------------------------------------------------------------\nRESPONSE 400: 400 Bad Request\nERROR CODE: NotSupported\n--------------------------------------------------------------------------------\n{\n  \"error\": {\n    \"code\": \"NotSupported\",\n    \"message\": \"OS disk of Ephemeral VM with size greater than 100 GB is not allowed for VM size Standard_D4ds_v4 when the DiffDiskPlacement is CacheDisk. Please refer to https://aka.ms/Ephemeral for more details.\"\n  }\n}\n--------------------------------------------------------------------------------\n"
}
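
For triage, the limit that Azure enforces can be pulled out of the error text above. The snippet below is a minimal sketch that assumes the message keeps the wording shown in this log. `parse_ephemeral_limit` and `PATTERN` are hypothetical helpers, not part of Karpenter or the Azure SDK.

```python
import re

# Message text copied from the "error" field of the log above.
MESSAGE = (
    "OS disk of Ephemeral VM with size greater than 100 GB is not allowed "
    "for VM size Standard_D4ds_v4 when the DiffDiskPlacement is CacheDisk."
)

# Assumes the wording of the message stays exactly as in this log.
PATTERN = re.compile(
    r"size greater than (?P<limit_gb>\d+) GB is not allowed "
    r"for VM size (?P<vm_size>\S+) "
    r"when the DiffDiskPlacement is (?P<placement>\w+)"
)


def parse_ephemeral_limit(message):
    """Return the limit, VM size, and placement from the message, or None."""
    match = PATTERN.search(message)
    if match is None:
        return None
    return {
        "limit_gb": int(match["limit_gb"]),
        "vm_size": match["vm_size"],
        "placement": match["placement"],
    }


print(parse_ephemeral_limit(MESSAGE))
```

For this log the parsed limit is 100 GB for Standard_D4ds_v4 with CacheDisk placement, which is below the 128 GB that was requested.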

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
tallaxes added the area/storage, needs-triage, and area/spot labels on Oct 22, 2024