docs: update do-not-disrupt description #6977

Open

jmdeal wants to merge 2 commits into aws:main from jmdeal:docs/do-not-disrupt-updates

Contributor

jmdeal commented Sep 10, 2024

Fixes #N/A

Description
Updates the description for karpenter.sh/do-not-disrupt to reflect the changes made when TGP was introduced.

How was this change tested?

Does this change impact docs?

Yes, PR includes docs updates
Yes, issue opened: #
No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

jmdeal requested a review from a team as a code owner

September 10, 2024 23:31

jmdeal requested a review from engedaam

September 10, 2024 23:31

netlify bot commented Sep 10, 2024 (edited)

✅ Deploy Preview for karpenter-docs-prod ready!

Name	Link
🔨 Latest commit	`007cee4`
🔍 Latest deploy log	https://app.netlify.com/sites/karpenter-docs-prod/deploys/670847bfcf89ed000853f51c
😎 Deploy Preview	https://deploy-preview-6977--karpenter-docs-prod.netlify.app

To edit notification comments on pull requests, go to your Netlify site configuration.

coveralls commented Sep 10, 2024 (edited)

Pull Request Test Coverage Report for Build 11282313979

Details

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 83.064%

Totals
Change from base Build 11279890012:	0.0%
Covered Lines:	5596
Relevant Lines:	6737

💛 - Coveralls

jonathan-innis reviewed

website/content/en/docs/upgrading/v1-migration.md (outdated, resolved)

website/content/en/docs/concepts/disruption.md (outdated, resolved)

website/content/en/docs/concepts/disruption.md

+              You can block Karpenter from voluntarily choosing to disrupt certain pods by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the pod.
+              You can treat this annotation as a single-node, permanently blocking PDB.
+              This has the following consequences:
+              - Nodes with `do-not-disrupt` pods will be excluded from **voluntary** disruption, i.e. [Consolidation]({{<ref "#consolidation" >}}) and [Drift]({{<ref "#drift" >}}).

Contributor

jonathan-innis Sep 11, 2024

Doesn't voluntary disruption include drift? I'm not sure that they are considered different things here

Contributor Author

jmdeal Oct 10, 2024

This just expands on what voluntary disruption is; it's not meant to differentiate between them. If you think that's unclear, I can try to reword.

website/content/en/docs/concepts/disruption.md (outdated, resolved)

website/content/en/docs/concepts/disruption.md Outdated

+              You can treat this annotation as a single-node, permanently blocking PDB.
+              This has the following consequences:
+              - Nodes with `do-not-disrupt` pods will be excluded from **voluntary** disruption, i.e. [Consolidation]({{<ref "#consolidation" >}}) and [Drift]({{<ref "#drift" >}}).
+              - Like pods with a blocking PDB, pods with the `do-not-disrupt` annotation will **not** be gracefully evicted by the [Termination Controller]({{ref "#terminationcontroller"}}).

Contributor

jonathan-innis Sep 11, 2024

Suggested change

      
            - Like pods with a blocking PDB, pods with the `do-not-disrupt` annotation will **not** be gracefully evicted by the [Termination Controller]({{ref "#terminationcontroller"}}).
          
            - Like pods with a blocking PDB, pods with the `do-not-disrupt` annotation will **not** be gracefully evicted by the [Termination Controller]({{ref "#terminationcontroller"}}). These pods will either run to completion or be forcefully terminated when the node is near its terminationGracePeriod

Consider linking to `terminationGracePeriod` if you update the docs wording in this way too.

Contributor Author

jmdeal Oct 10, 2024

I didn't take this exact suggestion, but I think the spirit of it is there now. Let me know what you think.

website/content/en/docs/concepts/disruption.md (outdated, resolved)

website/content/en/docs/concepts/disruption.md (outdated, resolved)

website/content/en/docs/concepts/disruption.md Outdated

-              This is especially useful in combination with `nodepool.spec.template.spec.expireAfter` to define an absolute maximum on the lifetime of a node, where a node is deleted at `expireAfter` and finishes draining within the `terminationGracePeriod` thereafter. Pods blocking eviction like PDBs and do-not-disrupt will block full draining until the `terminationGracePeriod` is reached.
+              This is especially useful in combination with `nodepool.spec.template.spec.expireAfter` to define an absolute maximum on the lifetime of a node, where a node is deleted at `expireAfter` and finishes draining within the `terminationGracePeriod` thereafter.
+              Pods blocking eviction like PDBs and do-not-disrupt will block full draining until the `terminationGracePeriod` is reached.

Contributor

jonathan-innis Sep 11, 2024

Suggested change

      
            Pods blocking eviction like PDBs and do-not-disrupt will block full draining until the `terminationGracePeriod` is reached.
          
            Pods blocking eviction like PDBs and `do-not-disrupt` will block full draining until the `terminationGracePeriod` is reached.

website/content/en/docs/concepts/disruption.md Outdated


		For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining after it's lived for `23h`. Let's say a `do-not-disrupt` pod has `TerminationGracePeriodSeconds` set to `300` seconds. If the node hasn't been fully drained after `55m`, Karpenter will delete the pod to allow it's full `terminationGracePeriodSeconds` to cleanup. If no pods are blocking draining, Karpenter will cleanup the node as soon as the node is fully drained, rather than waiting for the NodeClaim's `terminationGracePeriod` to finish.
		For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining after it's lived for `23h`.

Contributor

jonathan-innis Sep 11, 2024

Suggested change

      
            For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining after it's lived for `23h`.
          
            For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining `23h` after its creation. The NodeClaim will then be allowed to drain for up to `1h` before it's forcefully terminated from the cluster.

website/content/en/docs/concepts/disruption.md Outdated

-              For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining after it's lived for `23h`. Let's say a `do-not-disrupt` pod has `TerminationGracePeriodSeconds` set to `300` seconds. If the node hasn't been fully drained after `55m`, Karpenter will delete the pod to allow it's full `terminationGracePeriodSeconds` to cleanup. If no pods are blocking draining, Karpenter will cleanup the node as soon as the node is fully drained, rather than waiting for the NodeClaim's `terminationGracePeriod` to finish.
+              For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining after it's lived for `23h`.
+              Let's say a `do-not-disrupt` pod has `TerminationGracePeriodSeconds` set to `300` seconds.

Contributor

jonathan-innis Sep 11, 2024

Suggested change

      
            Let's say a `do-not-disrupt` pod has `TerminationGracePeriodSeconds` set to `300` seconds.
          
            Let's say a `do-not-disrupt` pod has `TerminationGracePeriodSeconds` set to `300` seconds (`5m`).
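
The timeline described in this hunk can be worked through numerically. The following is a minimal sketch, not Karpenter's implementation: `drain_timeline` and its parameter names are hypothetical, and clamping the pod deletion time to the start of the drain is an assumption, based on the docs' note that a pod whose `terminationGracePeriodSeconds` exceeds the node's `terminationGracePeriod` may not get its full grace period.

```python
from datetime import timedelta


def drain_timeline(expire_after, node_tgp, pod_tgp):
    """Return (drain_start, pod_delete, force_terminate) as offsets from node creation.

    Hypothetical helper for illustration only; it mirrors the arithmetic in the
    docs example rather than any real Karpenter code path.
    """
    # The node is deleted at expireAfter, which is when draining begins.
    drain_start = expire_after
    # The node is forcibly terminated once terminationGracePeriod has elapsed.
    force_terminate = drain_start + node_tgp
    # A blocking pod is deleted early enough to get its full grace period,
    # but (assumption) never before the drain itself has started.
    pod_delete = max(drain_start, force_terminate - pod_tgp)
    return drain_start, pod_delete, force_terminate


# Values from the docs example: expireAfter=23h, terminationGracePeriod=1h,
# and a do-not-disrupt pod with terminationGracePeriodSeconds=300.
drain_start, pod_delete, force_terminate = drain_timeline(
    expire_after=timedelta(hours=23),
    node_tgp=timedelta(hours=1),
    pod_tgp=timedelta(seconds=300),
)
minutes_into_drain = (pod_delete - drain_start) // timedelta(minutes=1)
```

With these inputs, the pod is deleted 55 minutes into the drain, and the node is forcibly terminated 24 hours after creation.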

jmdeal changed the title ~~docs: update do-not-disrupt description~~ [DRAFT] docs: update do-not-disrupt description

Contributor

github-actions bot commented Oct 2, 2024

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

github-actions bot added the lifecycle/stale label

jmdeal added 2 commits

October 10, 2024 14:31


docs: update do-not-disrupt description (bc08587)

updates (007cee4)

jmdeal force-pushed the docs/do-not-disrupt-updates branch from 18d505d to 007cee4 Compare

October 10, 2024 21:31

jmdeal changed the title ~~[DRAFT] docs: update do-not-disrupt description~~ docs: update do-not-disrupt description

github-actions bot removed the lifecycle/stale label

njtran reviewed

website/content/en/docs/concepts/disruption.md

-              You can set a NodePool's `terminationGracePeriod` through the `spec.template.spec.terminationGracePeriod` field. This field defines  the duration of time that a node can be draining before it's forcibly deleted. A node begins draining when it's deleted. Pods will be deleted preemptively based on its TerminationGracePeriodSeconds before this terminationGracePeriod ends to give as much time to cleanup as possible. Note that if your pod's terminationGracePeriodSeconds is larger than this terminationGracePeriod, Karpenter may forcibly delete the pod before it has its full terminationGracePeriod to cleanup.
+              You can set a NodePool's `terminationGracePeriod` through the [`spec.template.spec.terminationGracePeriod`]({{<ref "../concepts/nodepools/#spectemplatespecterminationgraceperiod" >}}) field.
+              This is used to define the maximum drain duration for a given Node.
+              A node begins draining once it has been deleted, and it will be forcibly terminated once the `terminationGracePeriod` has elapsed.

Contributor

njtran Oct 14, 2024

I think this is definitely clearer than it was previously, but consider this

Suggested change

      
            A node begins draining once it has been deleted, and it will be forcibly terminated once the `terminationGracePeriod` has elapsed.
          
            A node begins draining once it has been deleted, and it will be forcibly terminated once the `terminationGracePeriod` has elapsed since it started draining.

njtran reviewed

website/content/en/docs/concepts/disruption.md

    
              {{% alert title="Note" color="primary" %}}

              Voluntary node removal does not include [Interruption]({{<ref "#interruption" >}}) or manual deletion initiated through `kubectl delete node`. Both of these are considered involuntary events, since node removal cannot be delayed.

              The `do-not-disrupt` annotation does **not** exclude nodes from involuntary disruption methods, i.e. [Expiration]({{<ref "#expiration" >}}), [Interruption]({{<ref "#interruption" >}}), and manual deletion (e.g. `kubectl delete node ...`).

Contributor

njtran Oct 14, 2024

Users who don't understand that involuntary disruption still goes through the termination controller might be confused by the language here.

Suggested change

      
            The `do-not-disrupt` annotation does **not** exclude nodes from involuntary disruption methods, i.e. [Expiration]({{<ref "#expiration" >}}), [Interruption]({{<ref "#interruption" >}}), and manual deletion (e.g. `kubectl delete node ...`).
          
            The `do-not-disrupt` annotation does **not** prevent nodes from being involuntarily disrupted, i.e. [Expiration]({{<ref "#expiration" >}}), [Interruption]({{<ref "#interruption" >}}), and manual deletion (e.g. `kubectl delete node ...`), but Karpenter will still respect `do-not-disrupt` pods when draining the node as previously mentioned.
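
The voluntary/involuntary split discussed in this thread can be summarized as a small lookup. This is an illustrative sketch only: the method labels, `has_do_not_disrupt`, and `node_excluded_from` are hypothetical names, and the grouping simply follows the wording proposed in this PR (Consolidation and Drift as voluntary; Expiration, Interruption, and manual deletion as involuntary). Comparing against `"true"` mirrors the documented annotation value; how other values are handled is not covered here.

```python
DO_NOT_DISRUPT = "karpenter.sh/do-not-disrupt"

# Grouping taken from the wording in this PR; the labels themselves are hypothetical.
VOLUNTARY = {"consolidation", "drift"}
INVOLUNTARY = {"expiration", "interruption", "manual-deletion"}


def has_do_not_disrupt(annotations):
    """True if a pod's annotations carry the documented do-not-disrupt value."""
    return annotations.get(DO_NOT_DISRUPT) == "true"


def node_excluded_from(method, pods_annotations):
    """Whether a node is excluded from the given disruption method.

    pods_annotations is a list of annotation dicts, one per pod on the node.
    """
    if method not in VOLUNTARY | INVOLUNTARY:
        raise ValueError(f"unknown disruption method: {method}")
    # Involuntary disruption is never prevented by the annotation.
    if method in INVOLUNTARY:
        return False
    # Voluntary disruption is blocked if any pod on the node is annotated.
    return any(has_do_not_disrupt(a) for a in pods_annotations)


pods = [{}, {DO_NOT_DISRUPT: "true"}]
blocked_for_drift = node_excluded_from("drift", pods)
blocked_for_expiration = node_excluded_from("expiration", pods)
```

The sketch covers only whether a node is selected for disruption; as the comment above notes, draining after an involuntary disruption still respects `do-not-disrupt` pods.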

njtran reviewed

website/content/en/docs/concepts/disruption.md

               {{% /alert %}}
               ### Node-Level Controls
-              You can block Karpenter from voluntarily choosing to disrupt certain nodes by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the node. This will prevent disruption actions on the node.
+              You can block Karpenter from voluntarily choosing to disrupt certain nodes by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the node.
+              This will prevent voluntary disruption actions against the node.

Contributor

njtran Oct 14, 2024

Really only consolidation, not eventual. Worth thinking about how to make this distinction clearly and concisely.

njtran reviewed

website/content/en/docs/upgrading/upgrade-guide.md

                 * Karpenter's generated NodeConfig now takes precedence when generating UserData with the AL2023 `amiFamily`. If you're setting any values managed by Karpenter in your AL2023 UserData, configure these through Karpenter natively (e.g. kubelet configuration fields).
                 * Karpenter now adds a `karpenter.sh/unregistered:NoExecute` taint to nodes in injected UserData when using alias in AMISelectorTerms or non-Custom AMIFamily. When using `amiFamily: Custom`, users will need to add this taint into their UserData, where Karpenter will automatically remove it when provisioning nodes.
+                * Discovered standard AL2023 AMIs will no longer be considered compatible with GPU / accelerator workloads. If you're using an AL2023 EC2NodeClass (without AMISelectorTerms) for these workloads, you will need to select your AMI via AMISelectorTerms (non-alias).

Contributor

njtran Oct 14, 2024

It is now?

Contributor

github-actions bot commented Oct 29, 2024

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

github-actions bot added the lifecycle/stale label

Labels

lifecycle/stale