docs: DOC-912: Add Maintenance Mode Documentation (#5100)
* Initial commit for Maintenance Mode documentation

* Webp image conversion

* Minor updates

* Added procedure and validation to Disable Maintenance Mode section

* Minor cleanup; additional x-refs; affirmation that scans are intentionally disabled

* ci: auto-formatting prettier issues

* Optimised images with calibre/image-actions

* Optimised images with calibre/image-actions

* Minor style guide and spelling fixes

* ci: auto-formatting prettier issues

* Fixed spacing caused by merge conflict

* ci: auto-formatting prettier issues

* Spacing

* ci: auto-formatting prettier issues

* Incorporating suggestions from Carolina

---------

Co-authored-by: achuribooks <[email protected]>
Co-authored-by: vault-token-factory-spectrocloud[bot] <133815545+vault-token-factory-spectrocloud[bot]@users.noreply.github.com>
(cherry picked from commit 4844027)
achuribooks committed Jan 8, 2025
1 parent a2c2c1b commit 2d3dfc7
Showing 7 changed files with 155 additions and 10 deletions.
Original file line number Diff line number Diff line change
@@ -29,6 +29,10 @@ specify the backup expiry period, meaning the duration after which Palette will
example, you can schedule a backup for every week on Sunday at midnight and automatically expire the backup after three
months. Additionally, you can initiate a backup on demand for an existing cluster.

## Limitations

- Nodes in [Maintenance Mode](../maintenance-mode.md) are not included in the backup process.

## Schedule a Backup

### Prerequisites
@@ -85,3 +85,6 @@ The following sections describe these capabilities in detail:
individual users and clusters.

- [Image Swap](image-swap.md) - Learn how to use image swap capabilities with Palette.

- [Maintenance Mode](./maintenance-mode.md) - Turn off scheduling (cordon) and drain nodes, migrating workloads to other
healthy nodes in the cluster without service disruptions.
@@ -17,10 +17,10 @@ purposes. To learn more about each scan type, refer to the following sections.

:::info

Scans may not work as expected when a node is in maintenance mode. Before scheduling a scan, we recommend you turn off
maintenance mode if enabled. To verify if a node is in maintenance mode, navigate to **Clusters** > **Nodes** and check
the **Health** column for a **Maintenance mode** icon. To turn off maintenance mode, click on the **three-dot Menu** in
the row of the node you want to scan, and select **Turn off maintenance mode**.
Scans cannot be performed when a node is in [maintenance mode](./maintenance-mode.md). To verify if a node is in
maintenance mode, navigate to **Clusters** > **Nodes** and check the **Health** column for a **Maintenance mode** icon.
To turn off maintenance mode, click on the **three-dot Menu** in the row of the node you want to scan, and select **Turn
off maintenance mode**.

:::

136 changes: 136 additions & 0 deletions docs/docs-content/clusters/cluster-management/maintenance-mode.md
@@ -0,0 +1,136 @@
---
sidebar_label: "Maintenance Mode"
title: "Maintenance Mode"
description: "Learn how to enable and use maintenance mode to cordon and drain nodes."
hide_table_of_contents: false
sidebar_position: 240
tags: ["clusters", "cluster management"]
---

Similar to the `kubectl` commands `cordon` and `drain`, maintenance mode allows you to temporarily disable scheduling for an
active control plane or worker node. When a node is placed in maintenance mode, workloads are migrated automatically to
other healthy nodes in the cluster without services being disrupted. Using maintenance mode makes it easier to perform
necessary maintenance tasks, address node issues, and optimize workload distribution while maintaining the desired level
of performance and availability.
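
For comparison, the manual `kubectl` steps that maintenance mode resembles can be sketched as follows. This is a minimal
sketch, not a description of how Palette implements maintenance mode internally. The function name and the `KUBECTL`
override variable are hypothetical and exist only for illustration; the `cordon` and `drain` subcommands and their flags
are standard `kubectl`.

```bash
# Sketch of a manual cordon-and-drain, assuming kubectl is already pointed at the cluster.
# KUBECTL is a hypothetical override; set KUBECTL="echo kubectl" to preview the commands.
manual_maintenance() {
  local node="$1"

  # Mark the node unschedulable so that no new pods are placed on it.
  ${KUBECTL:-kubectl} cordon "$node"

  # Evict the pods on the node. DaemonSet-managed pods are skipped, not evicted.
  ${KUBECTL:-kubectl} drain "$node" --ignore-daemonsets --delete-emptydir-data
}
```

Calling `manual_maintenance <node-name>` runs both steps in order. The manual counterpart to turning maintenance mode
off is `kubectl uncordon <node-name>`.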

## Prerequisites

- An active Palette host cluster with more than one control plane node and worker node.

- Alternate nodes with sufficient available resources to host the workloads migrated from nodes in maintenance mode.

## Limitations

<!-- prettier-ignore -->
- Static pods and DaemonSets are not evicted from the node when activating maintenance mode.

- Scans cannot be performed on the cluster when any node in the cluster is in maintenance mode.

- Nodes in maintenance mode are not included in the backup process, which also means they cannot be restored.

- Changes to add-on profiles are not applied to nodes in maintenance mode.

- Certain changes to infrastructure profiles, such as Kubernetes version upgrades, require nodes to be recreated,
removing maintenance nodes in the process.
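
Because static pods and DaemonSets are not evicted, some pods remain on a node after maintenance mode is activated. One
way to see what is still running there is to filter pods by node name, sketched below. The function name and the
`KUBECTL` override variable are hypothetical; the `spec.nodeName` field selector is standard `kubectl`.

```bash
# Sketch: list the pods still assigned to a node, across all namespaces.
# KUBECTL is a hypothetical override; set KUBECTL="echo kubectl" to preview the command.
pods_on_node() {
  local node="$1"

  # The field selector restricts the result to pods scheduled on the given node.
  ${KUBECTL:-kubectl} get pods --all-namespaces --output wide \
    --field-selector "spec.nodeName=${node}"
}
```

For example, `pods_on_node ip-10-0-1-235.ec2.internal` lists whatever is still assigned to that node.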

## Activate Maintenance Mode

<!-- prettier-ignore -->
1. Log in to [Palette](https://console.spectrocloud.com).

2. Navigate to the left **Main Menu** and select **Clusters**.

3. Select the desired cluster and navigate to the **Nodes** tab of the cluster.

4. Beside the node that needs maintenance, select the **three-dot Menu** and then **Turn on maintenance mode**.

5. When maintenance mode is activated, the **Health** icon changes to a set of tools, and the tooltip states
**Maintenance Mode: Initiated**. Once the node has finished entering maintenance mode, the tooltip changes to
**Maintenance Mode: Complete**.

Palette reminds you in several locations that you have a node in maintenance mode:

- Beside the **Settings** drop-down while viewing your cluster.

- On the cluster’s **Overview** tab beneath **Health** status.

- On the cluster’s **Nodes** tab in the node’s **Health** column.

![Node in maintenance mode](/clusters_cluster-management_maintenance_mode.webp)

### Validate

1. Log in to [Palette](https://console.spectrocloud.com).

2. Navigate to the left **Main Menu** and select **Clusters**.

3. Select the cluster with maintenance mode active and download the [kubeconfig](./palette-webctl.md) file.

![The cluster details page with the two kubeconfig files elements highlighted](/clusters_cluster--management_kubeconfig_cluster-details-kubeconfig-files.webp)

4. Open a terminal window and set the environment variable `KUBECONFIG` to point to the kubeconfig file you downloaded.

```bash
export KUBECONFIG=~/Downloads/admin.aws-maintenance-test.kubeconfig
```

5. Confirm that the node is in a maintenance state, indicated by a `STATUS` of `SchedulingDisabled`.

```bash
kubectl get nodes
```

```bash hideClipboard {4}
NAME                         STATUS                     ROLES           AGE    VERSION
ip-10-0-1-174.ec2.internal   Ready                      control-plane   177m   v1.30.6
ip-10-0-1-26.ec2.internal    Ready                      <none>          174m   v1.30.6
ip-10-0-1-235.ec2.internal   Ready,SchedulingDisabled   <none>          174m   v1.30.6
```
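
On clusters with many nodes, scanning the `STATUS` column by eye can be tedious. The sketch below filters the output of
`kubectl get nodes` down to the names of nodes whose status includes `SchedulingDisabled`. The function name is
hypothetical, and the filter assumes the default column layout shown above, where `STATUS` is the second column.

```bash
# Sketch: print the names of nodes whose STATUS column includes SchedulingDisabled.
# Reads rows in the default `kubectl get nodes` layout from standard input.
cordoned_nodes() {
  awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}
```

A typical invocation is `kubectl get nodes --no-headers | cordoned_nodes`. Empty output means no node currently has
scheduling disabled.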

## Disable Maintenance Mode

<!-- prettier-ignore -->
1. Log in to [Palette](https://console.spectrocloud.com).

2. Navigate to the left **Main Menu** and select **Clusters**.

3. Select the desired cluster and navigate to the **Nodes** tab of the cluster.

4. Select the **three-dot Menu** beside the maintenance node and then **Turn off maintenance mode**.

5. When maintenance mode is disabled, the **Health** icon reverts to a checkmark.

:::warning

Taking a node out of maintenance mode does not automatically rebalance workloads.

:::
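
If you want pods to be rescheduled after a node returns to service, one common Kubernetes approach is to restart the
affected Deployments so that their pods are recreated and placed again by the scheduler. This is a sketch of that
general technique, not a Palette feature or a procedure prescribed by this guide, and the scheduler may or may not
place the new pods on the returned node. The function name and the `KUBECTL` override variable are hypothetical;
`kubectl rollout restart` is a standard command.

```bash
# Sketch: trigger a rolling restart of one Deployment so its pods are scheduled afresh.
# KUBECTL is a hypothetical override; set KUBECTL="echo kubectl" to preview the command.
restart_deployment() {
  local namespace="$1"
  local name="$2"

  # A rolling restart replaces pods gradually, following the Deployment's update strategy.
  ${KUBECTL:-kubectl} rollout restart "deployment/${name}" --namespace "$namespace"
}
```

For example, `restart_deployment default web` restarts a Deployment named `web` in the `default` namespace. Both names
are placeholders.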

### Validate

1. Log in to [Palette](https://console.spectrocloud.com).

2. Navigate to the left **Main Menu** and select **Clusters**.

3. Select the desired cluster and download the [kubeconfig](./palette-webctl.md) file.

![The cluster details page with the two kubeconfig files elements highlighted](/clusters_cluster--management_kubeconfig_cluster-details-kubeconfig-files.webp)

4. Open a terminal window and set the environment variable `KUBECONFIG` to point to the kubeconfig file you downloaded.

```bash
export KUBECONFIG=~/Downloads/admin.aws-maintenance-test.kubeconfig
```

5. Confirm that scheduling is no longer disabled for the node, indicated by a `STATUS` of `Ready`.

```bash
kubectl get nodes
```

```bash hideClipboard
NAME                         STATUS   ROLES           AGE    VERSION
ip-10-0-1-174.ec2.internal   Ready    control-plane   177m   v1.30.6
ip-10-0-1-26.ec2.internal    Ready    <none>          174m   v1.30.6
ip-10-0-1-235.ec2.internal   Ready    <none>          174m   v1.30.6
```
2 changes: 1 addition & 1 deletion docs/docs-content/vm-management/architecture.md
@@ -23,7 +23,7 @@ For more detailed information about the technical architecture of VMO, refer to
By default, Palette VMO includes the following components:

- **Descheduler**. Enables VM live migration to different nodes in the node pool when the original node is in
maintenance mode.
[maintenance mode](../clusters/cluster-management/maintenance-mode.md).

- **Snapshot Controller**. Enables you to create VM snapshots. This component is automatically installed when you
initiate or schedule cluster backups.
@@ -61,11 +61,11 @@ Follow the instructions below to migrate VMs to a different node.

## Evacuate a Host

Compute nodes can be placed into maintenance mode using Palette or manually using the `cordon` and `drain` commands. The
`cordon` command marks the node as un-schedulable and the `drain`command evacuates all the VMs and pods from it. This
process is useful in case you need to perform hardware maintenance on the node - for example to replace a disk or
network interface card (NIC) card, perform memory maintenance, or if there are any issues with a particular node that
need to be resolved. To learn more, check out the
Compute nodes can be placed into [maintenance mode](../../clusters/cluster-management/maintenance-mode.md) using Palette
or manually using the `cordon` and `drain` commands. The `cordon` command marks the node as un-schedulable and the
`drain` command evacuates all the VMs and pods from it. This process is useful in case you need to perform hardware
maintenance on the node - for example, to replace a disk or network interface card (NIC), perform memory
maintenance, or if there are any issues with a particular node that need to be resolved. To learn more, check out the
[Safely Drain a Node](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/#use-kubectl-drain-to-remove-a-node-from-service)
Kubernetes resource.

@@ -173,3 +173,5 @@ You can validate evacuation completed by following the steps below.
- [Persistent Volume Access Modes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes)

- [Safely Drain a Node](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/#use-kubectl-drain-to-remove-a-node-from-service)

- [Maintenance Mode](../../clusters/cluster-management/maintenance-mode.md)
Binary file not shown.
