Skip to content

Latest commit

 

History

History
62 lines (55 loc) · 3.57 KB

README.md

File metadata and controls

62 lines (55 loc) · 3.57 KB

ESX Controller

The ESX controller does two things for Kubernetes nodes running on virtual machines managed by a VMware vCenter. Firstly it regularly checks whether a nodes underlying ESX host is or goes into maintenance mode. If so the label cloud.sap/esx-in-maintenance is set to true.

Secondly, to complete entering maintenance mode all virtual machines on an ESX host need to be turned off. By setting the cloud.sap/esx-reboot-ok label to true on every node (within the cluster) belonging to certain ESX host, which is entering maintenance mode, the controller will cordon, drain and shutdown these nodes (and will keep them shutdown). When the ESX host leaves maintenance mode the controller will turn the nodes on and uncordon them. This behavior only occurs, if the cloud.sap/esx-reboot-initiated annotation is set to true, so it does not interfere with other maintenance activities. The cloud.sap/esx-reboot-initiated annotation is managed by the controller based on the cloud.sap/esx-in-maintenance and cloud.sap/esx-reboot-ok labels.

Using the cloud.sap/esx-in-maintenance label together with the cloud.sap/esx-reboot-ok label enables ESX maintenances to be managed flexibly with the "main" maintenance controller.

Certain alarms can be specified in the configuration file. If an ESX host has a triggered alarm with a name that matches the provided names in the configuration file, the cloud.sap/esx-in-maintenance label will be set to alarm. Draining nodes, which ESX maintenance state is alarm, will use deletions with a grace period of 0 (effectively force deleting these pods).

It is assumed that the nodes names equal the names of the hosting virtual machines. The availability zone within a cloud region is assumed to be the last character of the failure-domain.beta.kubernetes.io/zone label. The ESX hosts are to be tracked on relevant nodes using the kubernetes.cloud.sap/host label.

The nodes are also label with cloud.sap/esx-version containing the underlying ESX version.

Installation

The ESX controller is bundled within the maintenance controller binary. It needs to be enabled using the --enable-esx-maintenance flag.

Configuration

To be placed in ./config/esx.yaml.

intervals:
  # Defines how frequent the controller will check for ESX hosts entering maintenance mode
  check: # changing the check interval requires a pod restart to come into effect
    jitter: 0.1 # required
    period: 5m # required
  # Defines how long and frequent to check for pod deletions while draining
  podDeletion:
    period: 5s # required
    timeout: 2m # required
  # Defines how long to wait for a VM to shut down gracefully.
  # If a VM does not terminate within timeout it will be "unplugged the hard way"
  vmShutdown:
    period: 5s # required
    timeout: 2m # required
  # Defines how long and frequent to try to evict pods
  podEviction:
    period: 20s
    timeout: 5m
    force: false # If true and evictions do not succeed do normal DELETE calls
alarms:
  - "Host memory usage" # according to https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-82933270-1D72-4CF3-A1AF-E5A1343F62DE.html
vCenters:
  # Defines the urls to vCenters in different availability zones.
  # $AZ is replaced with the single character availability zone.
  templateUrl: https://some-vcenter-url-$AZ # required
  # Defines if a vCenters certificates should be checked
  insecure: # optional, defaults to false
  # Credentials for the vCenter per availability zone
  credentials: # required
    a:
      username: user # required
      password: pass # required