Correct typos #95

Open: wants to merge 1 commit into base: main
42 changes: 21 additions & 21 deletions README.md
@@ -1,34 +1,34 @@
# estafette-gke-preemptible-killer

-This small Kubernetes application loop through a given preemptibles node pool and kill a node before the regular [24h
+This small Kubernetes application loops through a given preemptibles node pool and kills a node before the regular [24hr
life time of a preemptible VM](https://cloud.google.com/compute/docs/instances/preemptible#limitations).

[![License](https://img.shields.io/github/license/estafette/estafette-gke-preemptible-killer.svg)](https://github.com/estafette/estafette-gke-preemptible-killer/blob/master/LICENSE)

## Why?

-When creating a cluster, all the node are created at the same time and should be deleted after 24h of activity. To
+When creating a cluster, all nodes are created at the same time and would be deleted after 24 hours of activity. To
prevent large disruption, the estafette-gke-preemptible-killer can be used to kill instances during a random period
-of time between 12 and 24h. It makes use of the node annotation to store the time to kill value.
+of time between 12 and 24h. It makes use of node annotation to store the time-to-kill value.

-## How does that work
+## How does it work?

-At a given interval, the application get the list of preemptible nodes and check weither the node should be
+At a given interval, the application gets the list of preemptible nodes and checks whether the node should be
deleted or not. If the annotation doesn't exist, a time to kill value is added to the node annotation with a
random range between 12h and 24h based on the node creation time stamp.
-When the time to kill time is passed, the Kubernetes node is marked as unschedulable, drained and the instance
+When the time-to-kill time is passed, the Kubernetes node is marked as unschedulable, drained and the instance
deleted on GCloud.

## Known limitations

-- Selecting node pool is not supported yet, the code is processing ALL
+- Selecting node pool is not supported yet. The code processes ALL
preemptible nodes attached to the cluster, and there is no way to limit it
-even via taints nor annotations
+even via taints or annotations
- This tool increases the chances to have many small disruptions instead of
one major disruption.
- This tool does not guarantee that major disruption is avoided - GCP can
-trigger large disruption because the way preemptible instances are managed.
-Ensure your have PDB and enough of replicas, so for better safety just use
+trigger large disruptions because of the way preemptible instances are managed.
+Ensure you have PDB and enough replicas, or for better safety just use
non-preemptible nodes in different zones. You may also be interested in [estafette-gke-node-pool-shifter](https://github.com/estafette/estafette-gke-node-pool-shifter)

## Usage
@@ -37,14 +37,14 @@ You can either use environment variables or flags to configure the following set

| Environment variable | Flag | Default | Description
| ---------------------- | ------------------------ | -------- | -----------------------------------------------------------------
-| BLACKLIST_HOURS | --blacklist-hours (-b) | | List of UTC time intervals in the form of `09:00 - 12:00, 13:00 - 18:00` in which deletion is NOT allowed
-| DRAIN_TIMEOUT | --drain-timeout | 300 | Max time in second to wait before deleting a node
+| BLACKLIST_HOURS | --blacklist-hours (-b) | | List of UTC time intervals in the form of `09:00 - 12:00, 13:00 - 18:00` during which deletion is NOT allowed
+| DRAIN_TIMEOUT | --drain-timeout | 300 | Max time in seconds to wait before deleting a node
| FILTERS | --filters (-f) | | Label filters in the form of `key1: value1[, value2[, ...]][; key2: value3[, value4[, ...]], ...]`
-| INTERVAL | --interval (-i) | 600 | Time in second to wait between each node check
-| KUBECONFIG | --kubeconfig | | Provide the path to the kube config path, usually located in ~/.kube/config. This argument is only needed if you're running the killer outside of your k8s cluster
-| METRICS_LISTEN_ADDRESS | --metrics-listen-address | :9001 | The address to listen on for Prometheus metrics requests
-| METRICS_PATH | --metrics-path | /metrics | The path to listen for Prometheus metrics requests
-| WHITELIST_HOURS | --whitelist-hours (-w) | | List of UTC time intervals in the form of `09:00 - 12:00, 13:00 - 18:00` in which deletion is allowed and preferred
+| INTERVAL | --interval (-i) | 600 | Time in seconds to wait between each node check
+| KUBECONFIG | --kubeconfig | | Kube config path, usually located in ~/.kube/config. This argument is only needed if you're running the killer outside of your k8s cluster
+| METRICS_LISTEN_ADDRESS | --metrics-listen-address | :9001 | Address to listen on for Prometheus metrics requests
+| METRICS_PATH | --metrics-path | /metrics | Path to listen for Prometheus metrics requests
+| WHITELIST_HOURS | --whitelist-hours (-w) | | List of UTC time intervals in the form of `09:00 - 12:00, 13:00 - 18:00` during which deletion is allowed and preferred

### Create a Google Service Account

@@ -119,14 +119,14 @@ kubectl apply -k .

## Development

-To start development run
+To start development, run:

```bash
git clone git@github.com:estafette/estafette-ci-api.git
cd estafette-ci-api
```

-Before committing your changes run
+Before committing your changes, run:

```bash
go test ./...
@@ -154,5 +154,5 @@ For an all-in-one script that launches a kind cluster with 3 nodes, runs
go build && ./scripts/all-in-one-test -i 10
```
where `-i 10` are the arguments to be passed to
-`estafette-gke-preemptible-killer`, replace with your own test arguments.
-For safety, it does not remove the kind cluster it leaves behind.
+`estafette-gke-preemptible-killer`. Replace with your own test arguments.
+For safety, this does not remove the kind cluster it leaves behind.