# Terraform Kubernetes Infrastructure as Code (IaC)
This repository was created by following the instructions in the article linked below, with modifications to suit my specific cluster configuration. Note that the datasets or datastores on my Proxmox setup may differ from yours, so adjust accordingly. My Proxmox cluster consists of three nodes and uses Ceph for efficient virtual machine management across all nodes.
Article: Talos Cluster on Proxmox with Terraform by Olav
This repository provides Infrastructure as Code (IaC) for deploying and managing a Kubernetes cluster on Proxmox using Talos and Terraform. It is designed for repeatable, automated, and declarative cluster management.
## Features

- Declarative VM Provisioning: Proxmox VMs for control plane and worker nodes are managed via Terraform.
- Custom Talos Image: Uses Talos Image Factory with baked-in system extensions (iSCSI tools, util-linux tools) for persistent storage support.
- Talos OS & Kubernetes Versioning: Talos and Kubernetes versions are parameterized in `variables.tf` for easy upgrades.
- Automated Cluster Configuration: Talos machine configurations are generated and applied automatically to each node.
- Longhorn Persistent Storage: Distributed block storage with NodePort UI access (port 30080) for volume management.
- Rolling Upgrades: Change a version variable and apply to safely upgrade Talos and/or Kubernetes across your cluster.
- CI/CD Linting: A GitHub Actions workflow automatically checks Terraform formatting and lints code on pull requests and pushes to `main`.
## Upgrading Talos and Kubernetes

- Edit the version variables in `variables.tf`:

  ```hcl
  variable "talos_version" {
    type    = string
    default = "v1.11.5"
  }

  variable "kubernetes_version" {
    type    = string
    default = "1.34.0"
  }
  ```

- Run:

  ```shell
  terraform apply
  ```

This triggers a rolling upgrade of your cluster nodes using the new versions.
## Project Structure

```text
.
├── cluster.tf                            # Talos cluster and machine configuration resources
├── files.tf                              # Talos custom image download with system extensions
├── providers.tf                          # Terraform provider configuration
├── variables.tf                          # All input variables, including versioning
├── virtual_machines.tf                   # Proxmox VM definitions for control plane and workers
├── longhorn-values.yaml                  # Helm values for Longhorn persistent storage
├── LONGHORN_TALOS_SETUP.md               # Longhorn setup guide and troubleshooting
├── .github/workflows/terraform-lint.yml  # CI workflow for linting
└── README.md                             # Project documentation
```
## CI/CD

- Terraform Linting: On every PR or push to `main`, the `.github/workflows/terraform-lint.yml` workflow runs `terraform fmt -check -recursive` and `tflint --recursive` to ensure code quality and consistency.
## Custom Talos Image

This cluster uses a custom Talos image built via the Talos Image Factory with the following system extensions baked in:
- iscsi-tools (v0.2.0): Provides iSCSI initiator support for persistent storage
- util-linux-tools (2.41.1): Additional Linux utilities for storage management
- qemu-guest-agent (10.0.2): Enhanced VM integration with Proxmox
Schematic ID: `e187c9b90f773cd8c84e5a3265c5554ee787b2fe67b508d9f955e90e7ae8c96c`
These extensions enable Longhorn to properly manage persistent volumes on Talos nodes. The ext-iscsid service runs automatically on all nodes.
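The schematic file itself is not reproduced in this README, but an Image Factory customization matching the extension list above would look roughly like the following sketch (the `siderolabs/` prefixes are the factory's official extension identifiers; treat this as an illustration, not the exact file behind the schematic ID):

```yaml
# Image Factory schematic sketch -- submit to https://factory.talos.dev
# to obtain a schematic ID for a custom Talos image.
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/iscsi-tools        # iSCSI initiator for Longhorn volumes
      - siderolabs/util-linux-tools   # extra storage utilities
      - siderolabs/qemu-guest-agent   # Proxmox VM integration
```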
Check that the extensions are installed on all nodes:

```shell
talosctl get extensions --nodes 10.0.0.70,10.0.0.71,10.0.0.72,10.0.0.73,10.0.0.74,10.0.0.75
```

Verify that the iSCSI service is running:

```shell
talosctl get services --nodes 10.0.0.73,10.0.0.74,10.0.0.75 | grep ext-iscsid
```

## Longhorn Persistent Storage

This cluster includes Longhorn for distributed block storage across worker nodes.
- Distributed Storage: Replicated volumes across multiple nodes for high availability
- Dynamic Provisioning: Automatic PersistentVolume creation via StorageClasses
- Web UI: Accessible via NodePort on port 30080 (http://NODE_IP:30080)
- Backup & Restore: Volume snapshots and backup capabilities
- Pod Security: Configured with privileged permissions in the `longhorn-system` namespace
Longhorn is deployed using Helm with custom values:

```shell
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace \
  --values longhorn-values.yaml
```

Longhorn requires privileged pod security. Apply the labels to the namespace:
```shell
kubectl label namespace longhorn-system \
  pod-security.kubernetes.io/enforce=privileged \
  pod-security.kubernetes.io/audit=privileged \
  pod-security.kubernetes.io/warn=privileged
```

Check that all Longhorn pods are running:
```shell
kubectl get pods -n longhorn-system
```

Verify that the storage classes were created:

```shell
kubectl get storageclass
```

Expected output:
```text
NAME                 PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
longhorn (default)   driver.longhorn.io   Delete          Immediate           true                   5m
longhorn-static      driver.longhorn.io   Delete          Immediate           true                   5m
```
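The contents of `longhorn-values.yaml` are not shown in this README; a minimal sketch that exposes the UI on NodePort 30080 might look like this (key names follow the upstream Longhorn Helm chart -- verify them against your chart version):

```yaml
# Sketch of longhorn-values.yaml -- assumed values, not the repo's actual file.
service:
  ui:
    type: NodePort       # expose the Longhorn UI outside the cluster
    nodePort: "30080"    # matches http://<node-ip>:30080
persistence:
  defaultClass: true     # make "longhorn" the default StorageClass
  defaultClassReplicaCount: 3
```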
Access the Longhorn web interface at `http://<any-node-ip>:30080`.
For detailed setup instructions, troubleshooting, and disk management, see LONGHORN_TALOS_SETUP.md.
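As a quick usage example (a hypothetical manifest, not part of this repository), a PersistentVolumeClaim backed by the `longhorn` StorageClass would be provisioned dynamically and replicated across worker nodes:

```yaml
# Example PVC consuming Longhorn storage; "demo-data" is a name
# chosen for illustration only.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
```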
## Post-Setup

After setting up the cluster, you may find the following steps helpful.
To connect to your Talos Kubernetes cluster using Terraform outputs, configure your local environment as follows:
- Save the `kubeconfig` and `talosconfig` outputs to files on your local machine:

  ```shell
  terraform output -raw kubeconfig > ~/.kube/config
  terraform output -raw talosconfig > ~/.talos/config
  ```

- Set appropriate file permissions to avoid security issues:

  ```shell
  chmod 600 ~/.kube/config ~/.talos/config
  ```
To interact with the Kubernetes cluster, use kubectl. For example, to list the nodes:

```shell
kubectl get nodes
```

Sample output:

```text
NAME              STATUS   ROLES           AGE   VERSION
talos-cp-01       Ready    control-plane   83s   v1.32.0
talos-cp-02       Ready    control-plane   86s   v1.32.0
talos-cp-03       Ready    control-plane   85s   v1.32.0
talos-worker-01   Ready    <none>          88s   v1.32.0
talos-worker-02   Ready    <none>          86s   v1.32.0
talos-worker-03   Ready    <none>          90s   v1.32.0
```
- View the Dashboard:

  ```shell
  talosctl dashboard -n talos-cp-01
  ```

- Check Cluster Health:

  ```shell
  talosctl -n talos-cp-01 health
  ```

  Sample output:

  ```text
  discovered nodes: ["10.0.0.73" "10.0.0.74" "10.0.0.75" "10.0.0.70" "10.0.0.71" "10.0.0.72"]
  waiting for etcd to be healthy: OK
  waiting for all k8s nodes to report ready: OK
  waiting for all control plane components to be ready: OK
  ...
  ```

- Verify System Extensions:

  ```shell
  talosctl get extensions --nodes 10.0.0.70
  ```

  Sample output:

  ```text
  NODE        NAMESPACE   TYPE              ID   VERSION   NAME               VERSION
  10.0.0.70   runtime     ExtensionStatus   0    1         iscsi-tools        v0.2.0
  10.0.0.70   runtime     ExtensionStatus   1    1         util-linux-tools   2.41.1
  10.0.0.70   runtime     ExtensionStatus   2    1         qemu-guest-agent   10.0.2
  10.0.0.70   runtime     ExtensionStatus   3    1         schematic          e187c9b90f773cd8c84e5a3265c5554ee787b2fe67b508d9f955e90e7ae8c96c
  ```
## Starting Over

If you need to start over, you can taint all resources and reapply the Terraform configuration:

```shell
terraform state list | xargs -n1 terraform taint
terraform apply
```

Adjust paths and configurations as needed for your environment.
