We provide a sample Terraform project that you can use as a reference to set up your Kubernetes cluster using Azure Kubernetes Service (AKS). The Terraform project incorporates recommended practices for the cluster to help ensure that your deployment of PubSub+ Cloud is successful.
You can review the architecture below to understand what the Terraform project deploys and how.
The information on this page pertains to the Terraform project. For information about the requirements for the AKS cluster, see the documentation website.
This section describes the reference architecture Terraform project for deploying an Azure Kubernetes Service (AKS) cluster. It includes Kubernetes components and configuration that:
- are required (or highly recommended) to operate successfully with PubSub+ Cloud
- are recommended but not required to successfully deploy PubSub+ Cloud
- are available to produce a working cluster, but that we are not opinionated about (an option or configuration had to be selected as part of the Terraform project, but the choice does not impact the installation of PubSub+ Cloud)
The areas to review are the networking, cluster configuration, and access to and from the cluster. Below is an architectural diagram of the components of the AKS cluster that are created with this Terraform project:
By default, a VNET with a single subnet is created to host the Azure Kubernetes Service (AKS) cluster.
The Load Balancer that is created as part of the AKS cluster provides Source Network Address Translation (SNAT). This is cheaper and simpler than using NAT Gateways; you can choose NAT Gateways if required, but doing so requires modifying the Terraform project.
The VNET is an optional component. If the VNET that will host the cluster already exists, or will be created with other automation, you can provide its details in variables.
There are currently two options for networking in AKS: Azure CNI and Kubenet. This Terraform project uses Kubenet, which is our recommended option because it makes the most efficient use of IP addresses within the VNET's CIDR. PubSub+ Cloud also supports Azure CNI, and you can modify this Terraform project to use it; doing so requires a larger CIDR for the VNET because we recommend a 1:1 ratio of event broker service pods to worker nodes.
You can use the CIDR Calculator for PubSub+ Cloud to properly size the VNET to support the number of event broker services you require. A correctly sized VNET CIDR is critical because it cannot be changed after the cluster is created.
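For orientation, the following is a minimal sketch of where these networking choices land on the cluster resource, assuming the azurerm v3 provider. The resource names, resource group, and CIDR values are illustrative; the actual project wires them through variables and modules.

```
# Sketch only; not the project's actual module layout.
resource "azurerm_kubernetes_cluster" "example" {
  name                = "solace-example"
  location            = "eastus2"
  resource_group_name = "example-rg"   # assumes an existing resource group
  dns_prefix          = "solace-example"

  default_node_pool {
    name       = "system"
    vm_size    = "Standard_D2ds_v5"
    node_count = 2
    zones      = ["1", "2", "3"]
  }

  identity {
    type = "SystemAssigned"
  }

  network_profile {
    network_plugin = "kubenet"        # pods draw IPs from pod_cidr, not from the VNET subnet
    pod_cidr       = "10.100.0.0/16"  # illustrative; kept separate from the VNET CIDR
    outbound_type  = "loadBalancer"   # SNAT through the cluster load balancer (the default)
  }
}
```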
The cluster has the following node pools:
The default (system) node pool spans all three availability zones. By default, there are two worker nodes in this pool, using the Standard_D2ds_v5 VM size. All the standard Kubernetes services, as well as the PubSub+ Mission Control Agent, run on these worker nodes.
The cluster has a total of 12 node pools for event broker services. Instead of spanning multiple availability zones, there are 4 sets of 3 node pools, each locked to a single availability zone. Locking these node pools to a single availability zone allows the cluster autoscaler to work properly. We use pod anti-affinity against the node's zone label to ensure that each pod in a high-availability event broker service runs in a separate availability zone.
These node pools are engineered to support a 1:1 ratio of event broker service pods to worker nodes. We use labels and taints on each of these node pools to ensure that only event broker service pods are scheduled on the worker nodes for each scaling tier.
The VM sizes, labels, and taints for each event broker service node pool are as follows:
Name | VM size | Labels | Taints |
---|---|---|---|
prod1k | Standard_E2ds_v5 | nodeType:messaging, serviceClass:prod1k | nodeType:messaging:NoExecute, serviceClass:prod1k:NoExecute |
prod5k | Standard_E4ds_v5 | nodeType:messaging, serviceClass:prod5k | nodeType:messaging:NoExecute, serviceClass:prod5k:NoExecute |
prod10k | Standard_E4bds_v5 | nodeType:messaging, serviceClass:prod10k | nodeType:messaging:NoExecute, serviceClass:prod10k:NoExecute |
prod50k | Standard_E8bds_v5 | nodeType:messaging, serviceClass:prod50k | nodeType:messaging:NoExecute, serviceClass:prod50k:NoExecute |
prod100k | Standard_E8bds_v5 | nodeType:messaging, serviceClass:prod100k | nodeType:messaging:NoExecute, serviceClass:prod100k:NoExecute |
monitoring | Standard_D2ds_v5 | nodeType:monitoring | nodeType:monitoring:NoExecute |
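As a rough illustration of how one of these zone-locked node pools is expressed, a sketch follows, assuming the azurerm v3 provider. The pool name, autoscaling limits, and the referenced cluster are illustrative and do not reflect the project's exact module code.

```
# Sketch of a single zone-locked event broker service node pool (prod1k, zone 1).
resource "azurerm_kubernetes_cluster_node_pool" "prod1k_az1" {
  name                  = "prod1kaz1"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  vm_size               = "Standard_E2ds_v5"
  zones                 = ["1"]   # locked to one zone so the cluster autoscaler scales it predictably
  enable_auto_scaling   = true
  min_count             = 0
  max_count             = 10      # illustrative

  node_labels = {
    nodeType     = "messaging"
    serviceClass = "prod1k"
  }

  node_taints = [
    "nodeType=messaging:NoExecute",
    "serviceClass=prod1k:NoExecute",
  ]
}
```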
There are two options for cluster access:
- A bastion host (enabled by default, but you can choose to exclude it), which has a public IP and is accessible via SSH from the provided CIDRs
- Optionally, the cluster's API can be made public and restricted to the provided CIDRs (by default, the API is private)
There are also a few options for authentication:
- Azure RBAC is enabled by default. The `kubernetes_cluster_admin_users` or `kubernetes_cluster_admin_groups` variables should be set with the user or group names that will be given admin access to the cluster.
- Optionally, local authentication can be enabled by setting `local_account_disabled` to false.
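Taken together, the access and authentication options above are driven by variables like the following. This is a terraform.tfvars sketch; the values are placeholders.

```
# Illustrative values only; the variable names are the project's.
bastion_ssh_authorized_networks = ["192.168.1.0/24"]      # CIDRs allowed to SSH to the bastion
bastion_ssh_public_key          = "ssh-rsa AAAA..."       # key pair used to access the bastion

kubernetes_cluster_admin_groups = ["aks-cluster-admins"]  # groups granted admin access via Azure RBAC
# or: kubernetes_cluster_admin_users = ["user@example.com"]

# local_account_disabled = false   # uncomment to enable local (non-Azure-RBAC) authentication
```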
The following is an overview of the steps to use this Terraform project. Before you begin, review the necessary prerequisites.
To use this Terraform module, the following is required:
- Terraform 1.3 or above (we recommend tfenv for Terraform version management)
- Azure Command Line Interface
- yq
- kubectl
- helm
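If it helps to verify compatibility up front, the Terraform version floor is the kind of constraint normally pinned in a settings block; the following is a sketch, not necessarily the project's exact file.

```
terraform {
  required_version = ">= 1.3.0"   # matches the Terraform 1.3+ prerequisite above

  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
  }
}
```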
- Navigate to the `terraform/` directory and create a `terraform.tfvars` file with the required variables.
- The VNET's CIDR must be sized appropriately for the number of event broker services that will be created. You can size it using the 'AKS Kubenet' sheet of the CIDR Calculator for PubSub+ Cloud.
- The `kubernetes_version` variable should be set to the latest Kubernetes version that is supported by PubSub+ Cloud.
- The `bastion_ssh_authorized_networks` variable must be set with the CIDR(s) of the networks from which the bastion host will be accessed.
- The `bastion_ssh_public_key` variable must be set with the public key of the key pair that will be used to access the bastion host.
- The `worker_node_ssh_public_key` variable must be set with the public key of the key pair that will be used to access the worker node hosts.
- The `kubernetes_cluster_admin_users` or `kubernetes_cluster_admin_groups` variable must be set to allow specific users or groups access to the cluster.
See the Terraform README.md for a full list of the required and optional variables available.
For example:
```
region                          = "eastus2"
cluster_name                    = "solace-eastus2"
kubernetes_version              = "1.29"
vnet_cidr                       = "10.1.1.0/24"
bastion_ssh_authorized_networks = ["192.168.1.1/32"]
bastion_ssh_public_key          = "ssh-rsa abc123..."
worker_node_ssh_public_key      = "ssh-rsa abc234..."
kubernetes_cluster_admin_users  = ["user@..."]
```
- Apply the Terraform using the following commands:
  ```
  terraform init
  terraform apply
  ```
- After you create the cluster, set up access to the cluster:
- If the bastion host was created, use the `connect.sh` script to open a tunnel and set up your environment to access the cluster:

  ```
  source ./connect.sh --private-key <ssh private key path>  # this creates a proxy via the bastion and sets up a KUBECONFIG file with the appropriate proxy configuration
  ```
- If the Kubernetes API was configured to be publicly accessible, all that is required is a `kubeconfig` file:

  ```
  export KUBECONFIG=`mktemp`  # this can be excluded if you want your standard ~/.kube/config.yaml file updated
  az aks get-credentials --resource-group <resource-group-name> --name <cluster-name> -f ${KUBECONFIG}
  ```
- Create a Storage Class with these recommended settings:

  ```
  kubectl apply -f kubernetes/storage-class.yaml
  ```
There are no breaking changes when migrating to this version.
The v2 version of this Terraform project moved the messaging node pool modules from the cluster module into the main project. For technical reasons, the default 'system' node pool cannot be moved into the main project because it is tied to the cluster resource.
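When module addresses change between versions like this, Terraform's moved blocks (or terraform state mv) are the usual mechanism for keeping existing state intact. The following is a hypothetical example of the pattern; the module addresses are invented for illustration and are not the project's real ones.

```
# Hypothetical only: maps an old module address inside the cluster module
# to its new address in the main project so existing resources are not recreated.
moved {
  from = module.cluster.module.messaging_node_pools
  to   = module.messaging_node_pools
}
```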