Implement mTLS resources and configuration for Target Allocator server #284

Open
wants to merge 21 commits into
base: main

musa-asad
Contributor

@musa-asad musa-asad commented Jan 20, 2025

Description of the issue

The Target Allocator server already used TLS to encrypt traffic, but it did not enforce mutual TLS (mTLS): only the client validated the server’s certificate, and the server did not validate the client’s. Enforcing mTLS improves security by allowing only the CloudWatch Agent client to access the Target Allocator server.

Description of changes

Note

In mTLS, both parties hold a certificate containing their public key (signed by a Certificate Authority) and a corresponding private key. During the TLS handshake, the two sides exchange certificates, and each verifies the other’s identity using the CA’s public key. In the classic RSA key exchange (TLS 1.2 and earlier), the client generates a random pre-master secret, encrypts it with the server’s public key, and sends it to the server, which decrypts it with its private key. Both sides then use this pre-master secret, along with additional handshake information, to derive a master secret via a key derivation function, and the master secret is used to generate the symmetric encryption keys for the session. TLS 1.3 removes the RSA key exchange and derives the shared secret from an ephemeral (EC)DHE exchange instead, but each side still verifies the other’s certificate against its CA.

ASCII Visualization

Kubernetes Secrets & Volumes
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ "amazon-cloudwatch-observability-agent-outbound-cert"  ->  Mounted into CloudWatch Agent Pod at /etc/...           │
│ "ta-secret"                                            ->  Mounted into Target Allocator Pod at /etc/...           │
│ Both contain:                                                                                                      │
│    • .crt  (certificate)  • .key (private key)  • ca.crt (certificate authority)                                   │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘


[ CloudWatch Agent Client Pod ]   <========== mTLS Handshake ==========>   [ Target Allocator Server Pod ]
┌────────────────────────────────────────┐                                  ┌────────────────────────────────────────┐
│ Volume mount:                          │                                  │ Volume mount:                          │
│   /etc/amazon-cloudwatch-observability-agent-outbound-cert/               │   /etc/amazon-cloudwatch-target-allocator-cert/     
│     ├─ client.crt   (client cert)      │                                  │     ├─ server.crt  (server cert)       │
│     ├─ client.key   (client key)       │                                  │     ├─ server.key  (server key)        │
│     └─ ca.crt       (CA for server)    │                                  │     └─ ca.crt       (CA for client)    │
├────────────────────────────────────────┤                                  ├────────────────────────────────────────┤
│ 1) Agent uses client.crt/key for       │                                  │ 1) CertAndCAWatcher monitors changes   │
│    outgoing secure connections.        │                                  │    to server cert, key, and CA file.   │
│ 2) Validates server’s certificate      │                                  │ 2) On updates, reloads cert/CA pool    │
│    with ca.crt during handshake.       │                                  │    so that new credentials are used    │
└────────────────────────────────────────┘                                  └────────────────────────────────────────┘

               ┌───────────────────────────────────────────────────────────────────────────────────────┐
               │                mTLS Handshake Steps (Simplified)                                      │
               │  1. Agent sends ClientHello.                                                          │
               │  2. Server responds with ServerHello + server.crt.                                    │
               │  3. Server requests client.crt (mutual TLS).                                          │
               │  4. Agent verifies server.crt using its ca.crt; Server verifies client.crt with its   │
               │    own ca.crt.                                                                        │
               │  5. Both sides confirm certificates, complete TLS handshake, and establish a secure   │
               │    channel with mutual trust.                                                         │
               └───────────────────────────────────────────────────────────────────────────────────────┘
  • Implemented CertAndCAWatcher to monitor the file paths of the server certificate, server key, and client certificate authority (CA) and reload them when they change, and added a unit test.
  • Added Kubernetes Secrets and volume mounts containing the server certificate, server key, and client CA, and updated the unit tests.
  • Enforced mTLS in the NewTLSConfig function and added a unit test.

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

Resources

Volume Mounts
Screenshot 2025-01-21 at 12 38 29 AM

Volumes
Screenshot 2025-01-21 at 12 38 45 AM

mTLS

  1. Created EKS cluster using custom agent (Add default TLS client cert and key paths for Prometheus input and receiver amazon-cloudwatch-agent#1510), operator, and target allocator images.
  2. Ran helm upgrade --install --debug amazon-cloudwatch-observability helm-charts/charts/amazon-cloudwatch-observability --set clusterName=<cluster_name> --set region=us-west-2 --namespace amazon-cloudwatch --create-namespace with custom helm charts (Implement mTLS resources for CloudWatch Agent client aws-observability/helm-charts#163) and edited values.yaml with a custom agent and prometheus configuration.
  3. Ran the following command to create a debug pod with access to the volume mounts:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
  namespace: amazon-cloudwatch
spec:
  containers:
  - name: debug-container
    image: ubuntu:latest
    command: ["/bin/bash", "-c", "while true; do sleep 30; done;"]
    volumeMounts:
    - name: agenttls
      mountPath: /etc/amazon-cloudwatch-observability-agent-cert
      readOnly: true
    - name: agentoutboundtls
      mountPath: /etc/amazon-cloudwatch-observability-agent-outbound-cert
      readOnly: true
  volumes:
  - name: agenttls
    secret:
      secretName: amazon-cloudwatch-observability-agent-cert
  - name: agentoutboundtls
    secret:
      secretName: amazon-cloudwatch-observability-agent-outbound-cert
  restartPolicy: Never
EOF
  4. Port-forwarded target-allocator-service.
  5. Opened a shell in the debug pod.
  6. Ran apt update && apt install -y curl openssl && cd /tmp.
  7. Ran openssl genrsa -out dummyCA.key 2048.
  8. Ran openssl req -x509 -new -nodes -key dummyCA.key -sha256 -days 365 -out dummyCA.crt -subj "/C=US/ST=Test/L=Test/O=Test/OU=Test/CN=DummyCA"
  9. Ran cd ..
  10. Ran curl -iv --cert /etc/amazon-cloudwatch-observability-agent-outbound-cert/tls.crt --key /etc/amazon-cloudwatch-observability-agent-outbound-cert/tls.key --cacert /etc/amazon-cloudwatch-observability-agent-cert/ca.crt https://cloudwatch-agent-w-prom-target-allocator-service:80/jobs

Successfully got {"kubernetes-pod-jmx":{"_link":"/jobs/kubernetes-pod-jmx/targets"},"kubernetes-pod-fluentbit-plugin":{"_link":"/jobs/kubernetes-pod-fluentbit-plugin/targets"},"kube-metrics":{"_link":"/jobs/kube-metrics/targets"},"kubernetes-pod-appmesh-envoy":{"_link":"/jobs/kubernetes-pod-appmesh-envoy/targets"},"kubernetes-service-endpoints":{"_link":"/jobs/kubernetes-service-endpoints/targets"}}

  11. Ran curl -iv --cert /etc/amazon-cloudwatch-observability-agent-outbound-cert/tls.crt --key /etc/amazon-cloudwatch-observability-agent-outbound-cert/tls.key --cacert /tmp/dummyCA.crt https://cloudwatch-agent-w-prom-target-allocator-service:80/jobs

As expected, the request failed with curl: (60) SSL certificate problem: unable to get local issuer certificate

  12. Exited and ran kubectl debug -n kube-system aws-node-xxxxx -it --image=ubuntu:latest
  13. Ran apt update && apt install -y curl jq
  14. Ran curl -k https://<NODE_IP>:8443/jobs

As expected, the request failed with SSL routines::tlsv13 alert certificate required, errno 0. The Target Allocator pod logs also showed TLS handshake error from XXXX: remote error: tls: bad certificate. Running curl -k https://<NODE_IP>:8443/jobs from another node worked with the previous image, which shows that mTLS is now enforced.

CertWatcher

  1. Ran a helm install.
  2. Opened a shell in the debug pod.
  3. Ran cat on each certificate and key to confirm that the values had changed.
  4. Re-ran curl -iv --cert /etc/amazon-cloudwatch-observability-agent-outbound-cert/tls.crt --key /etc/amazon-cloudwatch-observability-agent-outbound-cert/tls.key --cacert /etc/amazon-cloudwatch-observability-agent-cert/ca.crt https://cloudwatch-agent-w-prom-target-allocator-service:80/jobs

Successfully got {"kubernetes-pod-jmx":{"_link":"/jobs/kubernetes-pod-jmx/targets"},"kubernetes-pod-fluentbit-plugin":{"_link":"/jobs/kubernetes-pod-fluentbit-plugin/targets"},"kube-metrics":{"_link":"/jobs/kube-metrics/targets"},"kubernetes-pod-appmesh-envoy":{"_link":"/jobs/kubernetes-pod-appmesh-envoy/targets"},"kubernetes-service-endpoints":{"_link":"/jobs/kubernetes-service-endpoints/targets"}}. The debug logs also show that metrics are emitted successfully after the certificate refresh.

@musa-asad musa-asad changed the title Implement mTLS for Target Allocator server Implement mTLS resources and configuration for Target Allocator server Jan 20, 2025
@@ -3,6 +3,7 @@ FROM golang:1.22 as builder

# set goproxy=direct
ENV GOPROXY direct
ENV GOINSECURE go.opencensus.io
Contributor Author

@musa-asad musa-asad Jan 21, 2025

This needs to be added temporarily because the go.opencensus.io certificate has expired, which is breaking our workflow.

Suggested change
ENV GOINSECURE go.opencensus.io

@musa-asad musa-asad marked this pull request as ready for review January 21, 2025 11:12