Add k8sattributesprocessor to otlp pipeline with workload type detection #1524

Open
wants to merge 12 commits into
base: feature-custom-metrics-entity

musa-asad
Contributor

@musa-asad musa-asad commented Feb 3, 2025

Description of the issue

To support the Explore related feature in CloudWatch, the CloudWatch Agent sends an "Entity", which includes relevant metadata to correlate metrics or logs between resources (e.g., an EKS cluster) and services (e.g., a Java application). When the CloudWatch Agent runs in a Kubernetes cluster, we need to collect the namespace, workload name, and node name to populate the "Entity".

However, we currently only get Kubernetes metadata when Application Signals is enabled. For OTLP custom metrics, if Application Signals isn't configured, we have no way to fetch Kubernetes metadata. To close this gap, we must add the Kubernetes Attributes Processor to the CloudWatch Agent.

Additionally, the process of fetching metadata with the Kubernetes Attributes Processor depends on the agent's workload type:

  • DaemonSet Mode:
    If the agent is running as a DaemonSet, we must configure a node filter. This prevents the agent from fetching metadata for pods on other nodes.

Hence, we must also implement workload type detection.
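
As a rough illustration of the node-filter rule above, the sketch below builds a k8sattributes processor configuration as a generic map and adds the filter only for a DaemonSet. It is not the translator code from this PR: the function name `k8sAttributesConfig`, the constants, and the `K8S_NODE_NAME` environment variable are assumptions made for the example. The `filter.node_from_env_var` and `pod_association` keys are the upstream processor's settings.

```go
package main

import "fmt"

// Workload type names. These constants are hypothetical and exist
// only for this sketch.
const (
	daemonSet   = "DaemonSet"
	deployment  = "Deployment"
	statefulSet = "StatefulSet"
)

// k8sAttributesConfig returns a k8sattributes processor config as a
// generic map. The node filter is added only when the agent runs as a
// DaemonSet, so each agent pod watches pods on its own node.
func k8sAttributesConfig(workloadType string) map[string]any {
	cfg := map[string]any{
		// Associate telemetry with a pod through the connection's IP.
		"pod_association": []any{
			map[string]any{
				"sources": []any{
					map[string]any{"from": "connection"},
				},
			},
		},
	}
	if workloadType == daemonSet {
		// K8S_NODE_NAME is an assumed variable name; it would need to
		// be injected into the agent pod (e.g. via the downward API).
		cfg["filter"] = map[string]any{"node_from_env_var": "K8S_NODE_NAME"}
	}
	return cfg
}

func main() {
	for _, wt := range []string{daemonSet, deployment, statefulSet} {
		_, filtered := k8sAttributesConfig(wt)["filter"]
		fmt.Printf("%s: node filter set = %t\n", wt, filtered)
	}
}
```

Leaving the filter out for the other workload types lets a single agent replica resolve pods scheduled on any node.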

Description of changes

Revision 1

  • Implements k8sattributesprocessor:
    • Add translation logic for k8sattributesprocessor to extract metadata from the application pod's IP and set node filter if the agent is a DaemonSet.
    • Add k8sattributesprocessor to otlp pipeline.
    • Update sample yaml files to include k8sattributesprocessor.
  • Implement workload type detection:
    • Add getWorkloadType() in translator/util/eksdetector/eksdetector.go to query the Kubernetes API with the POD_NAME and K8S_NAMESPACE environment variables and retrieve the workload type from the pod information.
    • Configure Workload value in IsEKSCache.
    • Use IsEKSCache in DetectWorkloadType() to return the workload type.
    • Add getter and setter for workload type in translator/context/context.go.
    • Set workload type in the config-translator binary.
  • Add constants for DaemonSet, Deployment, and StatefulSet.
  • Update and add unit tests for new functionality.
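
One plausible way to derive the workload type from pod information is to inspect the pod's owner references. The sketch below assumes that approach; the PR's actual getWorkloadType() is not shown here and may differ. The `ownerRef` type and `workloadTypeFromOwners` function are hypothetical, and the Kubernetes API call that would fetch the pod is left out so the example stays self-contained.

```go
package main

import "fmt"

// Constants for the supported workload types.
const (
	DaemonSet   = "DaemonSet"
	Deployment  = "Deployment"
	StatefulSet = "StatefulSet"
)

// ownerRef mirrors the two fields of a pod's metadata.ownerReferences
// entries that this sketch needs. It is a hypothetical stand-in for
// the Kubernetes OwnerReference type.
type ownerRef struct {
	Kind string
	Name string
}

// workloadTypeFromOwners maps a pod's owners to a workload type and
// returns "" when no known owner is found.
func workloadTypeFromOwners(owners []ownerRef) string {
	for _, o := range owners {
		switch o.Kind {
		case "DaemonSet":
			return DaemonSet
		case "StatefulSet":
			return StatefulSet
		case "ReplicaSet":
			// Deployment pods are owned by a ReplicaSet that the
			// Deployment manages. A standalone ReplicaSet would also
			// match here; this sketch does not tell them apart.
			return Deployment
		}
	}
	return ""
}

func main() {
	agentPod := []ownerRef{{Kind: "DaemonSet", Name: "cloudwatch-agent"}}
	appPod := []ownerRef{{Kind: "ReplicaSet", Name: "sample-app-69cdbb5f95"}}
	fmt.Printf("%q\n", workloadTypeFromOwners(agentPod))
	fmt.Printf("%q\n", workloadTypeFromOwners(appPod))
	fmt.Printf("%q\n", workloadTypeFromOwners(nil))
}
```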

Revision 2

  • Add validation for SetWorkloadType().
  • Use constants for return values in getWorkloadType().
  • Change "Unknown" to "" in getWorkloadType() since it serves no functional purpose.
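
The validation in SetWorkloadType() could look roughly like the sketch below. This is an assumption about its shape, not the code in translator/context/context.go: the `translatorContext` type, the error return, and the decision to accept the empty string (the "not detected" value from getWorkloadType()) are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// translatorContext is a hypothetical, pared-down stand-in for the
// translator context that stores the detected workload type.
type translatorContext struct {
	mu           sync.RWMutex
	workloadType string
}

// SetWorkloadType stores wt only if it is a supported workload type
// or the empty string. Any other value is rejected and the previously
// stored value is kept.
func (c *translatorContext) SetWorkloadType(wt string) error {
	switch wt {
	case "DaemonSet", "Deployment", "StatefulSet", "":
		c.mu.Lock()
		c.workloadType = wt
		c.mu.Unlock()
		return nil
	default:
		return fmt.Errorf("unsupported workload type %q", wt)
	}
}

// WorkloadType returns the stored workload type.
func (c *translatorContext) WorkloadType() string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.workloadType
}

func main() {
	ctx := &translatorContext{}
	if err := ctx.SetWorkloadType("DaemonSet"); err != nil {
		fmt.Println("error:", err)
	}
	if err := ctx.SetWorkloadType("CronJob"); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println("stored:", ctx.WorkloadType())
}
```

Rejecting unexpected values at the setter keeps later consumers, such as the node-filter decision, from having to re-check the string.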

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

  1. Created an EKS cluster and deployed the Amazon CloudWatch Observability EKS add-on.
  2. Set up sample application by following https://aws-otel.github.io/docs/getting-started/adot-eks-add-on/sample-app.
    • Removed resource attributes.
    • Changed OTEL_EXPORTER_OTLP_ENDPOINT to http://cloudwatch-agent.amazon-cloudwatch:4317.
  3. Built the agent image by running make docker-build-amd64 and changed the image in the AmazonCloudWatchAgent custom resource.
    • Added debug exporter to OTLP pipeline for testing.

Kubernetes Attributes Processor

Debug Exporter Output:
K8s Metadata:

k8s.pod.name: Str(sample-app-69cdbb5f95-sqks8)
k8s.namespace.name: Str(default)
k8s.replicaset.name: Str(sample-app-69cdbb5f95)
k8s.deployment.name: Str(sample-app)
k8s.node.name: Str(ip-XXX-XX-XX-XX.us-west-2.compute.internal)

Entity Fields:

com.amazonaws.cloudwatch.entity.internal.type: Str(Service)
com.amazonaws.cloudwatch.entity.internal.service.name: Str(unknown_service:java)
com.amazonaws.cloudwatch.entity.internal.deployment.environment: Str(k8s:entity-cluster-2/default)
com.amazonaws.cloudwatch.entity.internal.platform.type: Str(K8s)
com.amazonaws.cloudwatch.entity.internal.k8s.cluster.name: Str(entity-cluster-2)
com.amazonaws.cloudwatch.entity.internal.k8s.namespace.name: Str(default)
com.amazonaws.cloudwatch.entity.internal.k8s.workload.name: Str(sample-app)
com.amazonaws.cloudwatch.entity.internal.k8s.node.name: Str(ip-XXX-XX-XX-XX.us-west-2.compute.internal)
com.amazonaws.cloudwatch.entity.internal.instance.id: Str(i-0da7f196c5fa59a25)

EMF Output:
K8s Metadata:

"k8s.deployment.name": "sample-app",
"k8s.namespace.name": "default",
"k8s.node.name": "ip-XXX-XX-XX-XX.us-west-2.compute.internal",
"k8s.pod.ip": "XXX.XX.XX.XXX",
"k8s.pod.name": "sample-app-69cdbb5f95-sqks8",
"k8s.replicaset.name": "sample-app-69cdbb5f95",

Workload Type Detection

DaemonSet:
Screenshot 2025-02-03 at 1 17 06 AM

Deployment:
Screenshot 2025-02-03 at 1 17 48 AM

Requirements

Before committing the code, please complete the following steps.

  1. Run make fmt and make fmt-sh
  2. Run make lint

@musa-asad musa-asad changed the base branch from main to custom-metrics-entity February 3, 2025 00:05
@musa-asad musa-asad self-assigned this Feb 3, 2025
@musa-asad musa-asad changed the title Add k8sattributesprocessor to k8s otlp pipeline with workload type detection Add k8sattributesprocessor to otlp pipeline with workload type detection Feb 3, 2025
@musa-asad musa-asad requested review from lisguo and okankoAMZ February 3, 2025 00:12
@musa-asad musa-asad requested review from duhminick and removed request for okankoAMZ February 3, 2025 02:01
@musa-asad
Contributor Author

musa-asad commented Feb 3, 2025

It looks like the sample application I used returns unknown_service:java as the service name. Similar to JMX, should we use the resource processor to remove this resource attribute so that we can fall back to the K8sWorkload name?

@musa-asad musa-asad marked this pull request as ready for review February 3, 2025 06:19
@musa-asad musa-asad requested a review from a team as a code owner February 3, 2025 06:19
@musa-asad musa-asad changed the base branch from custom-metrics-entity to main February 3, 2025 06:20
@musa-asad musa-asad changed the base branch from main to custom-metrics-entity February 3, 2025 06:40
@musa-asad musa-asad changed the base branch from custom-metrics-entity to main February 3, 2025 06:41
@musa-asad musa-asad changed the base branch from main to custom-metrics-entity February 3, 2025 06:41
@musa-asad musa-asad changed the base branch from feature-custom-metrics-entity to main February 3, 2025 06:44
@musa-asad musa-asad changed the base branch from main to feature-custom-metrics-entity February 3, 2025 06:44
@musa-asad musa-asad closed this Feb 3, 2025
@musa-asad musa-asad reopened this Feb 3, 2025
@musa-asad musa-asad removed the request for review from duhminick February 3, 2025 07:00
@musa-asad musa-asad requested review from JayPolanco and duhminick and removed request for JayPolanco February 3, 2025 07:00