AWS CloudWatch logs for Container Insights contain no CPU usage metrics when setting collection_interval to more than 300s #36109

Open
oleksandr-san opened this issue Oct 31, 2024 · 1 comment
Labels
bug, needs triage, receiver/awscontainerinsight

Comments


oleksandr-san commented Oct 31, 2024

Component(s)

receiver/awscontainerinsight

What happened?

Description

We tried increasing the collection_interval parameter of the awscontainerinsightreceiver receiver to reduce AWS CloudWatch costs. Once the interval was set above 300s, the exported Container Insights log events no longer contained any CPU usage metrics (memory metrics were still present).

I've figured out that this is related to the TTL of the map used to store metric deltas: when the collection interval is longer than 5 minutes, delta calculation breaks because the previous values are evicted before the next samples arrive.

Increasing the cleanInterval to 15 minutes helps.
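For illustration, here is a minimal Go sketch of that failure mode (the expiringMap type and all names below are hypothetical stand-ins, not the receiver's actual API): a delta cache whose entries expire after 5 minutes can never produce a delta when scrapes arrive 600s apart, because the previous sample is always evicted before the next one is stored.

package main

import (
	"fmt"
	"time"
)

// entry holds the last observed cumulative value and when it was stored.
type entry struct {
	value     float64
	timestamp time.Time
}

// expiringMap is an illustrative stand-in for the receiver's delta cache:
// entries older than ttl are dropped on each cleanup pass.
type expiringMap struct {
	ttl  time.Duration
	data map[string]entry
}

func newExpiringMap(ttl time.Duration) *expiringMap {
	return &expiringMap{ttl: ttl, data: map[string]entry{}}
}

// cleanup evicts entries whose age exceeds the TTL.
func (m *expiringMap) cleanup(now time.Time) {
	for k, e := range m.data {
		if now.Sub(e.timestamp) > m.ttl {
			delete(m.data, k)
		}
	}
}

// delta returns value minus the previous sample for the key, or false if
// there is no previous sample to diff against (e.g. it was evicted).
func (m *expiringMap) delta(key string, value float64, now time.Time) (float64, bool) {
	m.cleanup(now)
	prev, ok := m.data[key]
	m.data[key] = entry{value: value, timestamp: now}
	if !ok {
		return 0, false
	}
	return value - prev.value, true
}

func main() {
	m := newExpiringMap(5 * time.Minute) // TTL comparable to the reported cleanInterval

	start := time.Now()
	// First scrape: seeds the cache, no delta yet.
	if _, ok := m.delta("container_cpu_usage_total", 100, start); !ok {
		fmt.Println("t=0s: no previous sample, no CPU delta emitted")
	}

	// Second scrape 600s later: the seed has already been evicted,
	// so the delta is lost again and CPU metrics never appear.
	if _, ok := m.delta("container_cpu_usage_total", 160, start.Add(600*time.Second)); !ok {
		fmt.Println("t=600s: previous sample expired, no CPU delta emitted")
	}
}

With the TTL raised above the collection interval (for example to 15 minutes, as suggested above), the second lookup would still find the earlier sample and a CPU delta could be computed.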

Steps to Reproduce

  1. Create any EKS cluster
  2. Install OTEL to collect AWS Container Insights
  3. Set receivers.awscontainerinsightreceiver.collection_interval to 600s
  4. Restart the daemonset
  5. Wait for 15-20 minutes

Expected Result

Log events in CloudWatch contain CPU usage metrics

Actual Result

Log events in CloudWatch do not contain CPU usage metrics

Collector version

0.41.1

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

extensions:
  health_check:

receivers:
  awscontainerinsightreceiver:
    collection_interval: 600s

processors:
  batch/metrics:
    timeout: 60s

exporters:
  awsemf:
    namespace: ContainerInsights
    log_group_name: '/aws/containerinsights/{ClusterName}/performance'
    log_stream_name: '{NodeName}'
    resource_to_telemetry_conversion:
      enabled: true
    dimension_rollup_option: NoDimensionRollup
    parse_json_encoded_attr_values: [Sources, kubernetes]
    metric_declarations:
      # cluster metrics
      - dimensions: [[ClusterName]]
        metric_name_selectors:
          - cluster_node_count
          - cluster_failed_node_count

service:
  pipelines:
    metrics:
      receivers: [awscontainerinsightreceiver]
      processors: [batch/metrics]
      exporters: [awsemf]

  extensions: [health_check]

Log output

No response

Additional context

Log event with collection_interval == 600s:

{
    "AutoScalingGroupName": "eks-agent-ng-arm64-4ac815a7-3a71-20b4-a604-aa35acfabcd4",
    "ClusterName": "cluster-with-agent",
    "InstanceId": "i-019f99ea685e48c83",
    "InstanceType": "t4g.medium",
    "Namespace": "kube-system",
    "NodeName": "ip-172-31-28-91.eu-north-1.compute.internal",
    "PodName": "aws-node",
    "Sources": [
        "cadvisor",
        "pod",
        "calculated"
    ],
    "Timestamp": "1730302312567",
    "Type": "Container",
    "Version": "0",
    "container_memory_cache": 106377216,
    "container_memory_failcnt": 0,
    "container_memory_mapped_file": 811008,
    "container_memory_max_usage": 160075776,
    "container_memory_rss": 28655616,
    "container_memory_swap": 0,
    "container_memory_usage": 136433664,
    "container_memory_utilization": 1.1755803143695827,
    "container_memory_working_set": 47341568,
    "container_status": "Running",
    "kubernetes": {
        "container_name": "aws-node",
        "containerd": {
            "container_id": "aabb7c4bea02cfe72371bb5a36bbcd23eff478078c6e920b77e1e9e0ade591b9"
        },
        "host": "ip-172-31-28-91.eu-north-1.compute.internal",
        "labels": {
            "app.kubernetes.io/instance": "aws-vpc-cni",
            "app.kubernetes.io/name": "aws-node",
            "controller-revision-hash": "588469c5c6",
            "k8s-app": "aws-node",
            "pod-template-generation": "2"
        },
        "namespace_name": "kube-system",
        "pod_id": "c3476737-e9d4-44cb-a20f-dcb812ac9091",
        "pod_name": "aws-node-wghkn",
        "pod_owners": [
            {
                "owner_kind": "DaemonSet",
                "owner_name": "aws-node"
            }
        ]
    },
    "number_of_container_restarts": 0
}

Log event with the default configuration:

{
    "AutoScalingGroupName": "eks-agent-ng-1ac79c42-2aa5-ff45-0c1e-b03d703c0d47",
    "ClusterName": "cluster-with-agent",
    "InstanceId": "i-0becbf3535f001cb4",
    "InstanceType": "t3.medium",
    "Namespace": "kube-system",
    "NodeName": "ip-172-31-25-41.eu-north-1.compute.internal",
    "PodName": "aws-node",
    "Sources": [
        "cadvisor",
        "pod",
        "calculated"
    ],
    "Timestamp": "1730371819323",
    "Type": "Container",
    "Version": "0",
    "container_cpu_request": 25,
    "container_cpu_usage_system": 1.3264307613654849,
    "container_cpu_usage_total": 2.9252373450029627,
    "container_cpu_usage_user": 1.393591812573864,
    "container_cpu_utilization": 0.14626186725014814,
    "container_memory_cache": 24600576,
    "container_memory_failcnt": 0,
    "container_memory_hierarchical_pgfault": 267.61999880258816,
    "container_memory_hierarchical_pgmajfault": 0,
    "container_memory_mapped_file": 270336,
    "container_memory_max_usage": 56954880,
    "container_memory_pgfault": 267.61999880258816,
    "container_memory_pgmajfault": 0,
    "container_memory_rss": 26337280,
    "container_memory_swap": 0,
    "container_memory_usage": 52269056,
    "container_memory_utilization": 1.1655047122298874,
    "container_memory_working_set": 47063040,
    "container_status": "Running",
    "kubernetes": {
        "container_name": "aws-node",
        "containerd": {
            "container_id": "b038c0f909602224fa9e1b1351379ff2dc48d0de3e96f720ed80316ada28aca2"
        },
        "host": "ip-172-31-25-41.eu-north-1.compute.internal",
        "labels": {
            "app.kubernetes.io/instance": "aws-vpc-cni",
            "app.kubernetes.io/name": "aws-node",
            "controller-revision-hash": "588469c5c6",
            "k8s-app": "aws-node",
            "pod-template-generation": "2"
        },
        "namespace_name": "kube-system",
        "pod_id": "5e453328-d24c-45d8-9451-7274248cd447",
        "pod_name": "aws-node-wt85g",
        "pod_owners": [
            {
                "owner_kind": "DaemonSet",
                "owner_name": "aws-node"
            }
        ]
    },
    "number_of_container_restarts": 0
}
A contributor commented:

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.
