[BUG] KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"} #17914

Open
@froque

Agent Environment

$ sudo datadog-agent version 
Agent 7.54.1 - Commit: 44d1992 - Serialization version: v5.0.114 - Go version: go1.21.9

Describe what happened:

After upgrading to 7.54.0, the Kafka consumer lag checks started to fail with KafkaError _TRANSPORT ("Failed to get metadata: Local: Broker transport failure").

Describe what you expected:

The Datadog Agent should continue to retrieve Kafka consumer lag offsets from the Kafka cluster, as it did before the upgrade.

Steps to reproduce the issue:

  • Upgrade to v7.54.0 or v7.54.1
  • Configure Datadog to check Kafka consumer offsets
$ sudo cat /etc/datadog-agent/conf.d/kafka_consumer.d/conf.yaml
init_config:

instances:
  - kafka_connect_str:
      - <redacted>
    security_protocol: SASL_SSL
    sasl_mechanism: PLAIN
    sasl_plain_username: <redacted>
    sasl_plain_password: <redacted>
    kafka_consumer_offsets: true
    monitor_unlisted_consumer_groups: true
  • Perform a check
$ sudo datadog-agent check kafka_consumer


  Running Checks
  ==============
    
    kafka_consumer (4.3.0)
    ----------------------
      Instance ID: kafka_consumer:24b8757764ea1a30 [ERROR]
      Configuration Source: file:/etc/datadog-agent/conf.d/kafka_consumer.d/conf.yaml
      Total Runs: 1
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 5.099s
      Last Execution Date : 2024-06-24 09:11:07 WEST / 2024-06-24 08:11:07 UTC (1719216667000)
      Last Successful Execution Date : Never
      Error: Unable to connect to the AdminClient. This is likely due to an error in the configuration.
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/datadog_checks/kafka_consumer/kafka_consumer.py", line 34, in check
          self.client.request_metadata_update()
        File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/datadog_checks/kafka_consumer/client.py", line 180, in request_metadata_update
          self.kafka_client.list_topics(None, timeout=self.config._request_timeout)
        File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/confluent_kafka/admin/__init__.py", line 603, in list_topics
          return super(AdminClient, self).list_topics(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
      
      During handling of the above exception, another exception occurred:
      
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/datadog_checks/base/checks/base.py", line 1224, in run
          self.check(instance)
        File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/datadog_checks/kafka_consumer/kafka_consumer.py", line 36, in check
          raise Exception(
      Exception: Unable to connect to the AdminClient. This is likely due to an error in the configuration.

  Metadata
  ========
    config.hash: kafka_consumer:24b8757764ea1a30
    config.provider: file
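
To help narrow down whether the failure comes from the Agent's check or from the underlying client, the same metadata call from the traceback (`list_topics` on a confluent_kafka `AdminClient`) can be tried outside the check. The following is only a sketch under assumptions: `build_admin_config` and `probe` are hypothetical helpers, the mapping from the instance keys above to librdkafka property names is my guess and not the integration's actual code, and the 5 second timeout is chosen only to roughly match the execution time reported above.

```python
"""Standalone reproduction sketch: one list_topics() call outside the Agent."""


def build_admin_config(instance):
    """Map kafka_consumer instance keys to librdkafka property names.

    Assumption: this mapping is illustrative, not copied from the integration.
    """
    servers = instance["kafka_connect_str"]
    if isinstance(servers, str):
        servers = [servers]
    conf = {
        "bootstrap.servers": ",".join(servers),
        "security.protocol": instance.get("security_protocol", "PLAINTEXT"),
    }
    if "sasl_mechanism" in instance:
        conf["sasl.mechanism"] = instance["sasl_mechanism"]
        conf["sasl.username"] = instance["sasl_plain_username"]
        conf["sasl.password"] = instance["sasl_plain_password"]
    return conf


def probe(instance, timeout=5.0):
    """Issue one metadata request and return (ok, detail)."""
    # Imported lazily so build_admin_config() can be used without the library.
    from confluent_kafka import KafkaException
    from confluent_kafka.admin import AdminClient

    client = AdminClient(build_admin_config(instance))
    try:
        metadata = client.list_topics(timeout=timeout)
    except KafkaException as exc:
        # exc.args[0] is the KafkaError, e.g. code=_TRANSPORT as in the report.
        return False, str(exc.args[0])
    return True, f"{len(metadata.topics)} topics, {len(metadata.brokers)} brokers"
```

Calling `probe(...)` with a dict holding the real (redacted here) values from conf.yaml, using the Agent's embedded Python so that the same confluent_kafka build is loaded, should show whether the `_TRANSPORT` error reproduces without the check involved. Adding a `"debug": "broker,security"` entry to the returned config makes librdkafka log connection and handshake details.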

Additional environment details (Operating System, Cloud provider, etc):
