DataDog · sangeetashivaji · Mar 23, 2026 · chatgpt-codex-connector · Mar 23, 2026
@@ -0,0 +1,32 @@
+{
+  "version": 2,
+  "created_at": "2026-03-23",
+  "last_updated_at": "2026-03-23",
+  "title": "ClickHouse cannot connect",
+  "tags": [
+    "integration:clickhouse"
+  ],
+  "description": "The Datadog Agent is unable to connect to the monitored ClickHouse instance. This may indicate that ClickHouse is down, unreachable, or that the Agent's credentials are misconfigured.",
+  "definition": {
+    "message": "The Datadog Agent cannot connect to ClickHouse on {{host.name}}. Verify that the ClickHouse server is running and that the Agent configuration is correct.",
+    "name": "[ClickHouse] Cannot connect to {{host.name}}",
+    "options": {
+      "new_host_delay": 300,
+      "no_data_timeframe": 2,
+      "notify_audit": false,
+      "notify_no_data": false,
+      "renotify_interval": 0,
+      "thresholds": {
+        "critical": 1,
+        "ok": 1,
+        "warning": 1
+      },
+      "timeout_h": 0
+    },
+    "query": "\"clickhouse.can_connect\".over(\"*\").by(\"*\").last(2).count_by_status()",
+    "tags": [
+      "integration:clickhouse"
+    ],
+    "type": "service check"
+  }
+}
@@ -0,0 +1,36 @@
+{
+  "version": 2,
+  "created_at": "2026-03-23",
+  "last_updated_at": "2026-03-23",
+  "title": "ClickHouse active query count is high",
+  "tags": [
+    "integration:clickhouse"
+  ],
+  "description": "A high number of simultaneously executing queries can saturate ClickHouse thread pools and degrade performance for all users. This monitor tracks the number of concurrently active queries.",
+  "definition": {
+    "message": "{{#is_alert}}\n\n## What's happening?\nClickHouse on {{host.name}} has a high number of concurrently active queries over the last 5 minutes. This may indicate query pile-up, long-running queries, or insufficient resources.\n\n## How to investigate\nCheck `system.processes` in ClickHouse for currently executing queries and identify any long-running or stuck queries.\n\n{{/is_alert}}",
+    "name": "[ClickHouse] High number of active queries on {{host.name}}",
+    "options": {
+      "escalation_message": "",
+      "include_tags": true,
+      "locked": false,
+      "new_host_delay": 300,
+      "no_data_timeframe": null,
+      "notify_audit": false,
+      "notify_no_data": false,
+      "renotify_interval": 0,
+      "require_full_window": true,
+      "thresholds": {
+        "critical": 200,
+        "warning": 100
+      },
+      "timeout_h": 0
+    },
+    "priority": null,
+    "query": "avg(last_5m):avg:clickhouse.query.active{*} by {host} > 200",
+    "tags": [
+      "integration:clickhouse"
+    ],
+    "type": "query alert"
+  }
+}
@@ -0,0 +1,36 @@
+{
+  "version": 2,
+  "created_at": "2026-03-23",
+  "last_updated_at": "2026-03-23",
+  "title": "ClickHouse query failure rate is high",
+  "tags": [
+    "integration:clickhouse"
+  ],
+  "description": "A high rate of failed queries in ClickHouse can indicate problematic queries, resource exhaustion, or misconfigured query limits. This monitor tracks the per-second rate of failed queries to catch degradation early.",
+  "definition": {
+    "message": "{{#is_alert}}\n\n## What's happening?\nClickHouse on {{host.name}} has a high query failure rate over the last 5 minutes.\n\n## How to investigate\nCheck the ClickHouse query log table (`system.query_log`) for exception details and identify the failing queries.\n\n{{/is_alert}}",
+    "name": "[ClickHouse] High query failure rate on {{host.name}}",
+    "options": {
+      "escalation_message": "",
+      "include_tags": true,
+      "locked": false,
+      "new_host_delay": 300,
+      "no_data_timeframe": null,
+      "notify_audit": false,
+      "notify_no_data": false,
+      "renotify_interval": 0,
+      "require_full_window": true,
+      "thresholds": {
+        "critical": 5,
+        "warning": 1
+      },
+      "timeout_h": 0
+    },
+    "priority": null,
+    "query": "avg(last_5m):avg:clickhouse.query.failed.count{*} by {host}.as_rate() > 5",
+    "tags": [
+      "integration:clickhouse"
+    ],
+    "type": "query alert"
+  }
+}
@@ -0,0 +1,36 @@
+{
+  "version": 2,
+  "created_at": "2026-03-23",
+  "last_updated_at": "2026-03-23",
+  "title": "ClickHouse thread CPU scheduling wait is high",
+  "tags": [
+    "integration:clickhouse"
+  ],
+  "description": "CPU scheduling wait measures the percentage of time a ClickHouse thread was ready to execute but waiting to be scheduled by the OS. High values indicate CPU contention — the server has more runnable threads than available CPU cores — which causes query latency to increase even when queries are not I/O bound.",
+  "definition": {
+    "message": "{{#is_alert}}\n\n## What's happening?\nClickHouse threads on {{host.name}} are spending a high percentage of time waiting for CPU scheduling. This indicates CPU saturation and will cause query latency to degrade.\n\n## How to investigate\nCheck overall host CPU utilization. Review concurrent query load via `system.processes`. Consider scaling up CPU resources or reducing query concurrency.\n\n{{/is_alert}}\n\n{{#is_warning}}\n\nClickHouse thread CPU wait on {{host.name}} is elevated. Monitor for further increase.\n\n{{/is_warning}}",
+    "name": "[ClickHouse] High thread CPU scheduling wait on {{host.name}}",
+    "options": {
+      "escalation_message": "",
+      "include_tags": true,
+      "locked": false,
+      "new_host_delay": 300,
+      "no_data_timeframe": null,
+      "notify_audit": false,
+      "notify_no_data": false,
+      "renotify_interval": 0,
+      "require_full_window": true,
+      "thresholds": {
+        "critical": 80,
+        "warning": 50
+      },
+      "timeout_h": 0
+    },
+    "priority": null,
+    "query": "avg(last_5m):avg:clickhouse.thread.cpu.wait{*} by {host} > 80",
+    "tags": [
+      "integration:clickhouse"
+    ],
+    "type": "query alert"
+  }
+}
@@ -0,0 +1,36 @@
+{
+  "version": 2,
+  "created_at": "2026-03-23",
+  "last_updated_at": "2026-03-23",
+  "title": "ClickHouse background merge pool is saturated",
+  "tags": [
+    "integration:clickhouse"
+  ],
+  "description": "ClickHouse uses background merge operations to combine data parts in MergeTree tables. When the merge pool is saturated, new merges cannot be scheduled, leading to an accumulation of small parts that degrades query performance and increases storage overhead. This monitor tracks the number of active background merge tasks.",
+  "definition": {
+    "message": "{{#is_alert}}\n\n## What's happening?\nThe ClickHouse background merge pool on {{host.name}} has a high number of active merge tasks. This may indicate write pressure exceeding the merge throughput, or merges being blocked by long-running operations.\n\n## How to investigate\nCheck `system.merges` for currently running merges. Consider reducing insert frequency, increasing `background_pool_size`, or investigating mutations blocking merges.\n\n{{/is_alert}}\n\n{{#is_warning}}\n\nThe ClickHouse background merge pool on {{host.name}} is becoming saturated. Monitor for further increase.\n\n{{/is_warning}}",
+    "name": "[ClickHouse] Background merge pool is saturated on {{host.name}}",
+    "options": {
+      "escalation_message": "",
+      "include_tags": true,
+      "locked": false,
+      "new_host_delay": 300,
+      "no_data_timeframe": null,
+      "notify_audit": false,
+      "notify_no_data": false,
+      "renotify_interval": 0,
+      "require_full_window": true,
+      "thresholds": {
+        "critical": 14,
+        "warning": 10
+      },
+      "timeout_h": 0
+    },
+    "priority": null,
+    "query": "avg(last_5m):avg:clickhouse.background_pool.merges.task.active{*} by {host} > 14",
+    "tags": [
+      "integration:clickhouse"
+    ],
+    "type": "query alert"
+  }
+}
@@ -0,0 +1,36 @@
+{
+  "version": 2,
+  "created_at": "2026-03-23",
+  "last_updated_at": "2026-03-23",
+  "title": "ClickHouse replica delay is high",
+  "tags": [
+    "integration:clickhouse"
+  ],
+  "description": "Replica delay is the lag between when data is written to one replica and when it is applied on the other replicas of the same table. High replica delay can lead to stale reads and indicate replication health issues. This monitor tracks the maximum absolute replica queue delay across replicated tables.",
+  "definition": {
+    "message": "{{#is_alert}}\n\n## What's happening?\nClickHouse replica delay on {{host.name}} has exceeded the critical threshold over the last 15 minutes. Replicas may be serving stale data.\n\n## How to investigate\nCheck `system.replicas` for tables with high `absolute_delay`. Look for network issues between replicas, or high write load on the replica receiving inserts.\n\n{{/is_alert}}\n\n{{#is_warning}}\n\nClickHouse replica delay on {{host.name}} is elevated. Monitor for further increase.\n\n{{/is_warning}}",
+    "name": "[ClickHouse] Replica delay is high on {{host.name}}",
+    "options": {
+      "escalation_message": "",
+      "include_tags": true,
+      "locked": false,
+      "new_host_delay": 300,
+      "no_data_timeframe": null,
+      "notify_audit": false,
+      "notify_no_data": false,
+      "renotify_interval": 0,
+      "require_full_window": true,
+      "thresholds": {
+        "critical": 300000,
+        "warning": 60000
+      },
+      "timeout_h": 0
+    },
+    "priority": null,
+    "query": "avg(last_15m):avg:clickhouse.replica.delay.absolute{*} by {host} > 300000",
+    "tags": [
+      "integration:clickhouse"
+    ],
+    "type": "query alert"
+  }
+}
@@ -50,6 +50,14 @@
     },
     "dashboards": {
       "ClickHouse Overview": "assets/dashboards/overview.json"
+    },
+    "monitors": {
+      "ClickHouse cannot connect": "assets/monitors/clickhouse_can_connect.json",
+      "ClickHouse query failure rate is high": "assets/monitors/clickhouse_high_query_failure_rate.json",
+      "ClickHouse active query count is high": "assets/monitors/clickhouse_high_active_queries.json",
+      "ClickHouse replica delay is high": "assets/monitors/clickhouse_replica_delay.json",
+      "ClickHouse background merge pool is saturated": "assets/monitors/clickhouse_merge_pool_saturation.json",
+      "ClickHouse thread CPU scheduling wait is high": "assets/monitors/clickhouse_high_thread_cpu_wait.json"
     }
   }
 }
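
Each query alert in this diff states its critical threshold twice: once at the end of `query` and once in `options.thresholds.critical`. A small consistency check can catch the two drifting apart during later edits. The following is a minimal sketch, not part of this PR: the helper names `query_threshold` and `check_monitor` are hypothetical, the sample is a trimmed copy of the active-queries monitor, and the check assumes a "greater than" comparison as used by every query alert here.

```python
import json
import re

# Trimmed copy of the active-queries monitor from this diff (illustrative only).
SAMPLE = json.loads("""
{
  "title": "ClickHouse active query count is high",
  "definition": {
    "options": {"thresholds": {"critical": 200, "warning": 100}},
    "query": "avg(last_5m):avg:clickhouse.query.active{*} > 200",
    "type": "query alert"
  }
}
""")


def query_threshold(query):
    """Return the numeric value after the trailing comparison operator in a query."""
    match = re.search(r"[<>]=?\s*(-?\d+(?:\.\d+)?)\s*$", query)
    if match is None:
        raise ValueError("no trailing threshold in query: %r" % query)
    return float(match.group(1))


def check_monitor(monitor):
    """Return a list of problems found in one monitor definition (empty if none)."""
    definition = monitor["definition"]
    if definition["type"] != "query alert":
        return []  # service checks use status counts, not a numeric comparison
    problems = []
    thresholds = definition["options"]["thresholds"]
    critical = float(thresholds["critical"])
    if query_threshold(definition["query"]) != critical:
        problems.append("query threshold differs from options.thresholds.critical")
    warning = thresholds.get("warning")
    # Assumes a ">" comparison, so the warning level must sit below critical.
    if warning is not None and float(warning) >= critical:
        problems.append("warning threshold is not below critical")
    return problems


if __name__ == "__main__":
    print(check_monitor(SAMPLE))
```

Running the same function over each file under `assets/monitors/` would cover all six monitors; the service check is skipped because its thresholds count consecutive statuses and are not compared against a metric value.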