# SRS-040 ClickHouse Kafka Engine
# Software Requirements Specification

## Table of Contents


## Revision History

This document is stored in an electronic form using [Git] source control management software
hosted in a [GitHub Repository].
All updates are tracked using the [Revision History].
## Introduction

This software requirements specification (SRS) covers the requirements for the [Kafka Engine] in [ClickHouse]. The Kafka engine allows real-time streaming and ingestion of data from Kafka topics into ClickHouse tables for analytics and large-scale processing. It integrates with [Apache Kafka], a distributed event streaming platform designed for high-throughput, fault-tolerant data processing.

Kafka lets you:

- Publish or subscribe to data flows.
- Organize fault-tolerant storage.
- Process streams as they become available.
### Syntax

#### Creating a Table

```sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [ALIAS expr1],
    name2 [type2] [ALIAS expr2],
    ...
) ENGINE = Kafka()
SETTINGS
    kafka_broker_list = 'host:port',
    kafka_topic_list = 'topic1,topic2,...',
    kafka_group_name = 'group_name',
    kafka_format = 'data_format'[,]
    [kafka_schema = '',]
    [kafka_num_consumers = N,]
    [kafka_max_block_size = 0,]
    [kafka_skip_broken_messages = N,]
    [kafka_commit_every_batch = 0,]
    [kafka_client_id = '',]
    [kafka_poll_timeout_ms = 0,]
    [kafka_poll_max_batch_size = 0,]
    [kafka_flush_interval_ms = 0,]
    [kafka_thread_per_consumer = 0,]
    [kafka_handle_error_mode = 'default',]
    [kafka_commit_on_select = false,]
    [kafka_max_rows_per_message = 1];
```
**Required parameters:**

- `kafka_broker_list` — A comma-separated list of brokers (for example, `localhost:9092`).
- `kafka_topic_list` — A comma-separated list of Kafka topics.
- `kafka_group_name` — The Kafka consumer group name.
- `kafka_format` — Message format (for example, `JSONEachRow`).

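As a minimal illustration of the syntax above (the broker address, topic, consumer group, and column names are all hypothetical), a table consuming JSON messages might look like:

```sql
-- Hypothetical example: consume JSON messages such as
-- {"sensor_id": 1, "temperature": 21.5} from a 'readings' topic.
CREATE TABLE readings_queue
(
    sensor_id   UInt64,
    temperature Float64
) ENGINE = Kafka()
SETTINGS
    kafka_broker_list = 'localhost:9092',
    kafka_topic_list = 'readings',
    kafka_group_name = 'readings_consumer_group',
    kafka_format = 'JSONEachRow';
```
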
## Requirements

### RQ.SRS-040.ClickHouse.KafkaEngine.CreateTable
version: 1.0

[ClickHouse] SHALL support creating tables with the Kafka engine.

### RQ.SRS-040.ClickHouse.KafkaEngine.DefaultValues
version: 1.0

[ClickHouse] SHALL NOT support columns with default values directly in Kafka engine tables.

### RQ.SRS-040.ClickHouse.KafkaEngine.MultipleTopics
version: 1.0

[ClickHouse] SHALL support consuming data from multiple Kafka topics simultaneously.

### RQ.SRS-040.ClickHouse.KafkaEngine.DataIngestion
version: 1.0

[ClickHouse] SHALL reliably consume messages from the specified Kafka topics and insert them into the corresponding tables without data loss.

### RQ.SRS-040.ClickHouse.KafkaEngine.MaterializedViews
version: 1.0

[ClickHouse] SHALL support materialized views that automatically insert data from Kafka engine tables into MergeTree tables.

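A common pattern is sketched below, under the assumption that a Kafka engine table named `readings_queue` with columns `sensor_id` and `temperature` already exists (all names here are illustrative): the Kafka table acts only as a consumer, and a materialized view continuously moves the consumed rows into a MergeTree table for storage and querying.

```sql
-- Illustrative sketch: persist consumed messages into a MergeTree table.
CREATE TABLE readings
(
    sensor_id   UInt64,
    temperature Float64
) ENGINE = MergeTree()
ORDER BY sensor_id;

-- The view fires for every consumed block and inserts it into 'readings'.
CREATE MATERIALIZED VIEW readings_mv TO readings AS
SELECT sensor_id, temperature
FROM readings_queue;
```
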
### RQ.SRS-040.ClickHouse.KafkaEngine.OffsetManagement
version: 1.0

[ClickHouse] SHALL support both automatic and manual offset management for consuming messages from Kafka topics.

### RQ.SRS-040.ClickHouse.KafkaEngine.BatchConsumption
version: 1.0

[ClickHouse] SHALL support configurable batch consumption of messages from Kafka topics using the `kafka_max_block_size` setting.

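As a sketch (broker, topic, and block-size values are illustrative, not recommendations), the setting is supplied at table creation time:

```sql
-- Illustrative: accumulate up to 65536 messages per consumed block.
CREATE TABLE events_queue
(
    message String
) ENGINE = Kafka()
SETTINGS
    kafka_broker_list = 'localhost:9092',
    kafka_topic_list = 'events',
    kafka_group_name = 'events_group',
    kafka_format = 'JSONEachRow',
    kafka_max_block_size = 65536;
```
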
### RQ.SRS-040.ClickHouse.KafkaEngine.SSL_TLS
version: 1.0

[ClickHouse] SHALL support SSL/TLS encryption for secure communication between ClickHouse and Kafka brokers.

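Connection security options are librdkafka client options; they can be supplied in the ClickHouse server configuration under a `<kafka>` element, with dots in the librdkafka option names replaced by underscores. A minimal sketch (the file paths are placeholders):

```xml
<!-- Sketch of server-side configuration (config.xml); paths are placeholders. -->
<clickhouse>
    <kafka>
        <security_protocol>ssl</security_protocol>
        <ssl_ca_location>/path/to/ca-cert.pem</ssl_ca_location>
        <ssl_certificate_location>/path/to/client-cert.pem</ssl_certificate_location>
        <ssl_key_location>/path/to/client-key.pem</ssl_key_location>
    </kafka>
</clickhouse>
```
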
### RQ.SRS-040.ClickHouse.KafkaEngine.DataFormatSupport
version: 1.0

[ClickHouse] SHALL support various message formats (for example, `JSONEachRow`, `Avro`, `Protobuf`) for consuming Kafka messages.

### RQ.SRS-040.ClickHouse.KafkaEngine.SystemTable.Kafka_Consumers
version: 1.0

[ClickHouse] SHALL reflect information about Kafka consumers in the `system.kafka_consumers` system table.

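For example, consumer state can be inspected with a query along these lines (the exact set of columns may vary between ClickHouse versions):

```sql
-- Inspect consumer assignments and current offsets for Kafka engine tables.
SELECT database, table, consumer_id, assignments.topic, assignments.current_offset
FROM system.kafka_consumers
FORMAT Vertical;
```
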
### RQ.SRS-040.ClickHouse.KafkaEngine.LargeMessageSupport
version: 1.0

[ClickHouse] SHALL support consumption of large Kafka messages by allowing configuration of the maximum message size through settings such as `kafka_max_partition_fetch_bytes` and `kafka_max_message_size`.

## References

* [Kafka Engine]
* [Apache Kafka]
* [ClickHouse]

[Git]: https://git-scm.com/
[GitHub Repository]: https://github.com/Altinity/clickhouse-regression/blob/main/kafka/requirements/requirements.md
[Revision History]: https://github.com/Altinity/clickhouse-regression/commits/main/kafka/requirements/requirements.md
[ClickHouse]: https://clickhouse.com
[Kafka Engine]: https://clickhouse.com/docs/en/engines/table-engines/integrations/kafka
[Apache Kafka]: http://kafka.apache.org/