-
Notifications
You must be signed in to change notification settings - Fork 0
Kafka Partitions and Groups π§π½ββοΈ
Table Of Contents
- Partitions
- Creating Topics with Partitions
- Publishing Messages To Topics with Keys
- Consumer Groups
- Consumer Offset Management
- Consuming Partitioned Data
Each topic in Kafka can contain multiple partitions. A topic can have 1-n partitions. The number of the partitions are set up during the topic creation. The maximum number of partitions per cluster and per topic varies by the specific version of Kafka.
Partitions allow Kafka to scale by parallelizing ingestion, storage and consumption of messages. It provides horizontal scalability. However, creating too many partitions may result in increase memory usage and file handles.
Each partition has separate physical log files which will rollover as they reach reconfigured sizes. A given massage in Kafka is stored in only 1 partition. Each partition is assigned a broker process, known as its leader broker. In order to write to a specific partition, the message needs to be sent to its corresponding leader. The leader takes care of updating its log file as well as replicating that partition to other copies. The leader will also send data to the subscribers of the partition.
With multiple partitions for a topic, consumers can share workloads through consumer groups. Partitions ca also be replicated for fault tolerance purposes.
Each published message gets stored in only 1 partition. If the partition is replicated, each replicated copy will also get an instance of this message.
Message ordering is guaranteed only within a partition.
The partition for a message is determined by its message key. Kafka uses a hashing function to allocate a partition based on the message key. Messages with the same key will always end up in the same partition (same message key = same partition).
- Create a Topic with 03 partitions
I have no name!@f871c6f4b068:/opt/bitnami/kafka/bin$
> ./kafka-topics.sh \
> --zookeeper zookeeper:2181 \
> --create \
> --topic kafka.learning.orders \
> --partitions 3 \
> --replication-factor 1
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic kafka.learning.orders.
- Check topic partitioning
I have no name!@f871c6f4b068:/opt/bitnami/kafka/bin$
> ./kafka-topics.sh \
> --zookeeper zookeeper:2181 \
> --topic kafka.learning.orders \
> --describe
Topic: kafka.learning.orders PartitionCount: 3 ReplicationFactor: 1 Configs:
Topic: kafka.learning.orders Partition: 0 Leader: 1001 Replicas: 1001 Isr: 1001
Topic: kafka.learning.orders Partition: 1 Leader: 1001 Replicas: 1001 Isr: 1001
Topic: kafka.learning.orders Partition: 2 Leader: 1001 Replicas: 1001 Isr: 1001
- Publishing with Keys
I have no name!@f871c6f4b068:/opt/bitnami/kafka/bin$
> ./kafka-console-producer.sh \
> --bootstrap-server localhost:29092 \
> --property "parse.key=true" \
> --property "key.separator=:" \
> --topic kafka.learning.orders
>1001:"Mouse,23.05"
>1002:"Keyboard,10.00"
- Navigate to data directory
I have no name!@f871c6f4b068:/bitnami/kafka/data$ ls -la
total 40
drwxrwxr-x 1 root root 4096 Jun 29 17:26 .
drwxrwxr-x 1 root root 4096 Apr 20 2021 ..
-rw-r--r-- 1 1001 root 0 Jun 29 16:40 .lock
-rw-r--r-- 1 1001 root 0 Jun 29 16:40 cleaner-offset-checkpoint
drwxr-xr-x 2 1001 root 4096 Jun 29 17:16 kafka.learning.orders-0
drwxr-xr-x 2 1001 root 4096 Jun 29 17:16 kafka.learning.orders-1
drwxr-xr-x 2 1001 root 4096 Jun 29 17:16 kafka.learning.orders-2
-rw-r--r-- 1 1001 root 4 Jun 29 17:26 log-start-offset-checkpoint
-rw-r--r-- 1 1001 root 91 Jun 29 16:41 meta.properties
-rw-r--r-- 1 1001 root 82 Jun 29 17:26 recovery-point-offset-checkpoint
-rw-r--r-- 1 1001 root 82 Jun 29 17:26 replication-offset-checkpoint
- Kafka.learning.order* directories
I have no name!@f871c6f4b068:/bitnami/kafka/data$ ls kafka.learning.orders*
kafka.learning.orders-0:
00000000000000000000.index 00000000000000000000.log 00000000000000000000.timeindex leader-epoch-checkpoint
kafka.learning.orders-1:
00000000000000000000.index 00000000000000000000.log 00000000000000000000.timeindex leader-epoch-checkpoint
kafka.learning.orders-2:
00000000000000000000.index 00000000000000000000.log 00000000000000000000.timeindex leader-epoch-checkpoint
- Inspect Partitions
I have no name!@f871c6f4b068:/bitnami/kafka/data$ cat kafka.learning.orders-0/00000000000000000000.log
Iβ»iοΏ½οΏ½οΏ½βΊοΏ½οΏ½~4οΏ½βΊοΏ½οΏ½~4οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½βΊ1001β"Mouse,23.05"βΊLβ»οΏ½οΏ½-jβΊοΏ½οΏ½~οΏ½οΏ½βΊοΏ½οΏ½~οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½βΊ1002 "Keyboard,10.00"
# Empty for now
I have no name!@f871c6f4b068:/bitnami/kafka/data$ cat kafka.learning.orders-1/00000000000000000000.log
# Empty for now
I have no name!@f871c6f4b068:/bitnami/kafka/data$ cat kafka.learning.orders-2/00000000000000000000.log
A consumer group is a group of consumers that share a topic workload. A topic may be generating thousands of messages in a short amount of time. It may not be possible for one single consumer process to keep up with processing these messages. For scalability, multiple processes can be started and the messages ca be distributed among them for load balancing. A consumer group is a logical group of consumers that Kafka uses for such load distribution.
Each message will be sent to only one consumer within a consumer group. That consumer is then responsible for processing the message and acknowledge back to Kafka.
Consumers split workload among themselves using partitions. Kafka keep tracks of the active number of consumers for a given topic, it then distributes the messages between these consumers. Kafka only considers the number of partitions for distribution, not the number of messages expected in each partition. It is expected that the Number of partitions are equal or higher than the number of consumers in a group.
We can create multiple consumer groups, each with a different set of consumers. Each group will get a full copy of all the messages, but each message will be sent only to one consumer within each consumer group.
When new consumers come up or existing consumers go down, Kafka takes care of re-balancing the load by reassigning partitions among live consumers.
Source [1]
Consumer Offset us a number to track message consumption by each consumer and partition. As each message is received by Kafka, it allocates a message ID to the message. Kafka then maintains the message ID offset on a by consumer and by partition basis to track consumption.
Kafka brokers keep track of both what is sent to the consumer and what is acknowledged by the consumer by using two offset values:
- Current offset : last message sent to a given consumer
- Committed offset : last message acknowledged by consumer
By default, Kafka consumers auto acknowledge on receipt, but this can be changed by the consumer.
When Kafka brokers don't receive acknowledgement within a set timeout, they will resend the message to the consumer (in case failure/timeout). This ensures at least one delivery of each message to a consumer group.
A message can be delivered multiple times if acknowledgement does not happen within a timeout.
When a consumer group starts up, it has the option of requesting messages either from the start, only the latest or from given offset (start/new/from-offset).
Source [1]
- Consume messages using a consumer group
I have no name!@f871c6f4b068:/opt/bitnami/kafka/bin$
> ./kafka-console-consumer.sh \
> --bootstrap-server localhost:29092 \
> --topic kafka.learning.orders \
> --group test-consumer-group \
> --property print.key=true \
> --property key.separator=" = " \
> --from-beginning
1001 = "Mouse,23.05"
1002 = "Keyboard,10.00"
^CProcessed a total of 2 messages
- Check current status of offsets
I have no name!@f871c6f4b068:/opt/bitnami/kafka/bin$
> ./kafka-consumer-groups.sh \
> --bootstrap-server localhost:29092 \
> --describe \
> --all-groups
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
test-consumer-group kafka.learning.orders 0 2 2 0 consumer-test-consumer-group-1-cd6d8963-d850-471f-bfd1-2798599f4369 /172.18.0.3 consumer-test-consumer-group-1
test-consumer-group kafka.learning.orders 1 0 0 0 consumer-test-consumer-group-1-cd6d8963-d850-471f-bfd1-2798599f4369 /172.18.0.3 consumer-test-consumer-group-1
test-consumer-group kafka.learning.orders 2 0 0 0 consumer-test-consumer-group-1-cd6d8963-d850-471f-bfd1-2798599f4369 /172.18.0.3 consumer-test-consumer-group-1
"A time for everything, and to everything its place
Return what has been moved through time and space."
[Charmed]