
KAFKA-17367: Share coordinator impl. New merge batches algorithm. [3/N] #17149

Open · wants to merge 35 commits into base: trunk

Conversation


@smjn smjn commented Sep 10, 2024

  • Tightened the signature of ShareCoordinatorShard.combineStateBatches.
  • The combineStateBatches method takes in the current batches and the new list of batches and creates new records in case there is partial overlap between older and newer ones. It then sorts and merges the batch lists.
  • Added comprehensive tests for the above method.

Related to #17011 (comment)
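
For readers skimming the diff, here is a minimal, self-contained sketch of the data shape and sort order being discussed. The StateBatch record and MergeSketch class below are illustrative stand-ins, not the actual PersisterStateBatch / ShareCoordinatorShard code in this PR.

import java.util.Comparator;
import java.util.TreeSet;

// Illustrative stand-in for PersisterStateBatch: an offset interval plus delivery metadata.
record StateBatch(long firstOffset, long lastOffset, byte deliveryState, short deliveryCount) { }

public class MergeSketch {
    // Assumed ordering for the discussion below: firstOffset, then lastOffset, then deliveryCount.
    static final Comparator<StateBatch> BATCH_ORDER =
        Comparator.comparingLong(StateBatch::firstOffset)
            .thenComparingLong(StateBatch::lastOffset)
            .thenComparingInt(StateBatch::deliveryCount);

    public static void main(String[] args) {
        TreeSet<StateBatch> sorted = new TreeSet<>(BATCH_ORDER);
        // Current batches followed by the batches from a new write request.
        sorted.add(new StateBatch(100, 110, (byte) 0, (short) 1));
        sorted.add(new StateBatch(121, 130, (byte) 0, (short) 1));
        sorted.add(new StateBatch(105, 115, (byte) 0, (short) 2)); // partially overlaps the first batch
        sorted.forEach(System.out::println); // iterates in (firstOffset, lastOffset, deliveryCount) order
    }
}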


@AndrewJSchofield AndrewJSchofield left a comment


I can think of some more permutations.

What if some POSB was updated with multiple POSB, such as {(111-120, 0, 1)} updated with {(111-113, 0, 2),(114-114, 2, 1), (115-119, 0, 2)}?

What if some POSB was updated with a partially overlapping POSB, such as {(111-120, 0, 1)} updated with {(111-113, 0, 2)}?

new PersisterOffsetsStateBatch(105, 130, (byte) 0, (short) 1)
)),
Arrays.asList(
new PersisterOffsetsStateBatch(100, 110, (byte) 0, (short) 1),
Contributor

This result is correct, but I would expect (100, 130, (byte) 0, (short) 1)).

new TestAttributes(
"StartOffset is -1 => no batches are considered old but overlaps are still removed preferring the new set.",
new LinkedHashSet<>(Arrays.asList(
new PersisterOffsetsStateBatch(100, 110, (byte) 0, (short) 1), // should be removed
Contributor

This comment // should be removed is incorrect.


smjn commented Sep 10, 2024

I can think of some more permutations.

What if some POSB was updated with multiple POSB, such as {(111-120, 0, 1)} updated with {(111-113, 0, 2),(114-114, 2, 1), (115-119, 0, 2)}?

What if some POSB was updated with a partially overlapping POSB, such as {(111-120, 0, 1)} updated with {(111-113, 0, 2)}?

This is in keeping with the idea that SCR will not create new intervals (either due to partial interval state change or combining sequential intervals). This is fitting because the persister is a "dumb" component.
It simply maintains the intervals which callers send it in the order of arrival and prunes intervals which are expired due to start offset movement.
So,

  • in this example: {(111-120, 0, 1)} updated with {(111-113, 0, 2),(114-114, 2, 1), (115-119, 0, 2)} - the dumb persister will keep all of the intervals. If the start offset moves, it will prune the ones where lastOffset < startOffset, thereby not continuously increasing in size. If we add more intelligence, we will create (119-120, 0, 1), which was never sent.
  • for this: {(111-120, 0, 1)} updated with {(111-113, 0, 2)}, it'll be the same situation - otherwise we are creating a new interval (114-120, 0, 1) which was never sent.
    Both situations are handled in the test cases:
    Sets have partial overlap => result list contains batches from both sets (we do not add gaps).
    and New set batch is contained in cur set => result list contains batches from both sets (we do not add gaps).

The caller (SharePartition) is doing the work of manipulating this information and creating the in-memory state anyway.
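
As a rough illustration of the pruning rule described above, here is a sketch reusing the illustrative StateBatch record from earlier; pruneExpired is a hypothetical helper, not a method in this PR.

import java.util.List;
import java.util.stream.Collectors;

// Batches entirely below the new start offset are expired and dropped; everything else is kept untouched.
// A startOffset of -1 means the start offset has not moved, so nothing is considered expired.
static List<StateBatch> pruneExpired(List<StateBatch> batches, long startOffset) {
    if (startOffset == -1) {
        return batches;
    }
    return batches.stream()
        .filter(b -> b.lastOffset() >= startOffset)
        .collect(Collectors.toList());
}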

@AndrewJSchofield


This is fine with me in principle. We just need to ensure that @apoorvmittal10 and @adixitconfluent are aligned too.


@AndrewJSchofield AndrewJSchofield left a comment


Thanks for the PR. I'm still reviewing it but here are some initial comments.

// Any batches where the last offset is < the current start offset
// are now expired. We should remove them from the persister.
// will take care of overlapping batches
Queue<PersisterStateBatch> batchQueue = new LinkedList<>(
Contributor

Given that this is processing the batches we have so far, I'm not convinced that it is worth merging and pruning at this point. I understand that the start offset might have changed, but surely it's already optimised by the previous pass through.

}
} else if (batch.firstOffset() < cur.firstOffset() && batch.lastOffset() < cur.lastOffset()) {
// covers
// ______
Contributor

I like the little diagrams. I suggest adding the cur: and batch: to these on lines 630 and 640 too.

// overlap with the new one.
// Following cases will not produce any new records so need not be handled.
// cur: ____ ______ ______ _____
// new: ________ ______ _________ _________
Contributor

Probably ought to use batch: in the diagram. For these cases, I think the idea is that cur is superseded by batch, so batchQueue.add(batch) at line 650 is sufficient to add the new batch.



// covers
// cur: ______ _____ _____ ______
Contributor

The second case here overlaps with the second case above (matching first and last offsets). Logically that might be fine in the code, but it looks like a contradiction in the comments.


@apoorvmittal10 apoorvmittal10 left a comment


Thanks for the PR. We should probably discuss the batching you have done, or we can simplify things if we require something in order from Share Partition.

Comment on lines 344 to 349
<suppress checks="MethodLength"
files="ShareCoordinatorShardTest.java"/>
<suppress checks="NPathComplexity"
files="ShareCoordinatorShard.java"/>
<suppress checks="CyclomaticComplexity"
files="ShareCoordinatorShard.java"/>
Collaborator

nit: While suppressions are fine, let's avoid them if we have a better way to handle the situation in code. For MethodLength you might want a method that generates new TestAttributes, or something similar, to avoid the method length issue.


@smjn smjn Sep 12, 2024


The problem will still remain, since the test cases are curated to create various scenarios; we cannot generate them deterministically. If I add a new private method to return the tests, it will still have a MethodLength issue.
The best I can do is to suppress the warning for this specific method.

Collaborator

Suppressing just that method makes sense.

Comment on lines 100 to 111
public int compareTo(Object o) {
    PersisterStateBatch that = (PersisterStateBatch) o;
    int deltaFirst = Long.compare(this.firstOffset(), that.firstOffset());
    if (deltaFirst == 0) {
        int deltaLast = Long.compare(this.lastOffset(), that.lastOffset());
        if (deltaLast == 0) {
            return this.deliveryCount() - that.deliveryCount();
        }
        return deltaLast;
    }
    return deltaFirst;
}
Collaborator

What about deliveryState? Is it not important to compare for this class? If not, then can we please add a comment explaining why it's skipped.

Contributor Author

It's managed in the next revision.

// Any batches where the last offset is < the current start offset
// are now expired. We should remove them from the persister.
// will take care of overlapping batches
Queue<PersisterStateBatch> batchQueue = new LinkedList<>(
Collaborator

For my understanding: why can't this simply be a List of type LinkedList? What queue operations do we need here?

@@ -791,6 +793,304 @@ public void testNonSequentialBatchUpdates() {
verify(shard.getMetricsShard(), times(3)).record(ShareCoordinatorMetrics.SHARE_COORDINATOR_WRITE_SENSOR_NAME);
}

@Test
public void testStateBatchCombine() {
Collaborator

Might be better if you parameterize the test with the test inputs you created.

Contributor Author

Parameterization will again result in a MethodLength issue, as the tests need to be placed in a generator method.

Collaborator

But with parameterization, only the specific failing scenario fails, which is often easier to debug and fix.

Contributor Author

Ok, will add

@apoorvmittal10 apoorvmittal10 added the KIP-932 Queues for Kafka label Sep 12, 2024
));

for (PersisterStateBatch batch : modifiedNewBatches) {
for (int i = 0; i < batchQueue.size(); i++) {
Contributor

We will re-evaluate batchQueue.size() on each loop iteration, and it could in theory mutate quite significantly. I don't believe this is a safe loop condition.

return batches;
}
Stack<PersisterStateBatch> stack = new Stack<>();
stack.add(batches.get(0));
Contributor

Can't this be batches.remove(0)? Otherwise, the first element will be added to the stack and also the initial candidate in the loop.


@AndrewJSchofield AndrewJSchofield left a comment


I've started going through and left some comments. More to come later.

@@ -25,7 +25,7 @@
/**
* This class contains the information for a single batch of state information for use by the {@link Persister}.
*/
public class PersisterStateBatch {
public class PersisterStateBatch implements Comparable {
private final long firstOffset;
private final long lastOffset;
private final byte deliveryState;
Contributor

In sorting terms, this is <firstOffset, lastOffset, deliveryCount, deliveryState>. I suggest putting the member variables in the same order.

return new BatchOverlapState(null, null, nonOverlapping);
}

private static int compareBatchState(PersisterStateBatch b1, PersisterStateBatch b2) {
Contributor

Could do with a comment. This is approximately following the contract for the methods like Short.compare(short x, short y). If x > y then +ve, if x < y then -ve, if x == y then 0.

return finalBatches;
}

private static BatchOverlapState getOverlappingState(TreeSet<PersisterStateBatch> batchSet) {
Contributor

This definitely needs a comment. For example, if the batch set is empty, it will throw an exception, so don't call it with an empty set.


BatchOverlapState overlapState = getOverlappingState(sortedBatches);

while (overlapState != BatchOverlapState.SENTINEL) {
Contributor

I don't believe that overlapState will ever be SENTINEL.

Contributor Author

then we'll make it

// candidate: ______ ____ _________ ____ ____ _______
// max batches: 1 2 2 3 2 2
// min batches: 1 1 1 1 1 2

Contributor

If I understand correctly, at line 676, both of last and candidate are members of sortedBatches.

Contributor Author

Yes, there is only one set - sortedBatches - from which we find the first overlapping pair and call its members last and candidate.

if (candidate.firstOffset() == last.firstOffset()) {
    if (candidate.lastOffset() == last.lastOffset()) { // case 1
        if (compareBatchState(candidate, last) < 0) { // candidate is lower priority
            sortedBatches.add(last);
Contributor

candidate has not been removed, but shouldn't it be?


@smjn smjn Sep 13, 2024


Actually, on second look, no state comparison is needed. By the nature of the compareTo method we defined, candidate can never have lower priority if last.firstOffset == candidate.firstOffset and last.lastOffset == candidate.lastOffset, and since it is already present in the TreeSet, we don't need to do anything for case 1.

Because of the above properties, the original was also valid (since the TreeSet will not allow duplicates).

Will simplify the condition.

if (compareBatchState(candidate, last) < 0) { // candidate is lower priority
    sortedBatches.add(last);
} else { // last is lower priority
    sortedBatches.add(candidate);
Contributor

And doesn't this duplicate candidate?

Contributor Author

same as above

return finalBatches;
}

private static BatchOverlapState getOverlappingState(TreeSet<PersisterStateBatch> batchSet) {
Contributor

I would tend to call this argument sortedBatches for consistency with the caller.

.build();
.map(ShareCoordinatorShard::toPersisterStateBatch)
.collect(Collectors.toList()), newStartOffset))
.build();
Contributor

tiny nit: trailing spaces


@AndrewJSchofield AndrewJSchofield left a comment


I think one final comment.

sortedBatches.remove(last); // remove older smaller interval
sortedBatches.remove(candidate);

last = new PersisterStateBatch(
Contributor

There's no need to assign to last here. The pattern you've followed in the later cases simply adds the constructed object.

// since sortedBatches order takes that into account.
sortedBatches.add(candidate);
} else {
// case 2 is not possible with TreeSet. It is symmetric to case 3.
Contributor

So I suppose that case 2 would actually be case 3 because of the sorting order of the key.

Contributor Author

Yes, that's true. I have kept the example for completeness' sake.

@smjn smjn changed the title KAFKA-17367: Share coordinator impl. Added additional tests. [3/N] KAFKA-17367: Share coordinator impl. New merge batches algorithm. [3/N] Sep 23, 2024
@github-actions github-actions bot added the core Kafka Broker label Sep 23, 2024

@AndrewJSchofield AndrewJSchofield left a comment


lgtm. Thanks for the diligent work on the PR comments.


@mumrah mumrah left a comment


Thanks for the patch @smjn! It's exciting to see more of the batching logic landing :)

I have not yet reviewed the actual batch merging logic, but instead focused on overall code structure. I also left a suggestion on reducing boilerplate in the new test. BTW, nice usage of the stream MethodSource 😄

* @param b2 - {@link PersisterStateBatch} to compare
* @return int representing comparison result.
*/
private static int compareBatchState(PersisterStateBatch b1, PersisterStateBatch b2) {
Contributor

This comparison is not considering the offsets. In what cases are we comparing batches without the offsets?


@smjn smjn Sep 26, 2024


We use the offsets to determine the relative positioning and then use just the state to determine how to break up the batches. Based on state, overlapping batches could be merged or broken down.
For example:

1. 
------               [1,10,0,1]
    -------          [5,15,0,1]
=>
----------          [1,15,0,1]

2.
------               [1,10,0,1]
    -------          [5,15,2,1]
=>
----          [1,4,0,1]
     ------- [5,15,2,1]

Contributor

Ok, so we first compare on delivery count and then break ties with delivery state. Why does the delivery count have higher precedence? Intuitively, I would assume the delivery state would be higher precedence than the count.

Actually, is it even possible to merge batches with a different delivery state?

Contributor Author

The idea behind delivery count taking precedence is that it has the connotation of an epoch: a higher number means a higher chance of the batch being fresher.
If 2 batches have the same delivery count, then states which are considered terminal, like ACK or ARCHIVED, take precedence. Since the ordinal in the delivery state enum is numerically higher for these, they blend in well.
Between ACK and ARCHIVED, it is acceptable to take either (we take ARCHIVED).

HOWEVER, the ARCHIVED state is not implemented in KIP-932.
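
A hedged sketch of the precedence just described - deliveryCount first (epoch-like), deliveryState as the tie-breaker - written against the illustrative StateBatch record from earlier in this conversation, not the real PersisterStateBatch:

// Positive => b1 wins (fresher count, or more terminal state on a tie); negative => b2 wins; 0 => same state.
static int compareBatchState(StateBatch b1, StateBatch b2) {
    int deltaCount = Short.compare(b1.deliveryCount(), b2.deliveryCount());
    if (deltaCount != 0) {
        return deltaCount; // a higher delivery count behaves like a higher epoch
    }
    // Same delivery count: fall back to the state byte, relying on terminal states
    // (ACK/ARCHIVED, as discussed above) having numerically higher ordinals.
    return Byte.compare(b1.deliveryState(), b2.deliveryState());
}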

Contributor

I think ARCHIVING is not implemented in KIP-932, and that's because it's going to be in the DLQ KIP to follow.

* @param sortedBatches - TreeSet representing sorted set of {@link PersisterStateBatch}
* @return object of {@link BatchOverlapState} representing overlapping pair and non-overlapping prefix
*/
private static BatchOverlapState getOverlappingState(TreeSet<PersisterStateBatch> sortedBatches) {
Contributor

It seems like the fall-through case here will always copy everything from sortedBatches into the List of non-overlapping batches. Instead of this copy, could we return an Optional here where the absence of a value indicates fall-through, and the presence of a value indicates some overlapping state was found?

Contributor Author

As per the algorithm, we need the non-overlapping prefix to be removed from the TreeSet, so this might not work.

Comment on lines 78 to 80
BatchOverlapState overlapState = getOverlappingState(sortedBatches);

while (overlapState != BatchOverlapState.SENTINEL) {
Contributor

I think we should consider making this into a class. As written, we have a utility function that has quite a bit of state and a large while loop with many branches.

Perhaps a PersisterStateBatchCombiner class? If I understand this workflow, we are doing:

  1. sort the batches given
  2. merge the batches
  3. return merged results

Steps 2 and 3 can be done in a streaming/iterable fashion which can potentially reduce our memory usage.


@smjn smjn Sep 26, 2024


No, the logic is not that simple. A single iteration over the sorted batches might not be enough.
The invariant is that the batches remain sorted even after manipulation.

Consider:

--------                    A [1,10,0,1]
     ---                     B [5,7,0,2]
     --------------       C [5,15,0,3]

A and B will combine to

----             [1,4,0,1]
     ----        [5,7,0,2]
          ---    [8,10,0,1]

Now when combining with C, we have 2 previous batches to consider.

Secondly,

-----------                 A [1,10,0,1]
     ---                     B [5,7,0,2]
     ---                     C [5,7,0,3]

A and B will combine to

----             [1,4,0,1]
     ----        [5,7,0,2]
          ---    [8,10,0,1]
     ---         <- C - we broke the invariant that the batches remain sorted

In the current impl, these situations are implicitly handled by virtue of the TreeSet. Any newly generated batches are pushed back into the TreeSet, and the getOverlappingState method finds the first overlapping pair and also returns the non-overlapping prefix.
The non-overlapping prefix is then REMOVED from the TreeSet; hence, once a batch is no longer overlapping, it is only looked at once, guaranteeing running-time efficiency.

@AndrewJSchofield
@mumrah
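
To make the shape of that loop concrete, here is a structural sketch that builds on the illustrative StateBatch, BATCH_ORDER and compareBatchState snippets earlier in this conversation. The pair resolution is deliberately simplified (same state => coalesce; different state => keep the higher-priority batch plus the loser's non-overlapping remainders) and is not the exact case analysis in ShareCoordinatorShard.

import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

static List<StateBatch> mergeSketch(TreeSet<StateBatch> sortedBatches) {
    List<StateBatch> finalBatches = new ArrayList<>();
    while (true) {
        // Find the first overlapping neighbouring pair; everything before it can never
        // overlap anything later, so it is flushed to the result (the non-overlapping prefix).
        List<StateBatch> prefix = new ArrayList<>();
        StateBatch prev = null;
        StateBatch candidate = null;
        StateBatch last = null;
        for (StateBatch cur : sortedBatches) {
            if (last != null) {
                if (cur.firstOffset() <= last.lastOffset()) {
                    prev = last;
                    candidate = cur;
                    break;
                }
                prefix.add(last);
            }
            last = cur;
        }
        if (candidate == null) { // no overlapping pair left; emit whatever remains, in order
            finalBatches.addAll(sortedBatches);
            return finalBatches;
        }
        finalBatches.addAll(prefix);
        prefix.forEach(sortedBatches::remove);
        sortedBatches.remove(prev);
        sortedBatches.remove(candidate);
        int stateCmp = compareBatchState(prev, candidate);
        if (stateCmp == 0) {
            // Same delivery count and state: coalesce the pair into one covering interval.
            sortedBatches.add(new StateBatch(prev.firstOffset(),
                Math.max(prev.lastOffset(), candidate.lastOffset()),
                prev.deliveryState(), prev.deliveryCount()));
        } else {
            StateBatch winner = stateCmp > 0 ? prev : candidate;
            StateBatch loser = (winner == prev) ? candidate : prev;
            sortedBatches.add(winner);
            // Keep only the loser's non-overlapping leading/trailing remainders, if any;
            // they go back into the TreeSet and are re-examined on the next pass.
            if (loser.firstOffset() < winner.firstOffset()) {
                sortedBatches.add(new StateBatch(loser.firstOffset(), winner.firstOffset() - 1,
                    loser.deliveryState(), loser.deliveryCount()));
            }
            if (loser.lastOffset() > winner.lastOffset()) {
                sortedBatches.add(new StateBatch(winner.lastOffset() + 1, loser.lastOffset(),
                    loser.deliveryState(), loser.deliveryCount()));
            }
        }
    }
}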

Comment on lines 66 to 67
byte deliveryState,
short deliveryCount
Contributor

I would let these be int so we can avoid casting in the test code.


@junrao junrao left a comment


@smjn : Thanks for the PR. Made a pass of non testing files. Left a few comments.

if (deltaFirst == 0) {
    int deltaLast = Long.compare(this.lastOffset(), that.lastOffset());
    if (deltaLast == 0) {
        int deltaCount = this.deliveryCount() - that.deliveryCount();
Contributor

Should we check that the deliveryStates are the same before comparing deliveryCount?


@smjn smjn Sep 28, 2024


No, this was a conscious decision.
Please check the discussion in #17149 (comment)

Contributor

Ok. It would be useful to add a comment to explain this.

return;
}

sortedBatches = new TreeSet<>(combinedBatchList);
Contributor

To be consistent, we probably want to initialize sortedBatches in the constructor, as is done for combinedBatchList and finalBatchList.

Contributor Author

This was done here because the TreeSet might not be needed at all if there are fewer than 2 intervals - hence a small optimization.

Contributor

Ok.

Another question. This code is called from ShareCoordinatorShard.handleShareUpdate. Is it expensive to recreate a TreeSet for each ShareCoordinatorShard.handleShareUpdate call?

while (iter.hasNext()) {
    PersisterStateBatch candidate = iter.next();
    if (candidate.firstOffset() <= prev.lastOffset() || // overlap
        prev.lastOffset() + 1 == candidate.firstOffset() && compareBatchState(prev, candidate) == 0) { // contiguous
Contributor

It seems unintuitive to return contiguous batches as overlap batches.

Contributor Author

Only contiguous batches with exactly the same state are returned here.
We could perhaps rename the method to getMergeCandidates()?

// prev: ------ ------- ------- ------- ------- -------- -------
// candidate: ------ ---- ---------- --- ---- ------- -------
handleSameStateOverlap(prev, candidate);
} else { // diff state and non-contiguous overlap
Contributor

Hmm, if we reach here, it's possible for prev and candidate to be contiguous, right?

Contributor Author

No, it's not.
getOverlappingState returns a contiguous pair ONLY IF the pair has the same delivery count and state.

The condition in the if block captures the contiguous case for the same state.

If it reaches the else, it has to be overlapping with a different state. getOverlappingState does not return contiguous pairs with a different state.

In fact, it updates the TreeSet by removing any which are found.

* @param b2 - {@link PersisterStateBatch} to compare
* @return int representing comparison result.
*/
private int compareBatchState(PersisterStateBatch b1, PersisterStateBatch b2) {
Contributor

Should we just use PersisterStateBatch.compareTo ?

Contributor Author

No, we explicitly want to compare only the 2 state parameters.
We decide whether the pair can be combined based on the offsets, but when actually merging we can end up with 1, 2 or 3 resultant batches. This is purely determined by the delivery count and state.
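
A quick, hypothetical usage of the mergeSketch loop sketched earlier in this conversation (it assumes the illustrative StateBatch, BATCH_ORDER and mergeSketch definitions are in scope), showing how the 1/2/3 resultant batches arise:

// Same delivery count and state      => 1 batch:   (100-110, 0, 1) + (105-115, 0, 1) -> (100-115, 0, 1)
// Different count, one-sided overlap => 2 batches: (100-110, 0, 1) + (105-115, 0, 2) -> (100-104, 0, 1), (105-115, 0, 2)
// Different count, fully contained   => 3 batches: (100-120, 0, 1) + (105-110, 0, 2) -> (100-104, 0, 1), (105-110, 0, 2), (111-120, 0, 1)
TreeSet<StateBatch> set = new TreeSet<>(MergeSketch.BATCH_ORDER);
set.add(new StateBatch(100, 120, (byte) 0, (short) 1));
set.add(new StateBatch(105, 110, (byte) 0, (short) 2)); // higher delivery count, strictly inside the first batch
System.out.println(mergeSketch(set));
// Prints three StateBatch records covering (100-104, count 1), (105-110, count 2), (111-120, count 1).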

@smjn smjn requested a review from junrao September 28, 2024 02:53

@junrao junrao left a comment


@smjn : Thanks for the updated PR. Made a pass of all files. A few more comments.



static List<PersisterStateBatch> singleBatch(
long firstOffset,
long prevOffset,
Contributor

Why is this called prevOffset instead of lastOffset?

Contributor Author

Remnant of a previous find-and-replace.

),

new BatchTestHolder(
"Candidate lower state. Candidate first and prev offsets strictly larger than prev.",
Contributor

BatchTestHolder uses curList and newList while the text uses candidate and prev. It would be useful to make them consistent.

BatchTestHolder.multiBatch()
.addBatch(100, 110, 0, 1)
.addBatch(121, 130, 0, 1)
.addBatch(105, 115, 0, 1) // overlap with 1st batch
Contributor

Hmm, could that happen? We call combineStateBatches() on every ShareCoordinatorShard.handleShareUpdate. So the current state should not contain overlapping ranges?

Contributor Author

This is defensive - at some point we will develop tooling which will allow adding/removing records to the __share_group_state topic to repair bad state.
In that case, if there is a repetition due to human input, the algorithm is capable of handling it.

Contributor

But every ShareUpdateRecord is replayed through handleShareUpdate, which calls combineStateBatches().

Contributor Author

That's true.

Contributor

In that case, do we still need to include overlapping offset ranges in batchesSoFar?


new BatchTestHolder(
"Handle overlapping batches with different priority.",
BatchTestHolder.singleBatch(100, 110, 0, 1), //[(100-115, 0, 1), (121-130, 0, 1)]
Contributor

The comment doesn't seem to match the code?


smjn commented Sep 30, 2024

Another question. This code is called from ShareCoordinatorShard.handleShareUpdate. Is it expensive to recreate a TreeSet for each ShareCoordinatorShard.handleShareUpdate call?

We can revisit this if it turns out to be a bottleneck.

@smjn smjn requested a review from junrao September 30, 2024 20:34
Labels: ci-approved, core (Kafka Broker), KIP-932 (Queues for Kafka)