-
Notifications
You must be signed in to change notification settings - Fork 759
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix handling for empty record batches #1107
Fix handling for empty record batches #1107
Conversation
957669f
to
e6ac6fa
Compare
Any chance of getting a review? This addresses a serious bug. |
Hi @SoMuchForSubtlety, Thanks for the contribution! It would be great if we could introduce a test case. Potentially via a packet capture and the test fixtures? https://github.com/segmentio/kafka-go/blob/main/message_test.go#L568 |
I tried to get Kafka to send my empty batches in a test by sending lots of duplicate keys and waiting for them to be consolidated, but I did not manage to get it working. I will try again when I have some time. |
With magic v2 and above, it's possible to have empty record batches. This was previously not considered and could lead to panics because the code relied on the record count being non-zero in multiple places. Fixes segmentio#874
e6ac6fa
to
b130648
Compare
Turns out that empty batches are only possible if they were sent by an idempotent producer (which kafka-go currently doesn't support).
Here is the content of a test log created with kafka-go. The batch sequence numbers are not set ( kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --print-data-log --files /kafka/kafka-logs-e1e7d1bd93aa/kafka-go-69b4a7695fa8c9c1-0/00000000000000000000.log
Dumping /kafka/kafka-logs-e1e7d1bd93aa/kafka-go-69b4a7695fa8c9c1-0/00000000000000000000.log
Starting offset: 0
baseOffset: 0 lastOffset: 2 count: 3 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false position: 0 CreateTime: 1683670701120 size: 97 magic: 2 compresscodec: NONE crc: 1985786587 isvalid: true
| offset: 0 CreateTime: 1683670701120 keysize: 1 valuesize: 4 sequence: -1 headerKeys: [] key: 1 payload: data
| offset: 1 CreateTime: 1683670701120 keysize: 1 valuesize: 4 sequence: -1 headerKeys: [] key: 2 payload: data
| offset: 2 CreateTime: 1683670701120 keysize: 1 valuesize: 4 sequence: -1 headerKeys: [] key: 3 payload: data
baseOffset: 6 lastOffset: 8 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false position: 97 CreateTime: 1683670702141 size: 73 magic: 2 compresscodec: NONE crc: 2738021477 isvalid: true
| offset: 8 CreateTime: 1683670702141 keysize: 1 valuesize: 4 sequence: -1 headerKeys: [] key: 9 payload: data
baseOffset: 54 lastOffset: 56 count: 3 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false position: 170 CreateTime: 1683670711244 size: 100 magic: 2 compresscodec: NONE crc: 3357349968 isvalid: true
| offset: 54 CreateTime: 1683670711244 keysize: 1 valuesize: 5 sequence: -1 headerKeys: [] key: 4 payload: data2
| offset: 55 CreateTime: 1683670711244 keysize: 1 valuesize: 5 sequence: -1 headerKeys: [] key: 5 payload: data2
| offset: 56 CreateTime: 1683670711244 keysize: 1 valuesize: 5 sequence: -1 headerKeys: [] key: 6 payload: data2 I will create some test data with an idempotent producer and write a test case. Footnotes |
I'm not sure if this is related to empty batches or not, but I'm seeing data corruption with this patch applied.
The message headers also seem to be affected, because the committed offsets reset for one of my partitions get reset
|
Are contributions to this one allowed? Was wondering what's the progress on this issue, we keep getting these panics frequently. Let me know if help is needed to speed this up ✌️ |
Same question with @steevehook above, I wonder if there's a checklist we can fill to faster the process of fixing this. We're maintaining a private forked version of the library just to fix this issue and we really hope to get synced back with the public releases soon. |
I ended up switching to a different library since I still experienced strange bugs even with my patch applied. Feel free to pick up my changes and open a new PR. |
Thanks for following up! Any contributions related to this issue are welcome! Thanks! |
With magic v2 and above, it's possible to have empty record batches. This was previously not considered and could lead to panics because the code relied on the record count being non-zero in multiple places.
This PR contains lots of refactoring to make the code easier to follow and avoid unneccesary
readHeader
calls. Please let me know if you are unhappy with any of the changes I made.I didn't manage to create a reliable test case, but I confirmed that the fix manually. I'm open to suggestions for how to test this.
Fixes #874