Batch Kafka consumer #151
Conversation
@@ -164,31 +164,39 @@ runConsumer
      , HasCallStack
      )
   => Timeout
   -> Int
   -> (a -> m ())
Is this the right shape for the handler? Should it be [a] -> m () instead?
It seems like it'd be more flexible for the client to do "smart" things if we hand them the entire batch. I'm thinking of things like running actions concurrently or grouping messages together.
Good question. I defer to you.
Practically speaking, I'd put my money on consumers not doing smart things, but it's also trivial for them to do their own traverse_.
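For what it's worth, a client handed the whole batch can trivially recover per-message handling with traverse_ (or mapM_). A minimal sketch, where handleOne and handleBatch are hypothetical names rather than anything from this PR:

```haskell
import Data.Foldable (traverse_)
import Data.IORef

-- Hypothetical per-message handler: here it just accumulates events.
handleOne :: IORef [Int] -> Int -> IO ()
handleOne ref x = modifyIORef' ref (x :)

-- A batch handler of shape [a] -> m () that delegates per message.
handleBatch :: IORef [Int] -> [Int] -> IO ()
handleBatch ref = traverse_ (handleOne ref)

main :: IO ()
main = do
  ref <- newIORef []
  handleBatch ref [1, 2, 3]
  readIORef ref >>= print  -- prints [3,2,1]
```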
I think [a] -> m () also implies that we can't store or commit offsets on each event, so I guess for the first attempt at this I can go with the a -> m () shape and see what happens.
So this re-introduces some duplicate processing. If we're deploying, a second consumer can come up and reprocess any messages the existing consumers have stored but not yet committed, which is up to batchSize messages.
That seems like an OK middle ground compared to how it was before. However, I still feel like we could have arrived at roughly the same behavior in this regard by simply reducing the auto-commit interval so commits happen more frequently, no?
If we wanted to keep the reduction in duplicate processing, I think still doing a store+commit on every message, but now with batching, could also help: since you're polling less frequently, it would take pressure off the broker even though you're committing more frequently. WDYT?
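For concreteness, the loop shape being suggested (poll a whole batch in one round-trip, but store+commit after each message) might look roughly like the sketch below. Every name here (Message, pollBatch, consumeOnce) is a hypothetical stand-in, not the actual consumer API in this repo:

```haskell
import Data.IORef

newtype Message = Message Int deriving Show

-- One broker round-trip returning up to batchSize messages.
pollBatch :: Int -> IO [Message]
pollBatch n = pure (map Message [1 .. n])

-- Handle and then commit each message individually, so a redeploy
-- reprocesses at most the in-flight message rather than up to batchSize.
consumeOnce :: Int -> (Message -> IO ()) -> (Message -> IO ()) -> IO ()
consumeOnce batchSize handle commit = do
  batch <- pollBatch batchSize
  mapM_ (\m -> handle m >> commit m) batch

main :: IO ()
main = do
  commits <- newIORef (0 :: Int)
  consumeOnce 3 (\_ -> pure ()) (\_ -> modifyIORef' commits (+ 1))
  readIORef commits >>= print  -- prints 3: one commit per message
```

The trade-off discussed above shows up here directly: one pollBatch call per batch (less broker pressure) but batchSize commit calls (more commit traffic).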
Probably worth a try to see what happens. Edit: oh, I guess we'd not get the poll reduction then.
Closing this, as performance is back to normal with auto-commits for the answers consumer. It seems there is just a trade-off between the size of the batch of offset commits and reprocessing events. For our current consumer, reprocessing is fine due to its idempotency. There also appears to be a performance cost when deploying new consumers that is unrelated to reprocessing events: the hit is likely due to the Kafka rebalance when adding/removing consumers in a consumer group, not to reprocessing a handful (100s) of events.
Did slowing the rotation from double-then-halve to plus-one-minus-one solve it?
Add ability to handle batches for our Kafka consumers