
Batch Kafka consumer #151

Closed
wants to merge 2 commits into from
Conversation

Contributor

@z0isch z0isch commented Mar 6, 2024

Add ability to handle batches for our kafka consumers

@z0isch z0isch requested a review from pbrisbin March 6, 2024 15:41
```
@@ -164,31 +164,39 @@ runConsumer
     , HasCallStack
     )
  => Timeout
  -> Int
  -> (a -> m ())
```
Contributor Author

@z0isch z0isch Mar 6, 2024
Is this the right shape for the handler? Should it be [a] -> m () instead?

Seems like it'd be more flexible for the client to do "smart" things if we hand them the entire batch. I'm thinking of things like running concurrent actions or grouping related messages together.

Member

Good question. I defer to you.

Practically speaking, I'd put my money on consumers not doing smart things, but it's also trivial for them to do their own traverse_.
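The "do their own `traverse_`" point can be sketched in a few lines (names here are hypothetical, not code from this PR): given the per-message shape `a -> m ()`, a client recovers batch-at-once behavior for free, whereas going from `[a] -> m ()` back to per-message offset handling is what gets lost.

```haskell
import Data.Foldable (traverse_)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- A client's per-message handler (the `a -> m ()` shape); here it
-- just records each event so we can observe processing order.
recordEvent :: IORef [Int] -> Int -> IO ()
recordEvent ref x = modifyIORef' ref (x :)

-- "Doing their own traverse_": a batch handler derived trivially
-- from a per-message one, as suggested in the review.
asBatchHandler :: (a -> IO ()) -> [a] -> IO ()
asBatchHandler = traverse_

main :: IO ()
main = do
  ref <- newIORef []
  asBatchHandler (recordEvent ref) [1, 2, 3]
  events <- readIORef ref
  print (reverse events)  -- [1,2,3]
```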

Contributor Author

I think [a] -> m () also implies that we can't store or commit offsets on each event, so I guess for the first attempt at this I can go with the a -> m () shape and see what happens.

Member

@pbrisbin pbrisbin left a comment

So this re-introduces some duplicate processing. If we're deploying, a second consumer can come up and would reprocess any messages the existing consumers have stored but not yet committed, which is up to batchSize messages.

Seems like an OK middle ground compared with how it was before. However, I still feel like we could've arrived at roughly the same behavior in this regard by simply reducing the auto-commit interval so commits happen more frequently, no?
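For context, the auto-commit knob referred to here is standard librdkafka consumer configuration (the keys are real; the values below are illustrative, not from this PR):

```
enable.auto.commit=true
# default is 5000; a smaller interval shrinks the window of stored-but-
# uncommitted offsets (and thus duplicate processing) with no code changes
auto.commit.interval.ms=1000
```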

@pbrisbin
Member

pbrisbin commented Mar 6, 2024

If we wanted to keep the reduction in duplicate processing, I think still doing a store+commit on every message, but now with batching, could also help things... since you're polling less frequently it would take pressure off the broker even though you're committing more frequently. WDYT?
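The trade-off in this suggestion can be sketched with a small pure model (all names hypothetical; this is not the library's API): polling in batches cuts broker round trips, while committing per message instead of per batch bounds what a crashed or redeployed consumer reprocesses.

```haskell
import Data.List (unfoldr)

-- Split the message stream into poll batches of size n: one broker
-- round trip per batch instead of one per message.
batchesOf :: Int -> [a] -> [[a]]
batchesOf n = unfoldr (\xs -> if null xs then Nothing else Just (splitAt n xs))

-- Model a consumer that crashes after handling some messages. On
-- restart, everything after the last committed offset is redelivered:
-- with commit-per-message nothing is, with commit-per-batch the
-- uncommitted tail of the current batch is (up to batchSize messages).
redeliveredAfterCrash
  :: Bool  -- commit per message (True) or only per batch (False)?
  -> Int   -- batch size
  -> Int   -- messages handled before the crash
  -> Int   -- messages redelivered on restart
redeliveredAfterCrash perMessage batchSize handled
  | perMessage = 0
  | otherwise = handled `mod` batchSize

main :: IO ()
main = do
  print (length (batchesOf 100 [1 .. 1000 :: Int]))  -- 10 polls instead of 1000
  print (redeliveredAfterCrash True 100 250)         -- 0
  print (redeliveredAfterCrash False 100 250)        -- 50
```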

@z0isch
Contributor Author

z0isch commented Mar 6, 2024

> If we wanted to keep the reduction in duplicate processing, I think still doing a store+commit on every message, but now with batching, could also help things... since you're polling less frequently it would take pressure off the broker even though you're committing more frequently. WDYT?

Probably worth a try to see what happens; however, a client could still get this behavior with batchSize = 1, right?

edit: Oh, I guess we'd lose the poll reduction then.

@z0isch
Contributor Author

z0isch commented Mar 8, 2024

Closing this, as performance is back to normal with auto-commits enabled for the answers consumer.

It seems like there is just a trade-off between the size of the batch of offset commits and reprocessing events. For our current consumer, reprocessing is fine due to the idempotency of the consumer.

It also seems like there is just going to be a performance cost to pay when deploying new consumers that is unrelated to reprocessing events. The hit seems likely due to the Kafka rebalance when adding/removing consumers in a consumer group, not due to reprocessing a handful (hundreds) of events.

@z0isch z0isch closed this Mar 8, 2024
@pbrisbin
Member

pbrisbin commented Mar 8, 2024

> It also seems like there is just going to be a performance cost to pay when deploying new consumers that is unrelated to reprocessing events

Did slowing the rotation from double-then-halve to plus-one-minus-one solve it?

@z0isch
Contributor Author

z0isch commented Mar 8, 2024

> It also seems like there is just going to be a performance cost to pay when deploying new consumers that is unrelated to reprocessing events

> Did slowing the rotation from double-then-halve to plus-one-minus-one solve it?

Nope, it just made it have 3 blips instead of one. I think this makes sense because there were probably 3 rebalances.
[image: metrics graph showing three performance blips, one per rebalance]

AFAICT the only way to avoid these performance issues is to not rebalance at all, which I think implies some sort of static partition assignment for the consumers. That would require a lot of coordination complexity between the consumers, though, so we'd need to weigh it against how important it is to avoid these spikes.
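One way "static partition assignment" could look (an entirely hypothetical sketch, not code from this repo): each consumer is configured with a fixed index and derives its partitions deterministically, so no group rebalance ever happens; the cost is that the index and consumer count must be coordinated out of band, which is the complexity mentioned above.

```haskell
-- Hypothetical static assignment: consumer i of n owns every partition p
-- with p `mod` n == i. Deterministic, so no rebalance is needed, but all
-- consumers must agree on (i, n) out of band.
staticAssignment
  :: Int    -- total number of consumers, n
  -> Int    -- this consumer's index, i (0-based)
  -> [Int]  -- all partition ids of the topic
  -> [Int]  -- partitions this consumer should read
staticAssignment n i = filter (\p -> p `mod` n == i)

main :: IO ()
main = do
  let partitions = [0 .. 11]
  print (staticAssignment 3 0 partitions)  -- [0,3,6,9]
  print (staticAssignment 3 1 partitions)  -- [1,4,7,10]
```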

@z0isch z0isch deleted the aj/batch-kafka-consumer branch March 15, 2024 18:00