Add more robust commit strategy to Kafka.Consumer #149

Merged: 8 commits merged into main from pb/offsets on Feb 29, 2024

Conversation

@pbrisbin (Member) commented Feb 29, 2024

Autocommit is nice and simple, but it can cause both dropped messages
and duplicate processing.

Concretely, we found that deployment of a new consumer will start and,
if the existing consumer has not (auto)committed in a bit, reprocess a
backlog of messages that it doesn't need to. This results in an increase
in lag and alerts on every deployment.

Of the various strategies available, we're implementing this one[^1]:

- When a message is processed, commit its offset asynchronously

  This is fast enough to do on every loop, but it trades away some
  reliability and may not succeed every time -- particularly if we're
  crashing or shutting down. So...

- On shutdown, commit the offset of the last-processed message (for
  every partition) synchronously

  This will be robust and cover for any failures we had from the
  asynchronous commits.

  This required that we track the last message we processed on each
  partition as we go. We could've used the "commit for all messages of
  the last poll" function instead, but that would risk dropping messages
  we've not yet processed. We seem to only poll for 1 message at a time,
  so this risk is low, but I suspect we'll want to increase that and
  gain free performance, so it's good to be ready.

  I attempted to do the tracking with `StateT`, but it lacks instances
  like `MonadUnliftIO`, so I couldn't. Instead, I went with an `IORef`.
  It is encapsulated, so we could move to something else (e.g. `TVar`
  or `Chan`) without disrupting the calling code at all, provided it
  only requires `MonadIO`. A sketch of the resulting loop is below.
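
Here is a minimal sketch of the loop described above, written against
hw-kafka-client (an assumption about the underlying library; the
`process` callback and the `Timeout` value are likewise placeholders):

```haskell
import Control.Exception (finally)
import Data.ByteString (ByteString)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map
import Kafka.Consumer

type Record = ConsumerRecord (Maybe ByteString) (Maybe ByteString)

-- Poll/process forever, committing asynchronously after each message;
-- on the way out (including via exception), synchronously commit the
-- last-processed record for every partition.
consumeLoop :: KafkaConsumer -> (Record -> IO ()) -> IO ()
consumeLoop consumer process = do
  lastProcessed <- newIORef (Map.empty :: Map.Map PartitionId Record)
  go lastProcessed `finally` commitFinal lastProcessed
 where
  go ref = do
    er <- pollMessage consumer (Timeout 1000)
    case er of
      Left _err -> go ref -- e.g. a poll timeout; just poll again
      Right r -> do
        process r
        -- Track the newest processed record for this partition
        atomicModifyIORef' ref $ \m -> (Map.insert (crPartition r) r m, ())
        -- Cheap fire-and-forget commit; may be lost if we crash
        _ <- commitOffsetMessage OffsetCommitAsync consumer r
        go ref

  commitFinal ref = do
    m <- readIORef ref
    -- Robust synchronous commit, covering any failed async commits
    mapM_ (commitOffsetMessage OffsetCommit consumer) (Map.elems m)
```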

[^1]: https://medium.com/@rramiz.rraza/kafka-programming-different-ways-to-commit-offsets-7bcd179b225a#ac4e
We handle commits intelligently now and can be opinionated about how it
works.
@pbrisbin changed the title from pb/offsets to Add more robust commit strategy to Kafka.Consumer on Feb 29, 2024
@pbrisbin requested a review from @z0isch on February 29, 2024 20:16
@pbrisbin marked this pull request as ready for review on February 29, 2024 20:17
It's all we need, and converting it on insert instead of read means we
don't have to build the `ConsumerRecord () ()` in the meantime.
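
For illustration, converting at insert might look something like the
following (the `TopicPartition` target and the per-partition `Map` are
assumptions on my part, since the actual type isn't shown here):

```haskell
import Data.IORef (IORef, atomicModifyIORef')
import qualified Data.Map.Strict as Map
import Kafka.Consumer

-- Convert at insert: keep only the coordinates needed to commit later,
-- so we never materialize a `ConsumerRecord () ()`.
trackOffset
  :: IORef (Map.Map PartitionId TopicPartition)
  -> ConsumerRecord k v
  -> IO ()
trackOffset ref r =
  let Offset o = crOffset r
      -- Committed offsets name the *next* message to read, hence + 1
      tp = TopicPartition (crTopic r) (crPartition r) (PartitionOffset (o + 1))
   in atomicModifyIORef' ref $ \m -> (Map.insert (crPartition r) tp m, ())
```

On shutdown, the map's values could then be flushed in one call via
`commitPartitionsOffsets OffsetCommit consumer . Map.elems` -- though,
as the next commits show, this manual tracking was ultimately dropped.
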
By disabling `auto.offset.store`, the `commitAllOffsets` function will
only commit the offsets we explicitly store (vs the max of the last
poll). This means we can call `storeOffsetMessage` after processing and
be sure we don't commit anything we haven't processed.

Basically, it's the same (desired) semantics, without the manual
tracking. Win.
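
Under this scheme, the consumer wiring might look like the following
sketch (hw-kafka-client assumed as the underlying library; the broker
address, group id, and `process` callback are placeholders, and the raw
librdkafka key is set via `extraProps` rather than assuming a dedicated
helper exists for it):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Exception (finally)
import Data.ByteString (ByteString)
import qualified Data.Map.Strict as Map
import Kafka.Consumer

-- With auto.offset.store disabled, only offsets we explicitly store
-- via storeOffsetMessage are eligible for commit. Pass these props to
-- newConsumer along with a Subscription.
consumerProps :: ConsumerProperties
consumerProps =
  brokersList [BrokerAddress "localhost:9092"]
    <> groupId (ConsumerGroupId "example-group")
    <> noAutoCommit
    <> extraProps (Map.singleton "enable.auto.offset.store" "false")

consumeLoop
  :: KafkaConsumer
  -> (ConsumerRecord (Maybe ByteString) (Maybe ByteString) -> IO ())
  -> IO ()
consumeLoop consumer process =
  -- Final synchronous commit covers any async commits that were lost
  go `finally` commitAllOffsets OffsetCommit consumer
 where
  go = do
    er <- pollMessage consumer (Timeout 1000)
    case er of
      Left _err -> go -- e.g. a poll timeout; just poll again
      Right r -> do
        process r
        -- Only now is this offset marked as safe to commit
        _ <- storeOffsetMessage consumer r
        -- Commits only the stored offsets, never unprocessed ones
        _ <- commitAllOffsets OffsetCommitAsync consumer
        go
```

The shutdown commit no longer needs any `IORef` bookkeeping: librdkafka
itself tracks what has been stored.
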
@z0isch (Contributor) left a comment

❤️

@pbrisbin merged commit ccee6ea into main on Feb 29, 2024
6 of 7 checks passed
@pbrisbin deleted the pb/offsets branch on February 29, 2024 22:05