Commit 89899d7: kip 848 updates + faq cleanup (#182)
Parent: f22e146

File tree: 4 files changed (+70, -35 lines)


FAQ.md

Lines changed: 25 additions & 34 deletions
@@ -1,5 +1,4 @@
 1. [Does Karafka require Ruby on Rails?](#does-karafka-require-ruby-on-rails)
-1. [Why there used to be an ApplicationController mentioned in the Wiki and some articles?](#why-there-used-to-be-an-applicationcontroller-mentioned-in-the-wiki-and-some-articles)
 1. [Does Karafka require Redis and/or Sidekiq to work?](#does-karafka-require-redis-andor-sidekiq-to-work)
 1. [Could an HTTP controller also consume a fetched message through the Karafka router?](#could-an-http-controller-also-consume-a-fetched-message-through-the-karafka-router)
 1. [Does Karafka require a separate process running?](#does-karafka-require-a-separate-process-running)
@@ -8,7 +7,6 @@
 1. [Why Karafka does not pre-initializes consumers prior to first message from a given topic being received?](#why-karafka-does-not-pre-initializes-consumers-prior-to-first-message-from-a-given-topic-being-received)
 1. [Does Karafka restart dead PG connections?](#does-karafka-restart-dead-pg-connections)
 1. [Does Karafka require gems to be thread-safe?](#does-karafka-require-gems-to-be-thread-safe)
-1. [When Karafka is loaded via railtie in test env, SimpleCov does not track code changes](#when-karafka-is-loaded-via-a-railtie-in-test-env-simplecov-does-not-track-code-changes)
 1. [Can I use Thread.current to store data in between batches?](#can-i-use-threadcurrent-to-store-data-between-batches)
 1. [Why Karafka process does not pick up newly created topics until restarted?](#why-karafka-process-does-not-pick-up-newly-created-topics-until-restarted)
 1. [Why is Karafka not doing work in parallel when I started two processes?](#why-is-karafka-not-doing-work-in-parallel-when-i-started-two-processes)
@@ -40,7 +38,6 @@
 1. [Why, despite setting `initial_offset` to `earliest`, Karafka is not picking up messages from the beginning?](#why-despite-setting-initial_offset-to-earliest-karafka-is-not-picking-up-messages-from-the-beginning)
 1. [Should I TSTP, wait a while, then send TERM or set a longer `shutdown_timeout` and only send a TERM signal?](#should-i-tstp-wait-a-while-then-send-term-or-set-a-longer-shutdown_timeout-and-only-send-a-term-signal)
 1. [Why am I getting `error:0A000086:SSL routines::certificate verify failed` after upgrading Karafka?](#why-am-i-getting-error0a000086ssl-routinescertificate-verify-failed-after-upgrading-karafka)
-1. [Why am I seeing a `karafka_admin` consumer group with a constant lag present?](#why-am-i-seeing-a-karafka_admin-consumer-group-with-a-constant-lag-present)
 1. [Can I consume the same topic independently using two consumers within the same application?](#can-i-consume-the-same-topic-independently-using-two-consumers-within-the-same-application)
 1. [Why am I seeing Broker failed to validate record (invalid_record) error?](#why-am-i-seeing-broker-failed-to-validate-record-invalid_record-error)
 1. [How can I make polling faster?](#how-can-i-make-polling-faster)
@@ -153,7 +150,6 @@
 1. [Is it possible to exclude `karafka-web` related reporting counts from the web UI dashboard?](#is-it-possible-to-exclude-karafka-web-related-reporting-counts-from-the-web-ui-dashboard)
 1. [Can I log errors in Karafka with topic, partition, and other consumer details?](#can-i-log-errors-in-karafka-with-topic-partition-and-other-consumer-details)
 1. [Why did our Kafka consumer start from the beginning after a 2-week downtime, but resumed correctly after a brief stop and restart?](#why-did-our-kafka-consumer-start-from-the-beginning-after-a-2-week-downtime-but-resumed-correctly-after-a-brief-stop-and-restart)
-1. [Why am I experiencing a load error when using Karafka with Ruby 2.7, and how can I fix it?](#why-am-i-experiencing-a-load-error-when-using-karafka-with-ruby-27-and-how-can-i-fix-it)
 1. [Why am I getting `+[NSCharacterSet initialize] may have been in progress in another thread when fork()` error when forking on macOS?](#why-am-i-getting-nscharacterset-initialize-may-have-been-in-progress-in-another-thread-when-fork-error-when-forking-on-macos)
 1. [How does Karafka handle messages with undefined topics, and can they be routed to a default consumer?](#how-does-karafka-handle-messages-with-undefined-topics-and-can-they-be-routed-to-a-default-consumer)
 1. [What happens if an error occurs while consuming a message in Karafka? Will the message be marked as not consumed and automatically retried?](#what-happens-if-an-error-occurs-while-consuming-a-message-in-karafka-will-the-message-be-marked-as-not-consumed-and-automatically-retried)
@@ -225,11 +221,7 @@
 
 ## Does Karafka require Ruby on Rails?
 
-**No**. Karafka is a fully independent framework that can operate in a standalone mode. It can be easily integrated with any Ruby-based application, including those written with Ruby on Rails. Please follow the [Integrating with Ruby on Rails and other frameworks](https://github.com/karafka/karafka/wiki/Integrating-with-Ruby-on-Rails-and-other-frameworks) Wiki section.
-
-## Why there used to be an ApplicationController mentioned in the Wiki and some articles?
-
-You can name the main application consumer with any name. You can even call it ```ApplicationController``` or anything else you want. Karafka will sort that out, as long as your root application consumer inherits from the ```Karafka::BaseConsumer```. It's not related to Ruby on Rails controllers. Karafka framework used to use the ```*Controller``` naming convention up until Karafka 1.2 where it was changed because many people had problems with name collisions.
+**No**. Karafka is a fully independent framework that can operate in a standalone mode. It can be easily integrated with any Ruby-based application, including those written with Ruby on Rails. Please follow the [Integrating with Ruby on Rails and other frameworks](Integrating-with-Ruby-on-Rails-and-other-frameworks) documentation.
 
 ## Does Karafka require Redis and/or Sidekiq to work?
 
@@ -241,7 +233,7 @@ You can name the main application consumer with any name. You can even call it `
 
 ## Does Karafka require a separate process running?
 
-No, however, it is **recommended**. By default, Karafka requires a separate process (Karafka server) to consume and process messages. You can read about it in the [Consuming messages](https://github.com/karafka/karafka/wiki/Consuming-Messages) section of the Wiki.
+No, however, it is **recommended**. By default, Karafka requires a separate process (Karafka server) to consume and process messages. You can read about it in the [Consuming messages](Consuming-Messages) section of the documentation.
 
 Karafka can also be embedded within another process so you do not need to run a separate process. You can read about it [here](Embedding).
 
@@ -271,16 +263,12 @@ Because Karafka does not have knowledge about the whole topology of a given Kafk
 
 ## Does Karafka restart dead PG connections?
 
-Karafka, starting from `2.0.16` will automatically release no longer used ActiveRecord connections. They should be handled and reconnected by the Rails connection reaper. You can implement custom logic to reconnect them yourself if needed beyond the reaping frequency. More details on that can be found [here](Active-Record-Connections-Management#dealing-with-dead-database-connections).
+Karafka will automatically release no longer used ActiveRecord connections. They should be handled and reconnected by the Rails connection reaper. You can implement custom logic to reconnect them yourself if needed beyond the reaping frequency. More details on that can be found [here](Active-Record-Connections-Management#dealing-with-dead-database-connections).
 
 ## Does Karafka require gems to be thread-safe?
 
 Yes. Karafka uses multiple threads to process data, similar to how Puma or Sidekiq does it. The same rules apply.
 
-## When Karafka is loaded via a railtie in test env, SimpleCov does not track code changes
-
-Karafka hooks with railtie to load `karafka.rb`. Simplecov **needs** to be required [before](https://github.com/simplecov-ruby/simplecov#getting-started=) any code is loaded.
-
 ## Can I use Thread.current to store data between batches?
 
 **No**. The first available thread will pick up work from the queue to better distribute work. This means that you should **not** use `Thread.current` for any type of data storage.
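The safer pattern implied by the answer above can be sketched in plain Ruby (the `EventsConsumer` class is an illustrative stand-in for a `Karafka::BaseConsumer` subclass, not code from this commit): keep state that must survive between batches on the consumer instance rather than in `Thread.current`.

```ruby
# Illustrative stand-in for a Karafka consumer: state that must survive
# between batches lives on the consumer instance, because a different worker
# thread may run each batch and Thread.current would not follow the work.
class EventsConsumer
  attr_reader :processed_count

  def initialize
    @processed_count = 0
  end

  def consume(messages)
    # Instance state accumulates correctly no matter which thread calls this
    @processed_count += messages.size
  end
end
```

A quick usage check: calling `consume` twice from any threads still accumulates into the same instance state.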
@@ -372,7 +360,7 @@ Upon rebalance, all uncommitted offsets will be committed before a given partiti
 
 ## Can I use Karafka with Ruby on Rails as a part of an internal gem?
 
-Karafka 2.x has [Rails auto-detection](https://github.com/karafka/karafka/blob/78ea23f7044b81b7e0c74bb02ad3d2e5a5fa1b7c/lib/karafka/railtie.rb#L19), and it is loaded early, so some components may be available later, e.g., when ApplicationConsumer inherits from BaseConsumer that is provided by the separate gem that needs an initializer.
+Karafka has Rails auto-detection and loads early, so some components may be available later, e.g., when ApplicationConsumer inherits from BaseConsumer that is provided by the separate gem that needs an initializer.
 
 Moreover, despite the same code base, some processes (`rails s`, `rails db:migrate`, `sidekiq s`) may not need to know about karafka, and there is no need to load it.
 
@@ -576,7 +564,7 @@ To make Kafka accept messages bigger than 1MB, you must change both Kafka and Ka
 
 To increase the maximum accepted payload size in Kafka, you can adjust the `message.max.bytes` and `replica.fetch.max.bytes` configuration parameters in the server.properties file. These parameters control the maximum size of a message the Kafka broker will accept.
 
-To allow [WaterDrop](https://github.com/karafka/waterdrop) (Karafka producer) to send bigger messages, you need to:
+To allow WaterDrop (Karafka producer) to send bigger messages, you need to:
 
 - set the `max_payload_size` config option to value in bytes matching your maximum expected payload.
 - set `kafka` scoped `message.max.bytes` to the same value.
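The two producer-side settings above can be sketched as a plain options hash (values are illustrative, not from this commit): `max_payload_size` is WaterDrop's client-side guard, while the `kafka`-scoped `'message.max.bytes'` is passed through to librdkafka, and the two should match.

```ruby
# Illustrative producer settings for ~10 MB payloads; the brokers must also
# be configured (message.max.bytes, replica.fetch.max.bytes) to accept this.
MAX_MESSAGE_BYTES = 10 * 1024 * 1024

producer_options = {
  # WaterDrop's own guard against oversized payloads
  max_payload_size: MAX_MESSAGE_BYTES,
  kafka: {
    'bootstrap.servers': '127.0.0.1:9092',
    # librdkafka limit; keep it equal to max_payload_size
    'message.max.bytes': MAX_MESSAGE_BYTES
  }
}
```

Keeping both values identical avoids the case where WaterDrop accepts a payload that librdkafka then rejects.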
@@ -792,10 +780,6 @@ class KarafkaApp < Karafka::App
 end
 ```
 
-## Why am I seeing a `karafka_admin` consumer group with a constant lag present?
-
-The `karafka_admin` consumer group was created when using certain admin API operations. After upgrading to karafka `2.0.37` or higher, this consumer group is no longer needed and can be safely removed.
-
 ## Can I consume the same topic independently using two consumers within the same application?
 
 Yes. You can define independent consumer groups operating within the same application. Let's say you want to consume messages from a topic called `event` using two consumers. You can do this as follows:
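A minimal routing sketch of two independent consumer groups reading the same `event` topic, using Karafka's routing DSL (the group names and consumer classes below are illustrative, not taken from the original file):

```ruby
# Hypothetical routing: two consumer groups, each receiving every message
# from the `event` topic independently, with separate offsets per group.
class KarafkaApp < Karafka::App
  routes.draw do
    consumer_group :events_analytics do
      topic :event do
        consumer AnalyticsEventConsumer
      end
    end

    consumer_group :events_billing do
      topic :event do
        consumer BillingEventConsumer
      end
    end
  end
end
```

Because each consumer group tracks its own offsets, both consumers process the full stream without affecting one another.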
@@ -1400,7 +1384,23 @@ The `range` strategy has some advantages over the `round-robin` strategy, where
 
 Since data is often related within the same partition, `range` can keep related data processing within the same consumer, which could lead to benefits like better caching or business logic efficiencies. This can be useful, for example, to join records from two topics with the same number of partitions and the same key-partitioning logic.
 
-The assignment strategy is not a one-size-fits-all solution and can be changed based on the specific use case. If you want to change the assignment strategy in Karafka, you can set the `partition.assignment.strategy` configuration value to either `range`, `roundrobin` or `cooperative-sticky`. It's important to consider your particular use case, the number of consumers, and the nature of your data when choosing your assignment strategy.
+The assignment strategy is not a one-size-fits-all solution and can be changed based on the specific use case.
+
+**Recommended approaches:**
+
+1. **KIP-848 Consumer Protocol (Kafka 4.0+)** - This is the recommended approach for new deployments:
+   - Set `group.protocol` to `consumer` to use the new protocol
+   - Configure `group.remote.assignor` (e.g., `uniform` or `range`)
+   - Benefits: Faster rebalancing, less disruption, simpler operation, better static membership handling
+
+2. **Cooperative-Sticky (for older Kafka versions)** - Use when KIP-848 is not available:
+   - Set `partition.assignment.strategy` to `cooperative-sticky`
+   - Provides incremental rebalancing benefits over eager protocols
+   - Good fallback option for teams on older infrastructure
+
+3. **Legacy strategies** - `range` or `roundrobin` for specific use cases or compatibility requirements
+
+It's important to consider your Kafka broker version, particular use case, the number of consumers, and the nature of your data when choosing your assignment strategy.
 
 For Kafka 4.0+ with KRaft mode, you can also use the [next-generation consumer group protocol (KIP-848)](Kafka-New-Rebalance-Protocol) with `group.protocol: 'consumer'`, which offers significantly improved rebalance performance.
 
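The KIP-848 settings from the list above can be sketched as a plain `kafka` options hash (values are illustrative; this assumes Kafka 4.0+ brokers, and `group.remote.assignor` takes the place of the classic client-side `partition.assignment.strategy` when the new protocol is enabled):

```ruby
# Illustrative kafka options for the next-generation consumer protocol
# (KIP-848): opt in via group.protocol and pick a server-side assignor.
kafka_options = {
  'bootstrap.servers': '127.0.0.1:9092',
  # Opt in to the new consumer group protocol
  'group.protocol': 'consumer',
  # Server-side assignor; 'uniform' or 'range' are the documented choices
  'group.remote.assignor': 'uniform'
}
```

On clusters without KIP-848 support, this hash would instead carry `'partition.assignment.strategy': 'cooperative-sticky'` and omit the two `group.*` keys.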
@@ -1457,7 +1457,7 @@ Karafka provides ways to implement password protection, and you can find detaile
 
 Yes, it's possible to use a Karafka producer without a consumer in two ways:
 
-1. You can use [WaterDrop](https://github.com/karafka/waterdrop), a standalone Karafka component for producing Kafka messages. WaterDrop was explicitly designed for use cases where only message production is required, with no need for consumption.
+1. You can use WaterDrop, a standalone Karafka component for producing Kafka messages. WaterDrop was explicitly designed for use cases where only message production is required, with no need for consumption.
 
 1. Alternatively, if you have Karafka already in your application, avoid running the `karafka server` command, as it won't make sense without any topics to consume. You can run other processes and produce messages from them. In scenarios like that, there is no need to define any routes. `Karafka#producer` should operate without any problems.
 

@@ -1888,13 +1888,13 @@ It is indicative of a connectivity issue. Let's break down the meaning and impli
 
 1. **Implications for Karafka Web UI**:
 
-    - If you're experiencing this issue with topics related to Karafka Web UI, it's essential to note that Karafka improved its error handling in version 2.2.2. If you're using an older version, upgrading to the latest Karafka and Karafka Web UI versions might alleviate the issue.
+    - If you're experiencing this issue with topics related to Karafka Web UI, upgrading to the latest Karafka and Karafka Web UI versions is recommended as error handling has been continuously improved.
 
     - Another scenario where this error might pop up is during rolling upgrades of the Kafka cluster. If the Karafka Web UI topics have a replication factor 1, there's no redundancy for the partition data. During a rolling upgrade, as brokers are taken down sequentially for upgrades, there might be brief windows where the partition's data isn't available due to its residing broker being offline.
 
 Below, you can find a few recommendations in case you encounter this error:
 
-1. **Upgrade Karafka**: If you're running a version older than `2.2.2`, consider upgrading both Karafka and Karafka Web UI. This might resolve the issue if it's related to previous error-handling mechanisms.
+1. **Upgrade Karafka**: Always use the latest stable versions of Karafka and Karafka Web UI to benefit from improved error handling and bug fixes.
 
 1. **Review Configurations**: Examine your Karafka client configurations, especially timeouts and broker addresses, to ensure they're set appropriately.
 
@@ -2130,15 +2130,6 @@ This issue is likely due to the `offsets.retention.minutes` setting in Kafka. Ka
 
 You can read more about this behavior [here](Operations-Development-vs-Production#configure-your-brokers-offsetsretentionminutes-policy).
 
-## Why am I experiencing a load error when using Karafka with Ruby 2.7, and how can I fix it?
-
-If you're experiencing a load error with Karafka on Ruby 2.7, it's due to a bug in Bundler. To fix this:
-
-- **Install Bundler v2.4.22**: Run `gem install bundler -v 2.4.22 --no-document`.
-- **Update RubyGems to v3.4.22**: Run `gem update --system 3.4.22 --no-document`.
-
-Note: Ruby 2.7 is EOL and no longer supported. For better security and functionality, upgrading to Ruby 3.0 or higher is highly recommended.
-
 ## Why am I getting `+[NSCharacterSet initialize] may have been in progress in another thread when fork()` error when forking on macOS?
 
 When running a Rails application with Karafka and Puma on macOS, hitting the Karafka dashboard or endpoints can cause crashes with an error related to fork() and Objective-C initialization. This is especially prevalent in Puma's clustered mode.

Operations/Deployment.md

Lines changed: 42 additions & 1 deletion
@@ -553,7 +553,48 @@ When deploying Karafka consumers using Kubernetes, it's generally not recommende
 
 For larger deployments with many consumer processes, it's especially important to be mindful of the rebalancing issue.
 
-Overall, when deploying Karafka consumers using Kubernetes, it's important to consider the deployment strategy carefully and to choose a strategy that will minimize the risk of rebalancing issues. By using the `Recreate` strategy and configuring Karafka static group memberships and `cooperative.sticky` rebalance strategy settings, you can ensure that your Karafka application stays reliable and performant, even during large-scale deployments.
+Overall, when deploying Karafka consumers using Kubernetes, it's important to consider the deployment strategy carefully and to choose a strategy that will minimize the risk of rebalancing issues. By using the `Recreate` strategy and configuring Karafka with appropriate rebalancing strategies, you can ensure that your Karafka application stays reliable and performant.
+
+### Choosing the Right Rebalance Strategy
+
+**For teams running Kafka 4.0+:**
+
+- Use the new **KIP-848 consumer protocol** (`group.protocol` set to `consumer`) as your primary choice
+- Benefits: Faster rebalancing, less disruption, simpler operation, improved static membership handling
+- This is the recommended approach for all new deployments with compatible infrastructure
+
+**For teams on older Kafka versions or with compatibility constraints:**
+
+- Use **`cooperative-sticky`** rebalance strategy
+- Still provides incremental rebalancing benefits over the older eager protocols
+- Migrate to KIP-848 when your infrastructure supports it
+
+Example configuration for KIP-848:
+
+```ruby
+class KarafkaApp < Karafka::App
+  setup do |config|
+    config.kafka = {
+      'bootstrap.servers': '127.0.0.1:9092',
+      # Use the new consumer protocol (KIP-848)
+      'group.protocol': 'consumer'
+    }
+  end
+end
+```
+
+Example configuration for cooperative-sticky (fallback option):
+
+```ruby
+class KarafkaApp < Karafka::App
+  setup do |config|
+    config.kafka = {
+      'bootstrap.servers': '127.0.0.1:9092',
+      'partition.assignment.strategy': 'cooperative-sticky'
+    }
+  end
+end
+```
 
 ### Liveness
 

Operations/Development-vs-Production.md

Lines changed: 1 addition & 0 deletions
@@ -86,6 +86,7 @@ class KarafkaApp < Karafka::App
   setup do |config|
     config.kafka = {
       'bootstrap.servers': '127.0.0.1:9092',
+      # Fallback to cooperative-sticky for older brokers
       'partition.assignment.strategy': 'cooperative-sticky'
     }
   end

Pro/Long-Running-Jobs.md

Lines changed: 2 additions & 0 deletions
@@ -117,6 +117,8 @@ class KarafkaApp < Karafka::App
 end
 ```
 
+Both strategies help avoid unnecessary partition revocations when partitions would be re-assigned back to the same process.
+
 ### Revocation and re-assignment
 
 In the case of scenario `2`, there is nothing you need to do. Karafka will continue processing your messages and resume partition after it is done with the work.
