Issue: fetch.max.wait.ms setting of Kafka Connect not working as expected with CH sink connector #400
The broker is running with the configuration below:
socket.send.buffer.bytes=102400
The broker is configured to return up to 400 MB for a single fetch request.
The Kafka Connect ClickHouse sink connector is running with the configuration below.
Our goal is to fetch and poll more records, then insert a large batch of records into ClickHouse.
But according to the above log, the CH sink connector is inserting very few records at a time and is making calls to ClickHouse every second.
However, if we update the connector config to include the parameter consumer.override.fetch.min.bytes: 109715200, the CH sink connector waits and inserts 400k records with every push.
We have the following questions:
Replies: 1 comment
Hi @rapsvenkat!
So fetch.max.wait.ms works in conjunction with fetch.min.bytes.
Kafka Connect will keep waiting for the minimum bytes you specify (fetch.min.bytes) until the timeout is reached (fetch.max.wait.ms) - if you wanted it to be a more regular cadence, you could try setting fetch.min.bytes to some comically high value (then it would always trigger via timeout).
If you ONLY set fetch.min.bytes it would still trigger every 500ms (the default value of fetch.max.wait.ms), and if you ONLY set fetch.max.wait.ms it would trigger as soon as data shows up (the default value of fetch.min.bytes is 1 byte).
This is all out-of-the-box functionality of Kafka Connect, so we don't have special settings around batch sizes 🙂
Hope that helps!
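The interaction described above can be illustrated with a small model. This is only a sketch and not Kafka or connector code: it assumes data arrives at the broker at a constant rate and treats a fetch as answered as soon as either fetch.min.bytes has accumulated or fetch.max.wait.ms has elapsed, whichever comes first. The names `FetchSettings` and `fetch_response` are hypothetical, as are the 60000 ms wait and the arrival rates used in the usage lines; only the 1 byte and 500 ms defaults and the 109715200 value come from the discussion.

```python
from dataclasses import dataclass


@dataclass
class FetchSettings:
    # Defaults as stated in the reply above.
    fetch_min_bytes: int = 1
    fetch_max_wait_ms: int = 500


def fetch_response(settings: FetchSettings, bytes_per_ms: int) -> tuple[int, str]:
    """Return (elapsed_ms, reason) for one fetch in a simplified model.

    Assumption: data arrives at a constant integer rate of `bytes_per_ms`.
    The fetch is answered when fetch_min_bytes has accumulated ("min_bytes")
    or when fetch_max_wait_ms has elapsed ("timeout"), whichever is first.
    """
    if bytes_per_ms > 0:
        # Ceiling division: milliseconds until fetch_min_bytes is available.
        ms_to_fill = -(-settings.fetch_min_bytes // bytes_per_ms)
        if ms_to_fill <= settings.fetch_max_wait_ms:
            return ms_to_fill, "min_bytes"
    # Not enough data arrived in time, so the timeout answers the fetch.
    return settings.fetch_max_wait_ms, "timeout"


# Hypothetical arrival rate of 1000 bytes per millisecond.
defaults = fetch_response(FetchSettings(), 1000)
only_min_bytes = fetch_response(FetchSettings(fetch_min_bytes=109715200), 1000)
only_max_wait = fetch_response(FetchSettings(fetch_max_wait_ms=60000), 1000)
both = fetch_response(FetchSettings(109715200, 60000), 1000)

print(defaults, only_min_bytes, only_max_wait, both)
```

In this model, raising only fetch.max.wait.ms changes nothing because the 1 byte minimum is met almost immediately, and raising only fetch.min.bytes is capped by the 500 ms default wait. Setting both is what lets a fetch accumulate a large batch or wait out the longer timeout.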