KAFKA-19421: Deflake streams_broker_down_resilience_test #19999
Conversation
Pull Request Overview
This PR fixes flakiness in streams_broker_down_resilience_test by producing records with keys (via a new `repeating_keys` parameter) to ensure all partitions receive messages.
Key changes:
- Added a `repeating_keys` argument to `assert_produce` calls in four resilience tests, with clarifying comments.
- Extended the `assert_produce` signature in `base_streams_test.py` to accept and pass through `repeating_keys`.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/kafkatest/tests/streams/streams_broker_down_resilience_test.py | Added comments and passed `repeating_keys=self.num_messages` to `assert_produce` in multiple tests. |
| tests/kafkatest/tests/streams/base_streams_test.py | Updated the `assert_produce` signature to include `repeating_keys` and forwarded it to `get_producer`. |
Comments suppressed due to low confidence (2)
tests/kafkatest/tests/streams/base_streams_test.py:85
- [nitpick] The parameter name 'repeating_keys' is ambiguous—it sounds like a boolean flag but is used as a numeric key count. Consider renaming it to something more descriptive like 'num_keys' or 'unique_key_count'.
def assert_produce(self, topic, test_state, num_messages=5, timeout_sec=60, repeating_keys=None):
tests/kafkatest/tests/streams/base_streams_test.py:85
- The new parameter 'repeating_keys' is undocumented. Please update the method docstring or add a comment to explain its intended use and expected values.
def assert_produce(self, topic, test_state, num_messages=5, timeout_sec=60, repeating_keys=None):
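For illustration, a docstring along the lines the second comment asks for might look like the sketch below; the wording is an assumption, not code from the PR:

def assert_produce(self, topic, test_state, num_messages=5, timeout_sec=60, repeating_keys=None):
    """Produce num_messages records to topic within timeout_sec and assert success.

    repeating_keys: when set to an integer N, records are produced with keys
    cycling through 0..N-1 instead of null keys, so that enough distinct
    keys exist for every partition to receive data; when None, the previous
    null-key behaviour is kept.
    """
    # Body elided here; per the review summary above, repeating_keys is
    # simply forwarded to get_producer.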
+        # repeating_keys enables production of records with keys, ensuring that we produce to all 3 partitions
         self.assert_produce(self.inputTopic,
                             "sending_message_after_broker_down_initially",
                             num_messages=self.num_messages,
-                            timeout_sec=120)
+                            timeout_sec=120,
+                            repeating_keys=self.num_messages)
[nitpick] This comment is repeated verbatim across multiple test cases. Consider abstracting the key-production logic into a helper or adding a shared comment to reduce duplication.
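For illustration, a helper along the lines of this nitpick might look like the following sketch (the helper name is hypothetical, not from the PR):

def assert_produce_to_all_partitions(self, topic, test_state, num_messages, timeout_sec):
    # Producing with repeating keys yields keyed records, so that every
    # partition of the topic receives messages.
    self.assert_produce(topic, test_state,
                        num_messages=num_messages,
                        timeout_sec=timeout_sec,
                        repeating_keys=num_messages)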
Thanks for the fix. The StreamsBrokerDownResilience e2e cases pass now. Out of curiosity, do we want to introduce `repeating_keys` to `assert_produce_consume` as well?
kafka/tests/kafkatest/tests/streams/base_streams_test.py, lines 73 to 83 in 659ace8:

def assert_produce_consume(self,
                           streams_source_topic,
                           streams_sink_topic,
                           client_id,
                           test_state,
                           num_messages=5,
                           timeout_sec=60):
    self.assert_produce(streams_source_topic, test_state, num_messages, timeout_sec)
    self.assert_consume(client_id, test_state, streams_sink_topic, num_messages, timeout_sec)
Thanks for the comment @FrankYang0529.
We could, but it's not necessary, because in that case we only assert that our messages are produced to / consumed from any partition. For streams_broker_down_resilience_test, on the other hand, we expect every partition to be non-empty, which is why the records need keys there.
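For reference, the extension the reviewer asks about could look roughly like this sketch (not part of this PR):

def assert_produce_consume(self,
                           streams_source_topic,
                           streams_sink_topic,
                           client_id,
                           test_state,
                           num_messages=5,
                           timeout_sec=60,
                           repeating_keys=None):
    # Forward repeating_keys so the producer can emit keyed records.
    self.assert_produce(streams_source_topic, test_state, num_messages,
                        timeout_sec, repeating_keys=repeating_keys)
    self.assert_consume(client_id, test_state, streams_sink_topic,
                        num_messages, timeout_sec)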
Here are 40 passing runs (across all parameters) of the test with this PR. Not sure if this is sufficient to trigger the original flake, but at least it should show that I didn't break things.
PTAL @aliehsaeedii
The tests in streams_broker_down_resilience_test produce messages with a null key to a topic with three partitions and expect each partition to be non-empty afterward. But I don't think this is a correct assumption, as a producer may try to be sticky and only produce to two partitions. This causes occasional flakiness in the test. The fix is to produce records with keys.
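To make the root cause concrete, here is a small illustrative sketch: plain modulo stands in for Kafka's actual murmur2 key hashing, and the sticky partition choices are made up:

NUM_PARTITIONS = 3

def keyed_partition(key):
    # Kafka hashes the serialized key and takes it modulo the partition
    # count; plain modulo stands in for the real hash here.
    return key % NUM_PARTITIONS

# With null keys, a sticky partitioner sends each whole batch to a single
# partition and only switches partitions between batches, so a short test
# run may never touch one of the three partitions.
sticky_choices = [0, 0, 0, 0, 2, 2, 2, 2]  # made-up per-record partitions
assert set(sticky_choices) != set(range(NUM_PARTITIONS))  # partition 1 stays empty

# With keys cycling through enough distinct values (what repeating_keys
# enables), every partition receives at least one record.
keyed_choices = {keyed_partition(k) for k in range(10)}
assert keyed_choices == set(range(NUM_PARTITIONS))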