-
Notifications
You must be signed in to change notification settings - Fork 580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PESDLC-1489 StreamVerifier service #19976
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is marked as draft, so unsure if this is going in the current form but please see my comments.
cda563c
to
e09ac04
Compare
Most recent commit provides visibility on how very basic test structure might look like. One more thing to add is message validation option on consumption via externally pluggable validator. |
12fa904
to
e1d15ac
Compare
new failures in https://buildkite.com/redpanda/redpanda/builds/51238#019094e8-d681-417f-a7c3-8db584f7d721:
new failures in https://buildkite.com/redpanda/redpanda/builds/51587#0190bc79-b124-448a-a387-9c4f087b5635:
new failures in https://buildkite.com/redpanda/redpanda/builds/51658#0190c184-0228-4a99-9546-e8430d8b45dd:
new failures in https://buildkite.com/redpanda/redpanda/builds/51658#0190c184-fa72-4f55-bca5-85c2cdc49e69:
|
fc8c811
to
d7efb91
Compare
@bharathv , updated to fit simplification. Will continue on Monday to debug and polish.
Most probably due to lack of message retry to produce after KafkaErrors on node unavailability. |
skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51451#0190a94f-5ebe-47b9-baed-e16504ea0f32: skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51451#0190a965-6c5a-43db-9d72-64e6387d099f: |
c43066b
to
cf47568
Compare
Sample test runs using EC2
|
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/51539#0190b813-8c1a-4c0d-b167-1183571c629f ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/51587#0190bc7b-44ac-4632-903e-0d5b8b8b747c ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/51658#0190c184-022c-4ee8-8b50-5a9c9ccd5a2d ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/51658#0190c184-fa77-428e-8ad2-74bebbe97ffa ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/51677#0190c229-84b8-4a71-a064-26caba5d1a5c ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/51799#0190ccbe-f206-445d-802f-af11e40346d8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bunch of nits.
|
||
class ConsistencyViolationException(Exception): | ||
pass | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, missed that. Thanks.
self.logger.debug(f"[{action}] ...{count} messages") | ||
return count | ||
|
||
def ensure_progress(self, action, delta, timeout_sec): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In these tests yes, but in case of using service for some node operations is useful. I would leave it.
def get_produce_status(self): | ||
return self._get(self._get_url("produce"), raise_on_fails=False) | ||
|
||
def get_consume_status(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit on naming, make it consistent with "verify" semantics in the other class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, will do
self.target_topic = "stream_topic_dst" | ||
|
||
# Calculated speed of producer is ~100k messages per minute, which is ~1500/sec | ||
# Amtomic thread processed ~250 messages per second. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
self.verifier.remote_start_produce(self.source_topic, | ||
self.default_message_count, | ||
messages_per_sec=messages_per_sec) | ||
self.logger.info(f"Waiting for {wait_msg_count} produces messages") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this missing a call to ensure progress or am I misreading?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@savex Failure seems related. |
@bharathv, yes, I keep forgetting that CDT runs in docker and there should be different set of timings |
3bc1cc1
to
a2db278
Compare
/cdt |
a2db278
to
d0b7563
Compare
/cdt |
Based on remote script that has stream verifier webservice implemented, this servie provides easy to use interface for produce, atomic consume/produce and consume actions. Basic service workflow is verifier.start verifier.update_service_config verifier.remote_start_produce verifier.remote_start_atomic verifier.remote_wait_action('produce') verifier.remote_wait_action('atomic') verifier.remote_start_consume verifier.remote_wait_action('consume') And use verifier.get_produce_status to check on offsets and/or message count
Test checks that produce/atomic/consume action can handle broker addition safely. Also, some internal unified routines implemented to make test itself look simpler. Fixed typos and remove unused exception Updated message count for non-dedicated nodes env
d0b7563
to
5aafe8c
Compare
This implements service part of the StreamVerifier that is created in this PR.
Service will supports all commands including 'produce', 'consume' and 'atomic' as well as getting current status and active command
Backports Required
Release Notes