-
Notifications
You must be signed in to change notification settings - Fork 580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests/node_ops: avoid interference betwen failure injections and ops #20309
tests/node_ops: avoid interference betwen failure injections and ops #20309
Conversation
/dt |
b8c0188
to
18b2a4e
Compare
/dt |
18b2a4e
to
3d28155
Compare
/dt |
They can already be separated with a lock, but it's not sufficient. We need certain delay between them.
3d28155
to
308667f
Compare
/dt |
1 similar comment
/dt |
@@ -330,6 +332,7 @@ def stop_node(self, idx): | |||
with self.lock: | |||
self.redpanda.remove_from_started_nodes(node) | |||
self.redpanda.stop_node(node) | |||
time.sleep(self.wait_after_stop) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how exactly this fixes the issue? The test failure is a idempotency violation, I thought that would be a code bug in most cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test uses both random failures and node decommissions. The problem was reproduced in the following way:
- node 2 was the leader, node 3 was mostly up to date with it, and node 4 was behind
- node 2 was restarted, and very shortly after it node 3 got down
- node 4 became a candidate and node 2 voted for it because when 2 restarted it could not recover up to the
majority_replicated_index
point, but rather only till some point below -- is it normal? - so some raft data was lost
i agree with Bharath that the solution from that PR doesn't fixe the issue mentioned in cover letter |
They can already be separated with a lock, but it's not sufficient. We need certain delay between them.
Fixes #18272
Backports Required
Release Notes