tests/node_ops: avoid interference between failure injections and ops #20309

bashtanov · 2024-06-26T18:30:31Z

Failure injections and node operations can already be separated with a lock, but that is not sufficient. We also need a certain delay between them.

Fixes #18272

Backports Required

Release Notes

bashtanov · 2024-06-26T18:30:38Z

/dt

bashtanov · 2024-06-26T19:38:21Z

/dt

bashtanov · 2024-06-26T21:21:26Z

/dt

bashtanov · 2024-06-27T10:46:19Z

/dt

bashtanov · 2024-06-27T14:12:46Z

/dt

bashtanov · 2024-06-27T18:14:48Z

test failure unrelated

bharathv · 2024-06-28T23:59:08Z

tests/rptest/utils/node_operations.py

@@ -330,6 +332,7 @@ def stop_node(self, idx):
        with self.lock:
            self.redpanda.remove_from_started_nodes(node)
            self.redpanda.stop_node(node)
+            time.sleep(self.wait_after_stop)
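
For readers without the surrounding file, here is a minimal, self-contained sketch of the pattern the hunk above applies: sleeping while the lock is still held, so that any other lock-guarded action has to wait out the delay. The class `NodeOpsSketch`, the method `inject_failure`, and the helper `run_demo` are hypothetical stand-ins, not the rptest code; only `lock`, `stop_node`, and `wait_after_stop` mirror names from the diff.

```python
import threading
import time


class NodeOpsSketch:
    """Hypothetical stand-in for the test helper; not the rptest class."""

    def __init__(self, wait_after_stop=0.05):
        self.lock = threading.Lock()
        self.wait_after_stop = wait_after_stop
        self.events = []  # (event name, monotonic timestamp)

    def stop_node(self, idx):
        with self.lock:
            self.events.append((f"stop-{idx}", time.monotonic()))
            # Sleeping while still holding the lock keeps every other
            # lock-guarded action from starting during the quiet period.
            time.sleep(self.wait_after_stop)

    def inject_failure(self, idx):
        # Hypothetical failure injection guarded by the same lock.
        with self.lock:
            self.events.append((f"failure-{idx}", time.monotonic()))


def run_demo(wait_after_stop=0.05):
    """Return the observed gap between a node stop and the next failure."""
    ops = NodeOpsSketch(wait_after_stop)
    stopper = threading.Thread(target=ops.stop_node, args=(2,))
    stopper.start()
    # Wait until the stop has been recorded, i.e. the lock is already held.
    while not ops.events:
        time.sleep(0.001)
    ops.inject_failure(3)  # blocks until stop_node releases the lock
    stopper.join()
    (_, t_stop), (_, t_fail) = ops.events
    return t_fail - t_stop
```

With the lock alone, the failure could start immediately after the stop returns. With the sleep inside the critical section, the gap between the two events is at least roughly `wait_after_stop`.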


How exactly does this fix the issue? The test failure is an idempotency violation; I thought that would be a code bug in most cases.

bashtanov · 2024-07-01

The test uses both random failures and node decommissions. The problem was reproduced in the following way:

node 2 was the leader, node 3 was mostly up to date with it, and node 4 was behind

node 2 was restarted, and very shortly afterwards node 3 went down

node 4 became a candidate and node 2 voted for it, because when node 2 restarted it could not recover up to the majority_replicated_index point, only up to some point below it. Is that normal?

so some raft data was lost
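
To make the scenario above concrete, here is a minimal sketch of the log up-to-date check from the Raft paper, which a voter applies before granting its vote. It is not Redpanda's implementation, and the type `LogPosition`, the function name, the term, and all offsets below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogPosition:
    last_term: int   # term of the last entry in the log
    last_index: int  # index of the last entry in the log


def candidate_log_is_up_to_date(candidate: LogPosition,
                                voter: LogPosition) -> bool:
    """Standard Raft rule: compare last terms first, then last indexes."""
    if candidate.last_term != voter.last_term:
        return candidate.last_term > voter.last_term
    return candidate.last_index >= voter.last_index


# Hypothetical numbers: entries up to index 100 were replicated to nodes 2
# and 3 (a majority of three), while node 4 had only reached index 90.
node4 = LogPosition(last_term=5, last_index=90)

# If node 2 had recovered its full log, it would refuse node 4's request.
node2_full = LogPosition(last_term=5, last_index=100)
assert not candidate_log_is_up_to_date(node4, node2_full)

# If node 2 recovers only to index 85 after the restart, node 4 looks
# up to date, so node 2 grants the vote. With node 3 down, node 4 wins
# with two of three votes and entries 91..100 are lost.
node2_truncated = LogPosition(last_term=5, last_index=85)
assert candidate_log_is_up_to_date(node4, node2_truncated)
```

Under these assumptions the election itself follows the rules; the data loss comes from node 2 recovering to a point below what it had already acknowledged.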

mmaslankaprv · 2024-07-01T15:01:34Z

I agree with Bharath that the solution from that PR doesn't fix the issue mentioned in the cover letter.

bashtanov force-pushed the test-node-ops-failures-no-interfere branch from b8c0188 to 18b2a4e Compare June 26, 2024 18:34

bashtanov force-pushed the test-node-ops-failures-no-interfere branch from 18b2a4e to 3d28155 Compare June 26, 2024 19:45

tests/node_ops: avoid interference between failure injections and ops

308667f

They can already be separated with a lock, but it's not sufficient. We need a certain delay between them.

bashtanov force-pushed the test-node-ops-failures-no-interfere branch from 3d28155 to 308667f Compare June 27, 2024 10:10

bashtanov requested review from mmaslankaprv, bharathv and ztlpn June 27, 2024 18:14

bashtanov marked this pull request as ready for review June 28, 2024 15:11

bharathv reviewed Jun 28, 2024

bashtanov closed this Jul 10, 2024
