CI Failure (possible idempotency bug) in RandomNodeOperationsTest.test_node_operations #18272

Open
vbotbuildovich opened this issue May 6, 2024 · 1 comment · May be fixed by #20309
Labels
auto-triaged used to know which issues have been opened from a CI job ci-failure ci-rca/redpanda CI Root Cause Analysis - Redpanda Issue

Comments

@vbotbuildovich
Collaborator

vbotbuildovich commented May 6, 2024

https://buildkite.com/redpanda/redpanda/builds/48738

Module: rptest.tests.random_node_operations_test
Class: RandomNodeOperationsTest
Method: test_node_operations
Arguments: {
    "num_to_upgrade": 0,
    "enable_failures": true,
    "with_tiered_storage": true
}
test_id:    RandomNodeOperationsTest.test_node_operations
status:     FAIL
run time:   402.228 seconds

RuntimeError('KgoVerifierProducer-3-281472309991872 possible idempotency bug: ProduceStatus<409747 409600 147 1 0 0 38841.5/77800.5/108291.75>')
Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 276, in run_test
    return self.test_context.function(self.test)
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/mark/_mark.py", line 535, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 103, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/random_node_operations_test.py", line 486, in test_node_operations
    write_caching_producer_consumer.verify()
  File "/root/tests/rptest/tests/random_node_operations_test.py", line 250, in verify
    self.producer.wait()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/services/service.py", line 287, in wait
    if not self.wait_node(node, end - now):
  File "/root/tests/rptest/services/kgo_verifier_services.py", line 601, in wait_node
    raise RuntimeError(
RuntimeError: KgoVerifierProducer-3-281472309991872 possible idempotency bug: ProduceStatus<409747 409600 147 1 0 0 38841.5/77800.5/108291.75>

JIRA Link: CORE-2804

@vbotbuildovich vbotbuildovich added auto-triaged used to know which issues have been opened from a CI job ci-failure labels May 6, 2024
@piyushredpanda piyushredpanda added the ci-rca/redpanda CI Root Cause Analysis - Redpanda Issue label May 9, 2024
@travisdowns travisdowns changed the title CI Failure (key symptom) in RandomNodeOperationsTest.test_node_operations CI Failure (possible idempotency bug) in RandomNodeOperationsTest.test_node_operations Jun 23, 2024
@bashtanov
Contributor

The problem here is that node operations, decommissioning in particular, interfere with failure injections. They don't happen at exactly the same time, but close enough (within the same second) to cause a Raft failure on a partition. Just as failure injections are already spaced apart, we should add a sleep to the decommissioning function so that, after the op completes, it keeps holding the lock shared between ops and failure injections.
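
A minimal sketch of the idea, not the actual rptest code: `OpCoordinator`, `run_exclusive`, `cooldown_s`, and `demo` are hypothetical names, and the no-op lambdas stand in for a real decommission and a real failure injection. The only point is that the sleep happens while the shared lock is still held, so whichever event runs next cannot start until the cooldown has passed.

```python
import threading
import time


class OpCoordinator:
    """Hypothetical helper: serializes node ops and failure injections."""

    def __init__(self, cooldown_s: float):
        self._lock = threading.Lock()  # shared by ops and failure injections
        self.cooldown_s = cooldown_s
        self.log = []  # (name, started_at, action_done_at), in lock order

    def run_exclusive(self, name, action):
        with self._lock:
            started = time.monotonic()
            action()
            done = time.monotonic()
            self.log.append((name, started, done))
            # Sleep while still holding the lock, so the next event
            # cannot begin within cooldown_s of this one finishing.
            time.sleep(self.cooldown_s)


def demo(cooldown_s: float = 0.05):
    """Run a fake decommission and a fake failure injection concurrently."""
    coord = OpCoordinator(cooldown_s)
    threads = [
        threading.Thread(target=coord.run_exclusive,
                         args=("decommission", lambda: None)),
        threading.Thread(target=coord.run_exclusive,
                         args=("failure_injection", lambda: None)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return coord.log


if __name__ == "__main__":
    first, second = demo()
    print(f"{first[0]} finished, {second[0]} started "
          f"{second[1] - first[2]:.3f}s later")
```

Whichever thread takes the lock first, the second event starts at least `cooldown_s` after the first one finishes, which is the spacing that failure injections already have between themselves.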
