Fix clog longer than test duration #12452
base: main
// Clog and wait for recovery to happen
if (!self->clogTlog(self->testDuration)) {
double clogDuration = self->testDuration * (0.5 + 0.4 * deterministicRandom()->random01());
I did not put this into a method because whether to use a shortened clogDuration depends on the workload. Some workloads use clogTlog to clog a pair almost forever.
Please let me know if you have any preference. Thanks! @gxglass
testDuration = 200.0 in DcLag.toml, which doesn't seem to be very long. And after the test finishes, unclogAll() should remove the clogging.
Can you explain the failure you've seen?
When injecting a clog in simulation, the goal is often to apply it before the test ends. However, if the clog duration is set equal to the test duration, the clog may persist after the workload completes, lasting longer than intended.
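For illustration, the bounded duration from the diff can be sketched in isolation. This is a standalone approximation, not FoundationDB code: `pickClogDuration` and `random01` are hypothetical names, and `std::mt19937` stands in for `deterministicRandom()`. It only shows that the chosen duration falls between roughly 50% and 90% of `testDuration`, so the clog ends before the workload does.

```cpp
#include <cassert>
#include <random>

// Hypothetical stand-in for deterministicRandom()->random01():
// returns a uniformly distributed double in [0, 1).
double random01(std::mt19937& rng) {
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(rng);
}

// Pick a clog duration between 50% and about 90% of the test duration,
// using the same expression as the diff, so the clog is released
// before the test itself finishes.
double pickClogDuration(double testDuration, std::mt19937& rng) {
    assert(testDuration > 0.0);
    return testDuration * (0.5 + 0.4 * random01(rng));
}
```

With `testDuration = 200.0` (the value quoted from DcLag.toml in the review), this yields a clog of about 100 to 180 seconds. Workloads that want to clog a pair almost forever would keep passing their own duration instead.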
100K correctness test:
20251014-010706-zhewang-7bedcfbdb57ce5a3 compressed=True data_size=38897360 fail_fast=10 max_runs=100000 priority=100 sanity=False submitted=20251014-010706 timeout=5400 username=zhewang
Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
release-branch or main if this is the youngest branch)