Skip to content

Conversation

kakaiu
Copy link
Member

@kakaiu kakaiu commented Oct 14, 2025

When injecting a clog in simulation, the goal is often to apply it before the test ends. However, if the clog duration is set equal to the test duration, the clog may persist after the workload completes, lasting longer than intended.

100K correctness test:
20251014-010706-zhewang-7bedcfbdb57ce5a3 compressed=True data_size=38897360 fail_fast=10 max_runs=100000 priority=100 sanity=False submitted=20251014-010706 timeout=5400 username=zhewang

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@kakaiu kakaiu requested review from gxglass and jzhou77 October 14, 2025 00:50

// Clog and wait for recovery to happen
if (!self->clogTlog(self->testDuration)) {
double clogDuration = self->testDuration * (0.5 + 0.4 * deterministicRandom()->random01());
Copy link
Member Author

@kakaiu kakaiu Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not put this into a method because whether using a shorten clogDuration depends on workload. There are some workloads use clogTlog which wants to clog a pair almost forever.

Copy link
Member Author

@kakaiu kakaiu Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let me know if you have any preference. Thanks! @gxglass

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testDuration = 200.0 in DcLag.toml, which doesn't seem to be very long. And after the test finishes, unclogAll() should remove the clogging.

Can you explain the failure you've seen?

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 52ebf8f
  • Duration 0:11:24
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 52ebf8f
  • Duration 0:30:41
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 52ebf8f
  • Duration 0:39:04
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 52ebf8f
  • Duration 0:45:11
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 52ebf8f
  • Duration 1:06:35
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 52ebf8f
  • Duration 1:20:20
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 52ebf8f
  • Duration 1:28:36
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants