[DEFLAKE] Psync established after rdb load - beyond grace period #2748

roshkhatri · 2025-10-17T22:24:38Z

Resolves: #2695

We saw the same error once in #2694, where the RDB took a little longer to load, and that's where the test failed.

I ran it 100 times in a workflow here https://github.com/roshkhatri/valkey/actions/runs/18605561661/job/53054114207 where I see a max of 5.5 seconds to load the RDB.

Here I have increased the check to a total of 6 seconds, and IMO we would not see it failing again unless the RDB load gets even slower under valgrind.

Signed-off-by: Roshan Khatri <[email protected]>

tests/integration/dual-channel-replication.tcl

Co-authored-by: Harkrishn Patro <[email protected]>
Signed-off-by: Roshan Khatri <[email protected]>

codecov · 2025-10-18T20:35:04Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.65%. Comparing base (b4c93cc) to head (8d7ddb9).
⚠️ Report is 1 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #2748      +/-   ##
============================================
+ Coverage     72.59%   72.65%   +0.05%     
============================================
  Files           128      128              
  Lines         71301    71300       -1     
============================================
+ Hits          51759    51800      +41     
+ Misses        19542    19500      -42

see 17 files with indirect coverage changes

zuiderkwast · 2025-10-20T13:50:30Z

tests/integration/dual-channel-replication.tcl

            # Expected outcome: Primary drops the RDB channel after grace period is done.
            $replica replicaof $primary_host $primary_port
-            wait_for_log_messages 0 {"*Done loading RDB*"} $loglines 2000 1
+            wait_for_log_messages 0 {"*Done loading RDB*"} $loglines 2000 5


The signature of this proc is

proc wait_for_log_messages {srv_idx patterns from_line maxtries delay}

Reading the file every 5 milliseconds and repeating that up to 2000 times seems like quite a heavy operation. Can we change these to a delay of 100 or 50, like in other wait-for expressions?

Suggested change

- wait_for_log_messages 0 {"*Done loading RDB*"} $loglines 2000 5
+ wait_for_log_messages 0 {"*Done loading RDB*"} $loglines 100 100
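To make the trade-off in this suggestion concrete, here is a rough sketch in Python (not the test suite's Tcl). It assumes, based on the description above, that `wait_for_log_messages` reads the log once per try and sleeps `delay` milliseconds between tries. `polling_budget` is a hypothetical helper written only for this illustration and is not part of the test suite.

```python
def polling_budget(maxtries: int, delay_ms: int) -> tuple[int, int]:
    """Return (worst-case total wait in ms, worst-case number of log reads).

    Assumption: one log read per try, with delay_ms of sleep between tries.
    """
    return maxtries * delay_ms, maxtries


# Current arguments versus the suggested ones.
before = polling_budget(2000, 5)
after = polling_budget(100, 100)

print("2000 5  ->", before)   # (10000, 2000)
print("100 100 ->", after)    # (10000, 100)
```

Under that assumption, both variants wait up to 10 seconds in total, but the suggested one reads the log at most 100 times instead of 2000.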

roshkhatri · 2025-10-20 (edited)

Yeah, we can do 100 100 and I don't think there should be an issue. We could do the same with the previous test, "Psync established after rdb load - within grace period".

zuiderkwast · 2025-10-20

Yeah, I see there are multiple (seven?) with 2000 1 in the test suite, one with 4000 1, one with 20000 1 and eight(?) with 1000 10.

Let's change them all to use a delay of 100 while keeping the same total time?
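As a rough illustration of what "same total time with a delay of 100" would mean for the argument pairs listed above, here is a small Python sketch (not Tcl). `rescale` is a hypothetical helper for this comment only. It assumes the total wait is simply `maxtries * delay` in milliseconds and rounds the new try count up so the total never shrinks.

```python
def rescale(maxtries: int, delay_ms: int, new_delay_ms: int = 100) -> tuple[int, int]:
    """Return (new_maxtries, new_delay_ms) covering at least the same total wait."""
    total_ms = maxtries * delay_ms
    # Ceiling division, so the new total is never shorter than the old one.
    new_maxtries = max(1, -(-total_ms // new_delay_ms))
    return new_maxtries, new_delay_ms


# The (maxtries, delay) pairs mentioned in this thread.
for pair in [(2000, 1), (4000, 1), (20000, 1), (1000, 10), (2000, 5)]:
    print(pair, "->", rescale(*pair))
```

With these assumptions, `2000 1` becomes `20 100`, `4000 1` becomes `40 100`, `20000 1` becomes `200 100`, and `1000 10` becomes `100 100`. One thing to keep in mind is that a coarser delay means a matching log line may be noticed up to 100 ms later than before, which could matter for tests that are sensitive to timing right after the match.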

sarthakaggarwal97 · 2025-10-23T16:36:01Z

@roshkhatri were we able to reproduce the test failure?

increase the check for 6 seconds

3406eca

Signed-off-by: Roshan Khatri <[email protected]>

roshkhatri self-assigned this Oct 17, 2025

roshkhatri requested a review from hpatro October 17, 2025 22:25

hpatro reviewed Oct 17, 2025

Update tests/integration/dual-channel-replication.tcl

8d7ddb9

Co-authored-by: Harkrishn Patro <[email protected]>
Signed-off-by: Roshan Khatri <[email protected]>

zuiderkwast reviewed Oct 20, 2025
