@roshkhatri
Member

Resolves: #2695

We saw the same error once in #2694, where the RDB took a little longer to load and that's where the test failed.

I ran it 100 times in a workflow here https://github.com/roshkhatri/valkey/actions/runs/18605561661/job/53054114207 where I see a max of 5.5 seconds to load the RDB.

Here I have increased the check to a total of 6 seconds, and IMO we would not see it failing again unless the RDB load gets even slower under Valgrind.

@roshkhatri roshkhatri self-assigned this Oct 17, 2025
@roshkhatri roshkhatri requested a review from hpatro October 17, 2025 22:25
Co-authored-by: Harkrishn Patro <[email protected]>
Signed-off-by: Roshan Khatri <[email protected]>
@codecov

codecov bot commented Oct 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.65%. Comparing base (b4c93cc) to head (8d7ddb9).
⚠️ Report is 1 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2748      +/-   ##
============================================
+ Coverage     72.59%   72.65%   +0.05%     
============================================
  Files           128      128              
  Lines         71301    71300       -1     
============================================
+ Hits          51759    51800      +41     
+ Misses        19542    19500      -42     

see 17 files with indirect coverage changes

  # Expected outcome: Primary drops the RDB channel after grace period is done.
  $replica replicaof $primary_host $primary_port
- wait_for_log_messages 0 {"*Done loading RDB*"} $loglines 2000 1
+ wait_for_log_messages 0 {"*Done loading RDB*"} $loglines 2000 5
Contributor

The signature of this proc is

proc wait_for_log_messages {srv_idx patterns from_line maxtries delay}

Reading the file every 5 milliseconds and repeating that 2000 times seems like quite a heavy operation. Can we change these to a delay of 100 or 50, like in other wait-for expressions?

Suggested change
- wait_for_log_messages 0 {"*Done loading RDB*"} $loglines 2000 5
+ wait_for_log_messages 0 {"*Done loading RDB*"} $loglines 100 100
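
To make the cost argument concrete, here is a rough Python analogue of a maxtries/delay polling loop. It is only a sketch: `wait_for`, `FakeClock`, and `checks_needed` are hypothetical names, not the Tcl helper from the test suite. It assumes the delay is in milliseconds and that the condition becomes true after 5.5 seconds, the slowest RDB load reported above.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], maxtries: int, delay_ms: int,
             sleep_ms: Callable[[int], None] = lambda ms: time.sleep(ms / 1000.0)) -> int:
    """Poll `condition` up to `maxtries` times, pausing `delay_ms` between tries.

    Returns the number of checks performed; raises TimeoutError if the
    condition never becomes true.
    """
    for attempt in range(1, maxtries + 1):
        if condition():
            return attempt
        sleep_ms(delay_ms)
    raise TimeoutError(f"condition not met after {maxtries} tries")


class FakeClock:
    """Simulated clock in integer milliseconds, so the example runs instantly."""

    def __init__(self) -> None:
        self.now_ms = 0

    def sleep_ms(self, ms: int) -> None:
        self.now_ms += ms


def checks_needed(maxtries: int, delay_ms: int, ready_at_ms: int = 5500) -> int:
    """Count how many checks happen before a condition that turns true at ready_at_ms."""
    clock = FakeClock()
    return wait_for(lambda: clock.now_ms >= ready_at_ms, maxtries, delay_ms, clock.sleep_ms)


print(checks_needed(2000, 5))    # 2000 tries, 5 ms apart
print(checks_needed(100, 100))   # 100 tries, 100 ms apart
```

Under these assumptions both settings allow up to 10 seconds in total, but the 5 ms variant performs 1101 checks before succeeding while the 100 ms variant performs 56. In the real helper each check re-reads the log file.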

Member Author

@roshkhatri roshkhatri Oct 20, 2025

Yeah, we can do 100 100 and I don't think there should be an issue. We could do the same with the previous test, "Psync established after rdb load - within grace period".

Contributor

Yeah, I see there are multiple (seven?) with 2000 1 in the test suite, one with 4000 1, one with 20000 1 and eight(?) with 1000 10.

Let's change them all to use a delay of 100 while keeping the same total time?
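
As a quick sanity check on "same total time", the conversion can be sketched in Python. This assumes the total wait is simply maxtries × delay in milliseconds and ignores the time spent reading the log; `rescale_maxtries` is a hypothetical helper, not something from the test suite.

```python
def rescale_maxtries(maxtries: int, delay_ms: int, new_delay_ms: int = 100) -> int:
    """Return the maxtries that keeps maxtries * delay at least as long with new_delay_ms."""
    total_ms = maxtries * delay_ms
    # Ceiling division, so the new total timeout is never shorter than the old one.
    return -(-total_ms // new_delay_ms)


# The (maxtries, delay) pairs mentioned in this thread.
pairs = [(2000, 1), (4000, 1), (20000, 1), (1000, 10)]
for maxtries, delay_ms in pairs:
    print(f"{maxtries} {delay_ms} -> {rescale_maxtries(maxtries, delay_ms)} 100")
```

With those assumptions, 2000 1 becomes 20 100, 4000 1 becomes 40 100, 20000 1 becomes 200 100, and 1000 10 becomes 100 100.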

@sarthakaggarwal97
Contributor

@roshkhatri were we able to reproduce the test failure?

Development

Successfully merging this pull request may close these issues.

[TEST-FAILURE] Psync established after rdb load - beyond grace period

4 participants