-
Couldn't load subscription status.
- Fork 928
[DEFLAKE] Psync established after rdb load - beyond grace period #2748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: unstable
Are you sure you want to change the base?
Conversation
Signed-off-by: Roshan Khatri <[email protected]>
Co-authored-by: Harkrishn Patro <[email protected]> Signed-off-by: Roshan Khatri <[email protected]>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #2748 +/- ##
============================================
+ Coverage 72.59% 72.65% +0.05%
============================================
Files 128 128
Lines 71301 71300 -1
============================================
+ Hits 51759 51800 +41
+ Misses 19542 19500 -42 🚀 New features to boost your workflow:
|
| # Expected outcome: Primary drops the RDB channel after grace period is done. | ||
| $replica replicaof $primary_host $primary_port | ||
| wait_for_log_messages 0 {"*Done loading RDB*"} $loglines 2000 1 | ||
| wait_for_log_messages 0 {"*Done loading RDB*"} $loglines 2000 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The syntax of this is proc is
proc wait_for_log_messages {srv_idx patterns from_line maxtries delay}
Reading the file every 5 milliseconds and repeating 2000 times seems like a quite heavy operation. Can we change these to a delay of 100 or 50 like in other wait-for expressions?
| wait_for_log_messages 0 {"*Done loading RDB*"} $loglines 2000 5 | |
| wait_for_log_messages 0 {"*Done loading RDB*"} $loglines 100 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we can so 100 100 and I dont think there should be an issue, we could do the same with the previous test Psync established after rdb load - within grace period
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I see there are multiple (seven?) with 2000 1 in test suite, one with 4000 1, one with 20000 1 and eight(?) with 1000 10.
Let's change them all to use a delay of 100 while keeping the same total time?
|
@roshkhatri were we able to reproduce the test failure? |
Resolves: #2695
We saw the same error once in #2694 where the rdb took a little longer to load and thats where the test failed.
I ran it for 100 time in a workflow here https://github.com/roshkhatri/valkey/actions/runs/18605561661/job/53054114207 where I see max of 5.5 seconds to load the RDB.
Here I have increased the check for total of 6 seconds and IMO we would see it failing again, unless the rdb load get even slower in valgrid.