-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
slotticker: improve and clarify #1860
base: stage
Are you sure you want to change the base?
Conversation
operator/duties/scheduler.go
Outdated
if waitDuration > 0 { | ||
time.Sleep(waitDuration) | ||
|
||
// Lock the mutex before broadcasting | ||
s.waitCond.L.Lock() | ||
s.headSlot = slot | ||
s.waitCond.Broadcast() | ||
s.waitCond.L.Unlock() | ||
} | ||
|
||
// Lock the mutex before broadcasting | ||
s.waitCond.L.Lock() | ||
s.headSlot = slot | ||
s.waitCond.Broadcast() | ||
s.waitCond.L.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we should advance headSlot
regardless of whether or not we actually had to wait to 1/3 here (that is, if tick happened past 1/3 of slot start time - we still want to advance it, right) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iurii-ssv The only idea that comes to my mind as to why this could have been done like that is to make sure the slot is not decreased
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iurii-ssv The only idea that comes to my mind as to why this could have been done like that is to make sure the slot is not decreased
Not sure if that's even possible for "slot decreasing" to happen in the current code version (or how that time-based if
would even help with that), but I'm adding another change here that actually does make it a likely problem to happen,
and hence we probably want to have explicit protection against "slot decreasing" (which I've now added in this PR)
operator/slotticker/slotticker.go
Outdated
func newWithCustomTimer(logger *zap.Logger, cfg Config, timerProvider TimerProvider) *slotTicker { | ||
now := time.Now() | ||
timeSinceGenesis := now.Sub(cfg.GenesisTime) | ||
timeSinceGenesis := time.Since(cfg.GenesisTime) | ||
|
||
var initialDelay time.Duration | ||
var ( | ||
initialDelay time.Duration | ||
initialSlot phase0.Slot | ||
) | ||
if timeSinceGenesis < 0 { | ||
// Genesis time is in the future | ||
initialDelay = -timeSinceGenesis // Wait until the genesis time | ||
initialSlot = phase0.Slot(0) // Start at slot 0 | ||
} else { | ||
slotsSinceGenesis := timeSinceGenesis / cfg.SlotDuration | ||
nextSlotStartTime := cfg.GenesisTime.Add((slotsSinceGenesis + 1) * cfg.SlotDuration) | ||
initialDelay = time.Until(nextSlotStartTime) | ||
initialSlot = phase0.Slot(slotsSinceGenesis) | ||
} | ||
|
||
return &slotTicker{ | ||
logger: logger, | ||
timer: timerProvider(initialDelay), | ||
slotDuration: cfg.SlotDuration, | ||
genesisTime: cfg.GenesisTime, | ||
slot: 0, | ||
slot: initialSlot, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like here we should start at some slot that corresponds to cfg.GenesisTime
(when it's in the past) instead of at slot 0 ?
I'm guessing this didn't cause any issues because we always called Next
before calling Slot
on brand-new slotTicker
- which was documented in this comment:
// Slot returns the current slot number.
// Note: Like the Next function, this method is also not thread-safe.
// It should be called in a serialized manner after calling Next.
func (s *slotTicker) Slot() phase0.Slot {
but it's probably better not to have such assumptions and start ticker with proper initialSlot
right away (which is what I'm aiming for here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this didn't cause any issues because we always called
Next
before callingSlot
on brand-newslotTicker
note btw, this ^ also means we never actually "tick" (handle duties) for the very first initialSlot
- but rather wait until the next slot that comes after it,
at the worst case we essentially skip 1 full slot after node start, not a big deal probably - but maybe now we can address this, wdyt ?
operator/duties/scheduler.go
Outdated
if waitDuration > 0 { | ||
time.Sleep(waitDuration) | ||
|
||
// Lock the mutex before broadcasting | ||
s.waitCond.L.Lock() | ||
s.headSlot = slot | ||
s.waitCond.Broadcast() | ||
s.waitCond.L.Unlock() | ||
} | ||
|
||
// Lock the mutex before broadcasting | ||
s.waitCond.L.Lock() | ||
s.headSlot = slot | ||
s.waitCond.Broadcast() | ||
s.waitCond.L.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iurii-ssv The only idea that comes to my mind as to why this could have been done like that is to make sure the slot is not decreased
data := event.Data.(*eth2apiv1.HeadEvent) | ||
if data.Slot != s.network.Beacon.EstimatedCurrentSlot() { | ||
|
||
// we are interested only in events for the current slot, but to account for wall-clock | ||
// differences the next slot after "what we think is the current one" can also be valid | ||
if data.Slot != s.network.Beacon.EstimatedCurrentSlot() && | ||
data.Slot != s.network.Beacon.EstimatedCurrentSlot()+1 { | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this also needs a slight adjustment (like I suggest in this PR) because "current slot" is based on our own "local current time" which might slightly lag behind "true current time" (there is no such thing as true time in distributed systems really, but what I mean is somebody sends us an event slightly earlier than we expect it and considers this OK to do because "his clock is running early compared to ours").
Not sure how likely it is to happen in practice though - still better safe than sorry.
b4d70ac
to
f5ec690
Compare
4584d43
to
6c379d0
Compare
This PR:
SlotTicker
docs (eg. calls out certain assumptions such as the need to callNext
method periodically to "keep ticking" - which is different from how one would expect Golang ticker to operate, for example)Before merging: