Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slotticker: improve and clarify #1860

Open
wants to merge 6 commits into
base: stage
Choose a base branch
from
Open

Conversation

iurii-ssv
Copy link
Contributor

@iurii-ssv iurii-ssv commented Nov 19, 2024

This PR:

  • updates and clarifies SlotTicker docs (eg. calls out certain assumptions such as the need to call Next method periodically to "keep ticking" - which is different from how one would expect Golang ticker to operate, for example)
  • strives to fix or clarify certain edge-cases (see open discussions below)

Before merging:

  • test on stage

Comment on lines 278 to 288
if waitDuration > 0 {
time.Sleep(waitDuration)

// Lock the mutex before broadcasting
s.waitCond.L.Lock()
s.headSlot = slot
s.waitCond.Broadcast()
s.waitCond.L.Unlock()
}

// Lock the mutex before broadcasting
s.waitCond.L.Lock()
s.headSlot = slot
s.waitCond.Broadcast()
s.waitCond.L.Unlock()
Copy link
Contributor Author

@iurii-ssv iurii-ssv Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should advance headSlot regardless of whether or not we actually had to wait to 1/3 here (that is, if tick happened past 1/3 of slot start time - we still want to advance it, right) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iurii-ssv The only idea that comes to my mind as to why this could have been done like that is to make sure the slot is not decreased

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iurii-ssv The only idea that comes to my mind as to why this could have been done like that is to make sure the slot is not decreased

Not sure if that's even possible for "slot decreasing" to happen in the current code version (or how that time-based if would even help with that), but I'm adding another change here that actually does make it a likely problem to happen,

and hence we probably want to have explicit protection against "slot decreasing" (which I've now added in this PR)

Comment on lines 48 to 75
func newWithCustomTimer(logger *zap.Logger, cfg Config, timerProvider TimerProvider) *slotTicker {
now := time.Now()
timeSinceGenesis := now.Sub(cfg.GenesisTime)
timeSinceGenesis := time.Since(cfg.GenesisTime)

var initialDelay time.Duration
var (
initialDelay time.Duration
initialSlot phase0.Slot
)
if timeSinceGenesis < 0 {
// Genesis time is in the future
initialDelay = -timeSinceGenesis // Wait until the genesis time
initialSlot = phase0.Slot(0) // Start at slot 0
} else {
slotsSinceGenesis := timeSinceGenesis / cfg.SlotDuration
nextSlotStartTime := cfg.GenesisTime.Add((slotsSinceGenesis + 1) * cfg.SlotDuration)
initialDelay = time.Until(nextSlotStartTime)
initialSlot = phase0.Slot(slotsSinceGenesis)
}

return &slotTicker{
logger: logger,
timer: timerProvider(initialDelay),
slotDuration: cfg.SlotDuration,
genesisTime: cfg.GenesisTime,
slot: 0,
slot: initialSlot,
}
}
Copy link
Contributor Author

@iurii-ssv iurii-ssv Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like here we should start at some slot that corresponds to cfg.GenesisTime (when it's in the past) instead of at slot 0 ?

I'm guessing this didn't cause any issues because we always called Next before calling Slot on brand-new slotTicker - which was documented in this comment:

// Slot returns the current slot number.
// Note: Like the Next function, this method is also not thread-safe.
// It should be called in a serialized manner after calling Next.
func (s *slotTicker) Slot() phase0.Slot {

but it's probably better not to have such assumptions and start ticker with proper initialSlot right away (which is what I'm aiming for here).

Copy link
Contributor Author

@iurii-ssv iurii-ssv Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this didn't cause any issues because we always called Next before calling Slot on brand-new slotTicker

note btw, this ^ also means we never actually "tick" (handle duties) for the very first initialSlot - but rather wait until the next slot that comes after it,

at the worst case we essentially skip 1 full slot after node start, not a big deal probably - but maybe now we can address this, wdyt ?

operator/slotticker/slotticker.go Outdated Show resolved Hide resolved
Comment on lines 278 to 288
if waitDuration > 0 {
time.Sleep(waitDuration)

// Lock the mutex before broadcasting
s.waitCond.L.Lock()
s.headSlot = slot
s.waitCond.Broadcast()
s.waitCond.L.Unlock()
}

// Lock the mutex before broadcasting
s.waitCond.L.Lock()
s.headSlot = slot
s.waitCond.Broadcast()
s.waitCond.L.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iurii-ssv The only idea that comes to my mind as to why this could have been done like that is to make sure the slot is not decreased

Comment on lines 302 to 309
data := event.Data.(*eth2apiv1.HeadEvent)
if data.Slot != s.network.Beacon.EstimatedCurrentSlot() {

// we are interested only in events for the current slot, but to account for wall-clock
// differences the next slot after "what we think is the current one" can also be valid
if data.Slot != s.network.Beacon.EstimatedCurrentSlot() &&
data.Slot != s.network.Beacon.EstimatedCurrentSlot()+1 {
return
}
Copy link
Contributor Author

@iurii-ssv iurii-ssv Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this also needs a slight adjustment (like I suggest in this PR) because "current slot" is based on our own "local current time" which might slightly lag behind "true current time" (there is no such thing as true time in distributed systems really, but what I mean is somebody sends us an event slightly earlier than we expect it and considers this OK to do because "his clock is running early compared to ours").

Not sure how likely it is to happen in practice though - still better safe than sorry.

@iurii-ssv iurii-ssv force-pushed the slotticker-improve-clarify branch from b4d70ac to f5ec690 Compare November 20, 2024 18:47
@iurii-ssv iurii-ssv force-pushed the slotticker-improve-clarify branch from 4584d43 to 6c379d0 Compare November 21, 2024 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants