-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
beacon-client: attestation data request caching #1798
beacon-client: attestation data request caching #1798
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I'd suggest minor improvements
bbca7ed
to
169cd69
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job! 👍
I'll approve it once we run CI for this PR and make sure it passes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good! great job!
@moshe-blox wdyt about this? end of the day we might be missing if there's newer data.. but is it worth it to request every time if we already waited the time appropriate to have correct data?
beacon/goclient/attest.go
Outdated
// Check cache. | ||
cachedResult := gc.attestationDataCache.Get(slot) | ||
if cachedResult != nil { | ||
return withCommitteeIndex(cachedResult.Value(), committeeIndex), spec.DataVersionPhase0, nil | ||
} | ||
if resp == nil { | ||
return nil, DataVersionNil, fmt.Errorf("attestation data response is nil") | ||
|
||
// Have to make beacon node request and cache the result. | ||
result, err := func() (*phase0.AttestationData, error) { | ||
// Requests with the same slot number must lock on the same mutex. | ||
reqMu := &gc.attestationReqMuPool[uint64(slot)%uint64(len(gc.attestationReqMuPool))] | ||
reqMu.Lock() | ||
defer reqMu.Unlock() | ||
|
||
// Prevent making more than 1 beacon node requests in case somebody has already made this | ||
// request concurrently and succeeded. | ||
cachedResult := gc.attestationDataCache.Get(slot) | ||
if cachedResult != nil { | ||
return cachedResult.Value(), nil | ||
} | ||
|
||
attDataReqStart := time.Now() | ||
resp, err := gc.client.AttestationData(gc.ctx, &api.AttestationDataOpts{ | ||
Slot: slot, | ||
}) | ||
metricsAttesterDataRequest.Observe(time.Since(attDataReqStart).Seconds()) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to get attestation data: %w", err) | ||
} | ||
if resp == nil { | ||
return nil, fmt.Errorf("attestation data response is nil") | ||
} | ||
|
||
gc.attestationDataCache.Set(slot, resp.Data, ttlcache.DefaultTTL) | ||
|
||
return resp.Data, nil | ||
}() | ||
if err != nil { | ||
return nil, DataVersionNil, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should work and seems to address the various edgecases 👌
with that said, i think this might be simplified using https://pkg.go.dev/golang.org/x/sync/singleflight instead of attestationReqMuPool
i'm not 100% certain it will be worth it over our manual implementation, but i think we should try and compare both approaches :)
WDYT?
edit: i just found this which adds generics, from a reputable repo (from one of the former Go team members): https://pkg.go.dev/tailscale.com/util/singleflight
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking for something like this ( turns out not hard enough :) ), thanks for bringing it up!
Although I was trying to find a package that handles both aspects: inflight + caching. Still, https://pkg.go.dev/tailscale.com/util/singleflight does simplify this code a bit (and unit tests don't complain) - see fcaa83d for how it's different compared to the code I had before.
From performance perspective, I don't think it matters much for this particular case, but if we were to push it to the limits - I think each of these 2 approaches will have it's preferred type of workload while doing worse in some circumstances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
the readability gain is worth the performance cost here IMO
(i don't think performance cost is noticeable below many thousands of calls per second)
997f01a
to
fcaa83d
Compare
go.mod
Outdated
go 1.22 | ||
go 1.22.0 | ||
|
||
toolchain go1.23.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is the result of running go mod tidy
after adding tailscale.com/util/singleflight
(note also, I'm not using the latest version of this package because it will require to bump the go 1.22
line up to go 1.23.1
),
I'm new to this toolchain
concept, from what I've gathered it's used as "recommended" version (it will be downloaded and used in case you are trying to build with lower version).
Building Docker image with golang:1.22
seems to be working fine with this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed toolchain
directive, see #1840 (comment) for details.
Tested on stage, seems to be working fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
CI passes for the last commit: https://github.com/ssvlabs/ssv/actions?query=branch%3Atest%2Fbeacon-client-attestations-cache |
c3fd691
to
5a56db9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🔥
i've made some changes, we should test this again before merge
…only fetched during the slot of the duty and not later
5d8f9fb
to
241a4b7
Compare
This PR have been running on stage for several hours now without obvious issues (as far as I can tell). |
* beacon-client: attestation data request caching * first review pass * add unit-tests for GetAttestationData * 2nd review pass * minor cleanup * gotta start ttlcache to prune expired items * check total number of slots requested * use singleflight package to hide low-level implementation details * fix linter * err must be func-local to not cause races * omit toolchain directive * lock Go 1.22.6 * deep copy AttestationData * reduce attestation data cache TTL to 2 slots because in practice its only fetched during the slot of the duty and not later * comment * comment * comment * comment * add clarifying comment about cached items lifetime --------- Co-authored-by: moshe-blox <[email protected]>
* beacon-client: attestation data request caching * first review pass * add unit-tests for GetAttestationData * 2nd review pass * minor cleanup * gotta start ttlcache to prune expired items * check total number of slots requested * use singleflight package to hide low-level implementation details * fix linter * err must be func-local to not cause races * omit toolchain directive * lock Go 1.22.6 * deep copy AttestationData * reduce attestation data cache TTL to 2 slots because in practice its only fetched during the slot of the duty and not later * comment * comment * comment * comment * add clarifying comment about cached items lifetime --------- Co-authored-by: moshe-blox <[email protected]>
Closes #1472
Technically, caching solution implemented here is general purpose (in case we need similar caching functionality somewhere else), it could be extracted into separate "util-like" package (for now I guess there is no need for it).
Note, I've also seen/reviewed #1367 - the approach I've taken is pretty similar but tackles some concurrency nuances in slightly different way.
Note, PR is ready for review (at least to see if it's conceptually fine), but I'll need to: