beacon-client: attestation data request caching #1798

iurii-ssv · 2024-10-16T10:43:08Z

Technically, caching solution implemented here is general purpose (in case we need similar caching functionality somewhere else), it could be extracted into separate "util-like" package (for now I guess there is no need for it).

Note, I've also seen/reviewed #1367 - the approach I've taken is pretty similar but tackles some concurrency nuances in slightly different way.

Note, PR is ready for review (at least to see if it's conceptually fine), but I'll need to:

add unit tests
test this on stage

beacon/goclient/attest.go

protocol/v2/ssv/runner/committee.go

protocol/v2/blockchain/beacon/duty_data.go

beacon/goclient/attest.go

beacon/goclient/goclient.go

nkryuchkov

LGTM, but I'd suggest minor improvements

beacon/goclient/attest.go

beacon/goclient/goclient.go

beacon/goclient/attest.go

beacon/goclient/goclient.go

beacon/goclient/attest.go

utils/hashmap/hashmap.go

nkryuchkov

Great job! 👍

I'll approve it once we run CI for this PR and make sure it passes

y0sher

Code looks good! great job!

@moshe-blox wdyt about this? end of the day we might be missing if there's newer data.. but is it worth it to request every time if we already waited the time appropriate to have correct data?

moshe-blox · 2024-10-21T12:38:45Z

beacon/goclient/attest.go

+	// Check cache.
+	cachedResult := gc.attestationDataCache.Get(slot)
+	if cachedResult != nil {
+		return withCommitteeIndex(cachedResult.Value(), committeeIndex), spec.DataVersionPhase0, nil
 	}
-	if resp == nil {
-		return nil, DataVersionNil, fmt.Errorf("attestation data response is nil")
+
+	// Have to make beacon node request and cache the result.
+	result, err := func() (*phase0.AttestationData, error) {
+		// Requests with the same slot number must lock on the same mutex.
+		reqMu := &gc.attestationReqMuPool[uint64(slot)%uint64(len(gc.attestationReqMuPool))]
+		reqMu.Lock()
+		defer reqMu.Unlock()
+
+		// Prevent making more than 1 beacon node requests in case somebody has already made this
+		// request concurrently and succeeded.
+		cachedResult := gc.attestationDataCache.Get(slot)
+		if cachedResult != nil {
+			return cachedResult.Value(), nil
+		}
+
+		attDataReqStart := time.Now()
+		resp, err := gc.client.AttestationData(gc.ctx, &api.AttestationDataOpts{
+			Slot: slot,
+		})
+		metricsAttesterDataRequest.Observe(time.Since(attDataReqStart).Seconds())
+		if err != nil {
+			return nil, fmt.Errorf("failed to get attestation data: %w", err)
+		}
+		if resp == nil {
+			return nil, fmt.Errorf("attestation data response is nil")
+		}
+
+		gc.attestationDataCache.Set(slot, resp.Data, ttlcache.DefaultTTL)
+
+		return resp.Data, nil
+	}()
+	if err != nil {
+		return nil, DataVersionNil, err
 	}


this should work and seems to address the various edgecases 👌

with that said, i think this might be simplified using https://pkg.go.dev/golang.org/x/sync/singleflight instead of attestationReqMuPool

i'm not 100% certain it will be worth it over our manual implementation, but i think we should try and compare both approaches :)

WDYT?

edit: i just found this which adds generics, from a reputable repo (from one of the former Go team members): https://pkg.go.dev/tailscale.com/util/singleflight

I was looking for something like this ( turns out not hard enough :) ), thanks for bringing it up!

Although I was trying to find a package that handles both aspects: inflight + caching. Still, https://pkg.go.dev/tailscale.com/util/singleflight does simplify this code a bit (and unit tests don't complain) - see fcaa83d for how it's different compared to the code I had before.

From performance perspective, I don't think it matters much for this particular case, but if we were to push it to the limits - I think each of these 2 approaches will have it's preferred type of workload while doing worse in some circumstances.

@iurii-ssv

+1

the readability gain is worth the performance cost here IMO

(i don't think performance cost is noticeable below many thousands of calls per second)

iurii-ssv · 2024-10-21T21:11:57Z

go.mod

-go 1.22
+go 1.22.0
+
+toolchain go1.23.1


This change is the result of running go mod tidy after adding tailscale.com/util/singleflight (note also, I'm not using the latest version of this package because it will require to bump the go 1.22 line up to go 1.23.1),

I'm new to this toolchain concept, from what I've gathered it's used as "recommended" version (it will be downloaded and used in case you are trying to build with lower version).

Building Docker image with golang:1.22 seems to be working fine with this change.

Removed toolchain directive, see #1840 (comment) for details.

iurii-ssv · 2024-10-23T07:35:26Z

Tested on stage, seems to be working fine.

nkryuchkov

LGTM

nkryuchkov · 2024-10-23T13:00:55Z

CI passes for the last commit: https://github.com/ssvlabs/ssv/actions?query=branch%3Atest%2Fbeacon-client-attestations-cache

moshe-blox

LGTM 🔥

i've made some changes, we should test this again before merge

…only fetched during the slot of the duty and not later

iurii-ssv · 2024-11-18T15:24:01Z

This PR have been running on stage for several hours now without obvious issues (as far as I can tell).

* beacon-client: attestation data request caching * first review pass * add unit-tests for GetAttestationData * 2nd review pass * minor cleanup * gotta start ttlcache to prune expired items * check total number of slots requested * use singleflight package to hide low-level implementation details * fix linter * err must be func-local to not cause races * omit toolchain directive * lock Go 1.22.6 * deep copy AttestationData * reduce attestation data cache TTL to 2 slots because in practice its only fetched during the slot of the duty and not later * comment * comment * comment * comment * add clarifying comment about cached items lifetime --------- Co-authored-by: moshe-blox <[email protected]>