implement validator custody with column refill mechanism #7127
base: column-syncer
Conversation
for i in startIndex..<SLOTS_PER_EPOCH:
  let blck = vcus.dag.getForkedBlock(blocks[int(i)]).valueOr: continue
  withBlck(blck):
    when typeof(forkyBlck).kind < ConsensusFork.Fulu: continue
Should this ever happen? It's much cheaper to detect this before the block is ever fetched from the database, based on the slot number + fork schedule.
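For illustration, a minimal sketch of that slot-based pre-check, assuming the loop can recover the slot for each entry (the slot variable below is hypothetical) and that the existing consensusForkAtEpoch helper is the relevant fork-schedule lookup:

# Hedged sketch: skip pre-Fulu slots using only the fork schedule,
# before any block is read from the database.
if vcus.dag.cfg.consensusForkAtEpoch(slot.epoch) < ConsensusFork.Fulu:
  continue
let blck = vcus.dag.getForkedBlock(blocks[int(i)]).valueOr: continue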
> detect this before the block is ever fetched from the database

The only way to make this cheaper is to check whether the earliest data column in the database belongs to a Fulu block; only then does it make sense to skip this check, imo. Otherwise, every time we pull a finalized block from the DB it's safe to check the fork. Also, just a couple of epochs after the Fulu transition, a wanted block range can look like EEEEEEFFFFFFF: I want to refill only against the trailing Fulu (F) blocks and continue past the first few pre-Fulu (E) blocks. Note that the E and F runs should ideally be far longer; this is just an example.
for cindex in request_item.indices:
  let lookup = DataColumnIdentifier(block_root: request_item.block_root,
                                    index: cindex)
  if lookup notin vcus.global_refill_list and cindex in remoteCustodyColumns:
How often is this cindex in remoteCustodyColumns expected to occur? Each one is an O(n) lookup, in a for cindex in request_item.indices loop, due to

nimbus-eth2/beacon_chain/spec/peerdas_helpers.nim, lines 73 to 75 in 77cfa78:

func resolve_columns_from_custody_groups*(node_id: NodeId,
                                          custody_group_count: CustodyIndex):
                                          seq[ColumnIndex] =
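One hedged way to avoid that per-element cost, sketched against the snippet above (surrounding names are taken from the PR, remoteColumnSet is hypothetical), is to build a HashSet once per peer so each membership check is O(1):

# requires: import std/sets
let remoteColumnSet = remoteCustodyColumns.toHashSet()
for request_item in vcus.requested_columns:
  var colIds: seq[ColumnIndex]
  for cindex in request_item.indices:
    let lookup = DataColumnIdentifier(block_root: request_item.block_root,
                                      index: cindex)
    # O(1) set membership instead of an O(n) seq scan per cindex
    if lookup notin vcus.global_refill_list and cindex in remoteColumnSet:
      colIds.add cindex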
How does this commit affect checkIntersectingCustody()?

https://github.com/status-im/nimbus-eth2/actions/runs/14842398481/job/41668253800?pr=7127
      vcus.network.nodeId,
      max(vcus.dag.cfg.SAMPLES_PER_SLOT.uint64,
          vcustody))
  newer_column_set = newer_columns.toHashSet()
I'm not sure why

nimbus-eth2/beacon_chain/spec/peerdas_helpers.nim, lines 73 to 85 in d2f2338:
func resolve_columns_from_custody_groups*(node_id: NodeId,
                                          custody_group_count: CustodyIndex):
                                          seq[ColumnIndex] =
  let
    custody_groups = node_id.get_custody_groups(custody_group_count)
  var flattened =
    newSeqOfCap[ColumnIndex](COLUMNS_PER_GROUP * custody_groups.len)
  for group in custody_groups:
    for index in compute_columns_for_custody_group(group):
      flattened.add index
  flattened
doesn't just always construct in place with incl or similar and then return a HashSet[ColumnIndex] rather than this seq[ColumnIndex], which, as far as I can tell, every single user of resolve_columns_from_custody_groups() converts to a HashSet before using or storing:
nimbus-eth2/beacon_chain/nimbus_beacon_node.nim, lines 425 to 428 in d2f2338:
custody_columns_set =
  node.network.nodeId.resolve_column_sets_from_custody_groups(
    max(SAMPLES_PER_SLOT.uint64,
        localCustodyGroups))
calls only the wrapper function, to get its HashSet[ColumnIndex].

nimbus-eth2/beacon_chain/sync/request_manager.nim, lines 65 to 68 in d2f2338:
RequestManager* = object
  network*: Eth2Node
  supernode*: bool
  custody_columns_set: HashSet[ColumnIndex]
stores a HashSet[ColumnIndex], not a seq[ColumnIndex], which it gets from the nimbus_beacon_node code.

nimbus-eth2/beacon_chain/sync/request_manager.nim, lines 352 to 359 in d2f2338:
remoteCustodyColumns =
  remoteNodeId.resolve_column_sets_from_custody_groups(
    max(SAMPLES_PER_SLOT.uint64,
        remoteCustodyGroupCount))
for local_column in rman.custody_columns_set:
  if local_column notin remoteCustodyColumns:
    return false
uses the resolve_column_sets_from_custody_groups() wrapper again.

And this code here doesn't use resolve_column_sets_from_custody_groups() but does the same thing. So it seems like, in general, resolve_columns_from_custody_groups() should natively return a HashSet and avoid the redundant copying, allocating, conversions, et cetera.

At that point, one wouldn't need a separate

nimbus-eth2/beacon_chain/spec/peerdas_helpers.nim, lines 87 to 91 in d2f2338:
func resolve_column_sets_from_custody_groups*(node_id: NodeId,
                                              custody_group_count: CustodyIndex):
                                              HashSet[ColumnIndex] =
  node_id.resolve_columns_from_custody_groups(custody_group_count).toHashSet()
at all, because all those HashSet[ColumnIndex] users would just be able to call resolve_columns_from_custody_groups() directly.
nimbus-eth2/beacon_chain/validators/message_router.nim, lines 191 to 200 in 566257e:
custody_columns =
  router[].network.cfg.resolve_columns_from_custody_groups(
    router[].network.node_id,
    max(SAMPLES_PER_SLOT.uint64,
        metadata))
var final_columns: seq[DataColumnSidecar]
for dc in dataColumns:
  if dc.index in custody_columns:
    final_columns.add dc
doesn't use the HashSet but probably should, and it wouldn't particularly be more expensive if it were constructed natively rather than converted from the seq.
You are looking at the wrong branch; I think there are significant use cases of resolve_columns_from_custody_groups, and the hashset conversion is mostly applicable to syncers/refillers/backfillers, where lookups are more frequent.
> you are looking at the wrong branch

Yes, this is true -- those were mostly from the stable branch, so e.g. resolve_column_sets_from_custody_groups isn't present in either column-syncer or vcus, so that part isn't relevant/accurate.

That said:

> i think there are significant use cases of resolve_columns_from_custody_groups, and the hashset conversion is mostly applicable to syncers/refillers/backfillers, where lookups are more frequent

Looking again specifically at the current state of the column-syncer and vcus branches:
In column-syncer:

Doesn't use HashSet, probably should, and only cares interface-wise that the in operator works:

nimbus-eth2/beacon_chain/validators/message_router.nim, lines 189 to 200 in 566257e:
let
  metadata = router[].network.metadata.custody_group_count
  custody_columns =
    router[].network.cfg.resolve_columns_from_custody_groups(
      router[].network.node_id,
      max(SAMPLES_PER_SLOT.uint64,
          metadata))
var final_columns: seq[DataColumnSidecar]
for dc in dataColumns:
  if dc.index in custody_columns:
    final_columns.add dc
Immediately converts to HashSet via toHashSet:

nimbus-eth2/beacon_chain/sync/request_manager.nim, lines 345 to 355 in 566257e:
let
  remoteNodeId = fetchNodeIdFromPeerId(peer)
  remoteCustodyColumns =
    rman.cfg.resolve_columns_from_custody_groups(
      remoteNodeId,
      max(rman.cfg.SAMPLES_PER_SLOT.uint64,
          remoteCustodyGroupCount))
  remoteSet = remoteCustodyColumns.toHashSet()
for local_column in rman.custody_columns_set:
  if local_column notin remoteSet:
    return false
Immediately converts to HashSet via toHashSet for RequestManager initialization purposes, but also uses dataColumnQuarantine[].custody_columns with the seq:

nimbus-eth2/beacon_chain/nimbus_beacon_node.nim, lines 414 to 422 in 566257e:
dataColumnQuarantine[].custody_columns =
  dag.cfg.resolve_columns_from_custody_groups(
    node.network.nodeId,
    max(dag.cfg.SAMPLES_PER_SLOT.uint64,
        localCustodyGroups))
let
  custody_columns_set =
    dataColumnQuarantine[].custody_columns.toHashSet()
https://github.com/status-im/nimbus-eth2/blob/083ed100bab70e07331f0c3e9692b1b5d3eed412/beacon_chain/consensus_object_pools/data_column_quarantine.nim from column-syncer itself uses the custody_columns field in a few ways:
for col_idx in quarantine.custody_columns:
for i in quarantine.custody_columns:
for idx in quarantine.custody_columns:
if collectedColumns.len >= (quarantine.custody_columns.len div 2):
if collectedColumns.len == quarantine.custody_columns.len:
for i in quarantine.custody_columns:
The field is marked for export, and probably shouldn't be, because that's mainly there to let it be initialized from nimbus_beacon_node; nothing reads from it later outside that module.

Anyway, it does two things: it iterates, in a way where it's unclear that order matters (i.e. HashSet's arbitrary ordering is ok), and it checks length, either of which is fine with a HashSet. And this HashSet is constructed anyway, a line later in nimbus_beacon_node.
That's all the usage in column-syncer. On to vcus (this PR):

message_router's usage is identical:

nimbus-eth2/beacon_chain/validators/message_router.nim, lines 189 to 200 in 3dc502b:
let
  metadata = router[].network.metadata.custody_group_count
  custody_columns =
    router[].network.cfg.resolve_columns_from_custody_groups(
      router[].network.node_id,
      max(SAMPLES_PER_SLOT.uint64,
          metadata))
var final_columns: seq[DataColumnSidecar]
for dc in dataColumns:
  if dc.index in custody_columns:
    final_columns.add dc
request_manager's usage is identical:

nimbus-eth2/beacon_chain/sync/request_manager.nim, lines 345 to 355 in 3dc502b:
let
  remoteNodeId = fetchNodeIdFromPeerId(peer)
  remoteCustodyColumns =
    rman.cfg.resolve_columns_from_custody_groups(
      remoteNodeId,
      max(rman.cfg.SAMPLES_PER_SLOT.uint64,
          remoteCustodyGroupCount))
  remoteSet = remoteCustodyColumns.toHashSet()
for local_column in rman.custody_columns_set:
  if local_column notin remoteSet:
    return false
nimbus_beacon_node's usage is identical:

nimbus-eth2/beacon_chain/nimbus_beacon_node.nim, lines 426 to 434 in 3dc502b:
dataColumnQuarantine[].custody_columns =
  dag.cfg.resolve_columns_from_custody_groups(
    node.network.nodeId,
    max(dag.cfg.SAMPLES_PER_SLOT.uint64,
        localCustodyGroups))
let
  custody_columns_set =
    dataColumnQuarantine[].custody_columns.toHashSet()
The data column quarantine usage is the same (https://github.com/status-im/nimbus-eth2/blob/083ed100bab70e07331f0c3e9692b1b5d3eed412/beacon_chain/consensus_object_pools/data_column_quarantine.nim): iteration in a way where exact order shouldn't matter, and length checking, both easy and reasonably efficient with the HashSet that's constructed anyway.
But vcus adds validator_custody usage.

Immediately converts via toHashSet:

nimbus-eth2/beacon_chain/sync/validator_custody.nim, lines 79 to 85 in 3dc502b:
let
  newer_columns =
    vcus.dag.cfg.resolve_columns_from_custody_groups(
      vcus.network.nodeId,
      max(vcus.dag.cfg.SAMPLES_PER_SLOT.uint64,
          vcustody))
  newer_column_set = newer_columns.toHashSet()
Doesn't use a HashSet, but probably should:

nimbus-eth2/beacon_chain/sync/validator_custody.nim, lines 133 to 146 in 3dc502b:
let
  remoteNodeId = fetchNodeIdFromPeerId(peer)
  remoteCustodyColumns =
    vcus.dag.cfg.resolve_columns_from_custody_groups(
      remoteNodeId,
      max(vcus.dag.cfg.SAMPLES_PER_SLOT.uint64,
          remoteCustodyGroupCount))
for request_item in vcus.requested_columns:
  var colIds: seq[ColumnIndex]
  for cindex in request_item.indices:
    let lookup = DataColumnIdentifier(block_root: request_item.block_root,
                                      index: cindex)
    if lookup notin vcus.global_refill_list and cindex in remoteCustodyColumns:
      colIds.add cindex
Deliberately putting this in a separate comment:

nimbus-eth2/beacon_chain/spec/peerdas_helpers.nim, lines 119 to 131 in 083ed10:
func resolve_columns_from_custody_groups*(cfg: RuntimeConfig, node_id: NodeId,
                                          custody_group_count: CustodyIndex):
                                          seq[ColumnIndex] =
  let
    custody_groups = node_id.get_custody_groups(custody_group_count)
  var flattened =
    newSeqOfCap[ColumnIndex](COLUMNS_PER_GROUP * custody_groups.len)
  for group in custody_groups:
    for index in compute_columns_for_custody_group(cfg, group):
      flattened.add index
  flattened
could just as easily construct the HashSet directly, which most of those use cases either already use or should use, and never construct the seq at all. Then there's no repeated converting to it, and no issue with some places using the seq (and incurring O(n) search times) and some places not, et cetera.
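A minimal sketch, under the assumption that get_custody_groups and compute_columns_for_custody_group keep their current signatures and that std/sets is imported, of what the HashSet-returning variant being suggested could look like:

func resolve_columns_from_custody_groups*(cfg: RuntimeConfig, node_id: NodeId,
                                          custody_group_count: CustodyIndex):
                                          HashSet[ColumnIndex] =
  # Build the set in place; callers needing membership tests get O(1)
  # lookups without a separate toHashSet() conversion.
  let custody_groups = node_id.get_custody_groups(custody_group_count)
  result = initHashSet[ColumnIndex]()
  for group in custody_groups:
    for index in compute_columns_for_custody_group(cfg, group):
      result.incl index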
The data column quarantine itself doesn't really benefit, but it isn't hurt either, as far as I can tell.
https://nim-lang.org/docs/packedsets.html provides another alternative; I'm not that familiar with its performance tradeoffs.
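For reference, a minimal self-contained sketch of std/packedsets (illustrative values; whether ColumnIndex itself satisfies the Ordinal constraint is an assumption to verify):

import std/packedsets

# PackedSet stores small ordinal values as a sparse bitset, which suits
# column indices in 0 ..< NUMBER_OF_COLUMNS; shown here with plain ints.
var columns = initPackedSet[int]()
for idx in [0, 3, 64, 127]:
  columns.incl idx

doAssert 64 in columns
doAssert columns.len == 4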
This can be tackled in the column-syncer branch with a subsequent rebase.
ok
…on for the validator custody loop
No description provided.