-
Notifications
You must be signed in to change notification settings - Fork 666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: single shard tracking State cleanup #12734
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #12734 +/- ##
==========================================
+ Coverage 70.66% 70.71% +0.05%
==========================================
Files 849 849
Lines 174430 174612 +182
Branches 174430 174612 +182
==========================================
+ Hits 123256 123473 +217
+ Misses 46030 45978 -52
- Partials 5144 5161 +17
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't finish the review but so far looks good. I'll let others approve or come back to it tomorrow.
let tracked_shards_in_gced_epoch_to_check_for_cleanup = if !shard_tracker | ||
.tracks_all_shards() | ||
&& epoch_manager.is_last_block_in_finished_epoch(block_hash)? | ||
{ | ||
Some(get_tracked_shards_in_past_epoch( | ||
&chain_store_update, | ||
&epoch_manager, | ||
block_hash, | ||
)?) | ||
} else { | ||
None | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe move to a helper method?
if let Some(potential_shards_for_cleanup) = | ||
tracked_shards_in_gced_epoch_to_check_for_cleanup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Keep consistent naming e.g. call both potential_shards_for_cleanup
.
fn get_tracked_shards_in_past_epoch( | ||
chain_store_update: &ChainStoreUpdate, | ||
epoch_manager: &Arc<dyn EpochManagerAdapter>, | ||
past_epoch_block_hash: &CryptoHash, | ||
) -> Result<Vec<ShardUId>, Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this method relies on the fact that the block is in past epoch or that it is the last block of that epoch. I would suggest removing past_epoch
from the method method and argument names.
/// State cleanup for single shard tracking. Removes State of shards that are no longer in use. | ||
/// | ||
/// It has to be run after we clear block data for the `last_block_hash_in_gced_epoch`. | ||
/// `tracked_shards_in_gced_epoch` are shards that were tracked in the gc-ed epoch, | ||
/// and these are shards that we potentially no longer use and that can be cleaned up. | ||
/// We do not clean up a shard if it has been tracked in any epoch later, | ||
/// or we care about it in the current or the next epoch (relative to Head). | ||
/// | ||
/// With ReshardingV3, we use State mapping (see DBCol::StateShardUIdMapping), | ||
/// where each `ShardUId` is potentially mapped to its ancestor to get the database key prefix. | ||
/// We only remove a shard State if all its descendants are ready to be cleaned up, | ||
/// in which case, we also remove the mapping from `StateShardUIdMapping`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great stuff!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
return Err(Error::GCError( | ||
"block on canonical chain shouldn't have refcount 0".into(), | ||
)); | ||
} | ||
debug_assert_eq!(blocks_current_height.len(), 1); | ||
|
||
// Do not clean up immediatelly, as we still need the State to run gc for this block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Do not clean up immediatelly, as we still need the State to run gc for this block. | |
// Do not clean up immediately, as we still need the State to run gc for this block. |
chain/chain/src/sharding.rs
Outdated
@@ -13,6 +15,18 @@ pub fn shuffle_receipt_proofs<ReceiptProofType>( | |||
receipt_proofs.shuffle(&mut rng); | |||
} | |||
|
|||
pub fn cares_about_shard_this_or_next_epoch( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: As I read this function, it might as well be a member of shard tracker.
Feel free to ignore
/// Maximum number of epochs under which the test should finish. | ||
const TESTLOOP_NUM_EPOCHS_TO_WAIT: u64 = 8; | ||
/// Default number of epochs for resharding testloop to run. | ||
// TODO(resharding) Fix nearcore and set it to 10. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any insight on why it fails with 10? If you do it might be good to capture that in a comment or issue.
If you don't have any relevant clue, feel free to ignore this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the issue described here: https://near.zulipchat.com/#narrow/channel/407288-core.2Fresharding/topic/forknet/near/494174933
Do you know what made this change needed?
|
The debug assert I added failed at the beginning of a test: in |
Cleanup parent shard State if neither child was tracked for GC window epochs.
That also implements shards garbage collection in general, see #11883.
TODO: add more tests
Summary
TrieChanges
column, to determine what shards were tracked at the given epoch. We rely onTrieChanges
being saved to the last block of an epoch, for all shards that were tracked at given epoch. TODO: add a test that focuses on thatTesting
GC num epochs to keep set to 3.
Notation
P - parent shard
C - child shard
U - unrelated shard
Schedule: (epoch before resharding) | (epoch after resharding) | ... next epochs
Tested scenarios
test_resharding_v3_state_cleanup
test_resharding_v3_do_not_track_children_after_resharding
test_resharding_v3_stop_track_child_for_5_epochs
(in the end we do not map to parent)test_resharding_v3_stop_track_child_for_5_epochs_with_sibling_in_between
(in the end we map to parent)test_resharding_v3_shard_shuffling_untrack_then_track
test_resharding_v3_sync_child