-
Notifications
You must be signed in to change notification settings - Fork 666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test(resharding): Adjust State mapping check for single shard tracking #12706
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to just leave comments instead of approving because I'm not super familiar with the state mapping code, so I would need to check it some more before understanding why the change here is allowed, but I'll come back to it tmr if it hasnt been approved yet
.get_prev_epoch_id_from_prev_block(&tip.prev_block_hash) | ||
.unwrap(); | ||
let epoch_config = client.epoch_manager.get_epoch_config(&prev_epoch_id).unwrap(); | ||
let epoch_id = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could also just use epoch_id()
on the block_header
variable stored above. Also, do we need this change? I mean I guess it's all the same anyway, but could just keep the old one since it's shorter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed in fa8e29e
client: &Client, | ||
prev_block_hash: &CryptoHash, | ||
) -> Vec<ShardUId> { | ||
let account_id = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could avoid cloning:
diff --git a/integration-tests/src/test_loop/utils/sharding.rs b/integration-tests/src/test_loop/utils/sharding.rs
index 9db148659..5dab61245 100644
--- a/integration-tests/src/test_loop/utils/sharding.rs
+++ b/integration-tests/src/test_loop/utils/sharding.rs
@@ -123,14 +123,14 @@ pub fn get_tracked_shards_from_prev_block(
client: &Client,
prev_block_hash: &CryptoHash,
) -> Vec<ShardUId> {
- let account_id =
- client.validator_signer.get().map(|validator| validator.validator_id().clone());
+ let signer = client.validator_signer.get();
+ let account_id = signer.as_ref().map(|s| s.validator_id());
let mut tracked_shards = vec![];
for shard_uid in
client.epoch_manager.get_shard_layout_from_prev_block(prev_block_hash).unwrap().shard_uids()
{
if client.shard_tracker.care_about_shard(
- account_id.as_ref(),
+ account_id,
prev_block_hash,
shard_uid.shard_id(),
true,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in fa8e29e
.iter() | ||
.any(|child_shard_uid| shards_tracked_after_resharding.contains(child_shard_uid)) | ||
{ | ||
assert_eq!(shard_uid_mapping.len(), 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm this makes me think of another test case to add: chunk producer tracks the parent and a child after reshrding, then gets assigned to an unrelated shard, then in the future state syncs the child again. then the state mapping will still exist but the state application in state sync will just write the child shard uid directly in the DB. Wonder what happens in that case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the child shard is no longer tracked doesn't the mapping get removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
State cleanup will deal with cleaning resharding mapping. For now, once mapping is set it will be used forever for the child.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm this makes me think of another test case to add: chunk producer tracks the parent and a child after reshrding, then gets assigned to an unrelated shard, then in the future state syncs the child again. then the state mapping will still exist but the state application in state sync will just write the child shard uid directly in the DB. Wonder what happens in that case
I am pretty sure that was covered by shard shuffling test, but added test_resharding_v3_stop_track_child_for_2_epochs
that would be helpful to test state cleanup.
@@ -485,56 +483,47 @@ fn test_resharding_v3_base(params: TestReshardingParameters) { | |||
|
|||
let client = clients[client_index]; | |||
let block_header = client.chain.get_block_header(&tip.last_block_hash).unwrap(); | |||
let shard_layout = client.epoch_manager.get_shard_layout(&tip.epoch_id).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objection from me on removing this and the print below, but I wonder if anybody was using these print statements while debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 I think those are good to have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't used them myself. We can add them back locally to debug if needed 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will leave on of them, because they duplicate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in fa8e29e
epoch_config.shard_layout.num_shards() as usize | ||
); | ||
} | ||
// If any child shard was tracked after resharding, it means the node had to split the parent shard. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering what happens if the node tracks the parent both not any child
We do resharding for nothing? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, added a test in fa8e29e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -485,56 +483,47 @@ fn test_resharding_v3_base(params: TestReshardingParameters) { | |||
|
|||
let client = clients[client_index]; | |||
let block_header = client.chain.get_block_header(&tip.last_block_hash).unwrap(); | |||
let shard_layout = client.epoch_manager.get_shard_layout(&tip.epoch_id).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 I think those are good to have.
for shard_uid in | ||
client.epoch_manager.get_shard_layout_from_prev_block(prev_block_hash).unwrap().shard_uids() | ||
{ | ||
if client.shard_tracker.care_about_shard( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a bit of a mismatch between the function name and the implementation here.
The care_about_shard
only checks if the client is a producer for given shard in this epoch. The client might also track a shard because it will be the producer in the next epoch or because it is configured to track all shards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
care_about_shard
also checks tracked shards, if me
is true
:
match self.tracked_config {
TrackedConfig::AllShards => {
// Avoid looking up EpochId as a performance optimization.
true
}
_ => self.tracks_shard(shard_id, parent_hash).unwrap_or(false),
}
@@ -350,17 +355,35 @@ pub fn check_state_shard_uid_mapping_after_resharding( | |||
epoch_config.shard_layout.get_children_shards_uids(parent_shard_uid.shard_id()).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit though not your code:
It's more typical to just get the shard layout directly from the epoch manager instead of going through the epoch config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in fa8e29e
prev_block_hash: &CryptoHash, | ||
resharding_block_hash: &CryptoHash, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels strange to pass the prev_block_hash and resharding_block_hash, and get the tip from the client on the first line of this method. Mixing "pure"-ish function approach with "stateful" approach is asking for trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in fa8e29e
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #12706 +/- ##
==========================================
+ Coverage 70.48% 70.66% +0.18%
==========================================
Files 848 848
Lines 173673 173788 +115
Branches 173673 173788 +115
==========================================
+ Hits 122418 122815 +397
+ Misses 46143 45845 -298
- Partials 5112 5128 +16
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
7f6d7bb
to
e9a8b26
Compare
e9a8b26
to
73d28fd
Compare
} | ||
|
||
#[test] | ||
fn test_resharding_v3_stop_track_child_for_2_epochs() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
@@ -692,6 +747,8 @@ fn test_resharding_v3_double_sign_resharding_block() { | |||
} | |||
|
|||
#[test] | |||
// TODO(resharding): fix nearcore and un-ignore this test | |||
#[ignore] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happened for test_resharding_v3_shard_shuffling
and test_resharding_v3_shard_shuffling_intense
, which change made them fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Increasing test duration, looks like there is some bug, will describe soon.
At this stage of the PR we have new failing tests, and different options how to approach this, for instance:
Personally the only one I don't like much is 1. |
Reverted ignores, as there are at least 4 failing tests and I do not want all of them not being run in other PRs. @marcelo-gonzalez Do you think the pattern of tracked shards that I posted on zulip could make state sync into trouble? UPDATE: maybe never mind, |
Unblocks #12691
Changes
check_state_shard_uid_mapping_after_resharding
so that it can be run for a client that does not track all shards.check_state_shard_uid_mapping_after_resharding
for each client.