-
Notifications
You must be signed in to change notification settings - Fork 676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add test_long_chain_with_restart_from_snapshot #10193
Conversation
… prev_epoch_id (#10195) We had an issue in the `clear_resharding_data` while fetching the `prev_epoch_id`. The implementation of `prev_epoch_id` relied on fetching the block_info of the last block of prev_epoch to get the epoch_id. This unfortunately failed for the case of GC as the block_info was already garbage collected. The new implementation here relies on using the block_header to get the epoch_id instead of block_info. This was unfortunately only caught in mocknet and not integration tests as having a small enough epoch_length lead to the block_info being cached in the epoch_manager (even though it was GC'd) Zulip post: https://near.zulipchat.com/#narrow/stream/295558-pagoda.2Fcore/topic/Master.20binary.20Can't.20clear.20old.20data/near/402240517 Pending: Figure out a way to include test from this PR: #10193 We would need to enable no_cache feature for the test.
Adding test that should fail before the fix mentioned above. But the CI is green. nearcore/chain/client/src/client.rs Line 1542 in 4eb4d0d
log_assert! only panics in debug mode, and our CI pipelines use --cargo-profile quick-release @Ekleog-NEAR do you have ideas? I really want this to fail per-commit. Also, this test can be much shorter, if I can enable no_cache feature in it.
|
@nagisa I recall we discussed quick-release vs. dev for CI. It seems like quick-release is running without debug assertions? If so, I think we should revert and switch back to using the dev profile for CI, even though it might be slower, because it could miss intentionally-placed debug assertions :/ (Or we could add debug-assertions for quick-release, which might make sense considering it’s not for production usage, but then it’d probably be better to rename it for something like dev-release to be clear it’s not intended for prod) |
I guess you mean the compile-time no_cache? It’s something that we unfortunately can’t enable in a single test, not without making it a real snowflake that will become hard to maintain (unless we have a lot of tests that require this). Would it be hard to remove the compile-time no_cache feature, and replace it with a runtime configuration, that could be turned on/off from the test itself? |
I’m in favour of making it |
I just opened #10205 :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
.archive(false) | ||
.build(); | ||
|
||
env1.clients[0].config.gc.gc_blocks_limit = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can you add a small comment on why this is needed?
} | ||
|
||
let mut env2 = TestEnv::builder(chain_genesis) | ||
.stores(vec![env1.clients[0].chain.store().store().clone()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
neat!
Tested on updated #10193 `is_next_block_epoch_start` reads `BlockInfo` of the first block of the epoch, which is not correct in the case of garbage collection. The previous version of the test was passing, because that `BlockInfo` was still in the cache of the epoch manager (as we call `is_next_block_epoch_start` on every block) Implemented safer version of `is_next_block_epoch_start` -- `is_last_block_in_finished_epoch` (didn't think too much about the name, open to suggestions). `is_last_block_in_finished_epoch` works by relying on the fact that if we processed block, and it was the last block of an epoch according to `is_next_block_epoch_start`, then we wrote an `EpochInfo` with the hash of that block, and `EpochInfo` is not garbage collectible.
The main motivation for the test is to see how garbage collection will behave without caches if we restart from the middle of an epoch.