[trie] Unify implementation of DiskTrieIterator and MemTrieIterator #12813

Merged
merged 10 commits into from
Jan 28, 2025

@shreyan-gupta (Contributor) commented Jan 28, 2025

We had separate implementations for DiskTrieIterator and MemTrieIterator, which were originally copy-pasted from the same source. This PR unifies the implementation while exposing a simple interface to get nodes and values from the trie.

Part of issue #12361
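
One way to picture the unification described above is a single traversal that is generic over a small storage trait, with each backend supplying only node lookup. The sketch below is illustrative only: `Node`, `NodeStorage`, `MemBackend`, `DiskBackend`, and `UnifiedIter` are hypothetical names, the trie is simplified to one key byte per edge, and none of this is the code from the PR.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified node: an optional value plus children labeled by one key byte.
#[derive(Clone)]
struct Node<P> {
    value: Option<Vec<u8>>,
    children: Vec<(u8, P)>,
}

/// Hypothetical storage interface: the only thing each backend has to provide.
trait NodeStorage {
    type Ptr: Copy;
    fn root(&self) -> Option<Self::Ptr>;
    fn get_node(&self, ptr: Self::Ptr) -> Result<Node<Self::Ptr>, String>;
}

/// In-memory backend: pointers are indices into a Vec, index 0 is the root.
struct MemBackend {
    nodes: Vec<Node<usize>>,
}

impl NodeStorage for MemBackend {
    type Ptr = usize;
    fn root(&self) -> Option<usize> {
        if self.nodes.is_empty() { None } else { Some(0) }
    }
    fn get_node(&self, ptr: usize) -> Result<Node<usize>, String> {
        self.nodes.get(ptr).cloned().ok_or_else(|| format!("bad index {ptr}"))
    }
}

/// Disk-like backend: pointers are hashes used as map keys, and a lookup can miss.
struct DiskBackend {
    root: u64,
    nodes: HashMap<u64, Node<u64>>,
}

impl NodeStorage for DiskBackend {
    type Ptr = u64;
    fn root(&self) -> Option<u64> {
        Some(self.root)
    }
    fn get_node(&self, ptr: u64) -> Result<Node<u64>, String> {
        self.nodes.get(&ptr).cloned().ok_or_else(|| format!("missing node {ptr}"))
    }
}

/// The single traversal shared by both backends: depth-first, children in stored order.
struct UnifiedIter<'a, S: NodeStorage> {
    storage: &'a S,
    stack: Vec<(Vec<u8>, S::Ptr)>,
}

impl<'a, S: NodeStorage> UnifiedIter<'a, S> {
    fn new(storage: &'a S) -> Self {
        let stack: Vec<(Vec<u8>, S::Ptr)> =
            storage.root().map(|root| (Vec::new(), root)).into_iter().collect();
        Self { storage, stack }
    }
}

impl<'a, S: NodeStorage> Iterator for UnifiedIter<'a, S> {
    type Item = Result<(Vec<u8>, Vec<u8>), String>;

    fn next(&mut self) -> Option<Self::Item> {
        while let Some((key, ptr)) = self.stack.pop() {
            let node = match self.storage.get_node(ptr) {
                Ok(node) => node,
                Err(err) => return Some(Err(err)),
            };
            // Push children in reverse so the first child is popped first.
            for (label, child) in node.children.iter().rev() {
                let mut child_key = key.clone();
                child_key.push(*label);
                self.stack.push((child_key, *child));
            }
            if let Some(value) = node.value {
                return Some(Ok((key, value)));
            }
        }
        None
    }
}
```

With this shape, `UnifiedIter::new(&mem)` and `UnifiedIter::new(&disk)` yield the same key-value sequence for the same logical trie, and adding another backend only means implementing `NodeStorage`.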

@@ -41,13 +41,6 @@ impl RawTrieNode {
None => Self::BranchNoValue(children),
}
}

- pub fn has_value(&self) -> bool {
Contributor Author

The has_value function is no longer needed; it's not used anywhere.

@@ -170,14 +171,18 @@ impl TrieViewer {
let mut values = vec![];
let query = trie_key_parsers::get_raw_prefix_for_contract_data(account_id, prefix);
let acc_sep_len = query.len() - prefix.len();
let mut iter = state_update.trie().disk_iter()?;
- iter.remember_visited_nodes(include_proof);
Contributor Author

We got rid of remember_visited_nodes in favor of using recorded storage, which makes a lot more sense.

Contributor Author

This doesn't seem to be working as expected; I'll take a detailed look.

}

// Extension for State Parts processing
impl<I> TrieIteratorImpl<CryptoHash, ValueHandle, I>
Contributor Author

I could only implement this function for DiskTrieIterator because the Node and Value types are the same there, i.e. CryptoHash. A generic implementation can't store TrieTraversalItem.

Ideally we should get rid of this code in favor of using trie recorder :/
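
The restriction described in this comment corresponds to a general Rust pattern: a generic type can carry extra methods in an `impl` block written for one concrete instantiation. The sketch below is a minimal illustration under stated assumptions: `VisitLog`, `TraversalItem`, and the 4-byte `Hash` alias are hypothetical stand-ins, not the PR's `TrieIteratorImpl`, `TrieTraversalItem`, or `CryptoHash`.

```rust
/// Hypothetical shortened stand-in for a hash type such as `CryptoHash`.
type Hash = [u8; 4];

/// Hypothetical generic state, parameterized by node pointer type `N`
/// and value handle type `V`.
struct VisitLog<N, V> {
    nodes: Vec<N>,
    values: Vec<V>,
}

// Available for every choice of `N` and `V`.
impl<N, V> VisitLog<N, V> {
    fn new() -> Self {
        Self { nodes: Vec::new(), values: Vec::new() }
    }

    fn visit(&mut self, node: N, value: Option<V>) {
        self.nodes.push(node);
        // `Option<V>` is an iterator over zero or one item.
        self.values.extend(value);
    }
}

/// Item with a single `hash` field, so it can only be built when nodes
/// and values are both identified by a hash.
#[derive(Debug, PartialEq)]
struct TraversalItem {
    hash: Hash,
    is_value: bool,
}

// Available only for the instantiation where both parameters are `Hash`.
impl VisitLog<Hash, Hash> {
    fn traversal_items(&self) -> Vec<TraversalItem> {
        let nodes = self.nodes.iter().map(|h| TraversalItem { hash: *h, is_value: false });
        let values = self.values.iter().map(|h| TraversalItem { hash: *h, is_value: true });
        nodes.chain(values).collect()
    }
}
```

A `VisitLog<usize, String>` still compiles and supports `visit`, but calling `traversal_items` on it is a compile error, which is the trade-off the comment points at.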

fn get_and_record_node(
    &self,
    ptr: GenericTrieNodePtr,
) -> Result<GenericTrieNode<GenericTrieNodePtr, GenericValueHandle>, StorageError>;
Contributor Author

MemTrie doesn't use the Result return type of this function, but DiskTrieIterator needs it, especially since we don't want to simply panic when a node is not found.
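
To make the point about the `Result` return type concrete, here is a small sketch of a caller that propagates a missing node with `?` instead of panicking. `LookupError`, `get_from_disk`, `get_from_memory`, and `load_path` are hypothetical names chosen for this illustration; they are not part of the codebase under review.

```rust
use std::collections::HashMap;

/// Hypothetical error type standing in for a storage error.
#[derive(Debug, PartialEq)]
enum LookupError {
    MissingNode(u64),
}

/// Disk-style lookup: the node may be absent, so the error case is real.
fn get_from_disk(db: &HashMap<u64, String>, hash: u64) -> Result<String, LookupError> {
    db.get(&hash).cloned().ok_or(LookupError::MissingNode(hash))
}

/// Memory-style lookup: the caller already holds a valid reference, so this
/// never fails; it returns `Ok` only to match the shared signature.
fn get_from_memory(node: &str) -> Result<String, LookupError> {
    Ok(node.to_string())
}

/// Caller that surfaces a missing node as an error rather than panicking.
fn load_path(db: &HashMap<u64, String>, hashes: &[u64]) -> Result<Vec<String>, LookupError> {
    let mut path = Vec::new();
    for hash in hashes {
        path.push(get_from_disk(db, *hash)?);
    }
    Ok(path)
}
```

The in-memory function pays a small cost in signature noise so that both backends can sit behind one interface.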

@@ -103,7 +102,7 @@ pub enum KeyLookupMode {
const TRIE_COSTS: TrieCosts = TrieCosts { byte_of_key: 2, byte_of_value: 1, node_cost: 50 };

#[derive(Clone, Copy, Hash)]
- pub(crate) enum ValueHandle {
+ pub enum ValueHandle {
Contributor Author

Needed to change this from pub(crate) to pub because the disk_iter function is exposed outside the module.

@shreyan-gupta shreyan-gupta marked this pull request as ready for review January 28, 2025 08:47
@shreyan-gupta shreyan-gupta requested a review from a team as a code owner January 28, 2025 08:47

codecov bot commented Jan 28, 2025

Codecov Report

Attention: Patch coverage is 95.67901% with 14 lines in your changes missing coverage. Please review.

Project coverage is 70.44%. Comparing base (8097202) to head (608e8fd).
Report is 3 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| core/store/src/trie/ops/iter.rs | 97.15% | 4 Missing and 3 partials ⚠️ |
| core/store/src/trie/iterator.rs | 82.14% | 1 Missing and 4 partials ⚠️ |
| core/store/src/trie/mem/iter.rs | 95.83% | 0 Missing and 1 partial ⚠️ |
| core/store/src/trie/mod.rs | 93.33% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12813      +/-   ##
==========================================
- Coverage   70.47%   70.44%   -0.04%     
==========================================
  Files         847      848       +1     
  Lines      175001   174849     -152     
  Branches   175001   174849     -152     
==========================================
- Hits       123338   123178     -160     
- Misses      46413    46419       +6     
- Partials     5250     5252       +2     
| Flag | Coverage Δ |
|---|---|
| backward-compatibility | 0.16% <0.00%> (+<0.01%) ⬆️ |
| db-migration | 0.16% <0.00%> (+<0.01%) ⬆️ |
| genesis-check | 1.41% <0.00%> (+<0.01%) ⬆️ |
| linux | 70.05% <95.67%> (-0.02%) ⬇️ |
| linux-nightly | 70.07% <95.67%> (-0.05%) ⬇️ |
| pytests | 1.71% <0.00%> (+<0.01%) ⬆️ |
| sanity-checks | 1.52% <0.00%> (+<0.01%) ⬆️ |
| unittests | 70.28% <95.67%> (-0.04%) ⬇️ |
| upgradability | 0.20% <0.00%> (+<0.01%) ⬆️ |


@Longarithm (Member) left a comment

Awesome!

@@ -1000,7 +1000,7 @@ pub(crate) fn print_epoch_analysis(

// The parameters below are required for the next next epoch generation.
// For `CheckConsistency` mode, they will be overridden in the loop.
- // For `Backtest` mode, they will stay the same and override information
+ // For `BackTest` mode, they will stay the same and override information
Member

I'd rather add "backtest" to cspell and make these changes in a separate PR (because this one is very involved).
Maybe there is some dictionary of engineering words we could just reuse?

Contributor Author

Will revert! Apologies!

@@ -644,7 +644,7 @@ pub enum EpochAnalysisMode {
/// start epoch height.
/// TODO (#11477): doesn't work for start epoch height <= 544 because of
/// `EpochOutOfBounds` error.
- Backtest,
+ BackTest,
Member

I'd rather add "backtest" to cspell and make these changes in a separate PR (because this one is very involved). Also, because this is a clap enum, the rename implicitly changes the flag to --back-test, which is not desired.

Maybe there is some dictionary of engineering words we could just reuse?
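
The flag change mentioned above follows from how variant names are turned into command-line names. The function below is a simplified stand-in written for this illustration (it is not clap's implementation, and `to_kebab_case` is a hypothetical helper); it assumes the default kebab-case renaming that clap's derive applies to enum variants, and it only handles plain ASCII CamelCase.

```rust
/// Simplified stand-in for kebab-case renaming of a CamelCase variant name.
/// Assumption: every uppercase ASCII letter after the first starts a new word.
fn to_kebab_case(variant: &str) -> String {
    let mut out = String::new();
    for (i, ch) in variant.chars().enumerate() {
        if ch.is_ascii_uppercase() {
            if i > 0 {
                out.push('-');
            }
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}
```

Under that assumption, `Backtest` maps to `backtest` while `BackTest` maps to `back-test`, so a spelling-only rename of the variant changes what users have to type.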

pub hash: CryptoHash,
/// Key of the node if it stores a value.
pub key: Option<Vec<u8>>,
/// This is used only for state_viewer.
Member

Oh, there is a naming collision... Can we reference it as TrieViewer because it is actually exposed to user, while classic state viewer is dev only?

Contributor Author

Independent of this change, just confirming: it shouldn't be a problem if we change this in the future, right? As long as we keep all the relevant nodes in the proof?

Member

Yeah.
Now I think it doesn't record values because they are given anyway in ViewStateResult. Maybe we can add some flag to TrieRecorder to skip recording values.

Otherwise we could ask the community whether anyone cares about the current exact proof format or its size, or notify them about this change. I don't think there will be complaints.
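
The flag idea floated in this comment could look roughly like the sketch below. This is purely hypothetical: `ProofRecorder`, `EntryKind`, and `skip_values` are invented for illustration and do not reflect the real TrieRecorder API or how it stores entries.

```rust
use std::collections::BTreeMap;

/// Hypothetical tag telling the recorder what kind of entry it is seeing.
#[derive(Clone, Copy, PartialEq)]
enum EntryKind {
    Node,
    Value,
}

/// Hypothetical recorder that can be told to leave values out of the proof.
struct ProofRecorder {
    skip_values: bool,
    // Keyed by hash so repeated reads of the same entry are stored once.
    recorded: BTreeMap<u64, Vec<u8>>,
}

impl ProofRecorder {
    fn new(skip_values: bool) -> Self {
        Self { skip_values, recorded: BTreeMap::new() }
    }

    fn record(&mut self, hash: u64, bytes: &[u8], kind: EntryKind) {
        if self.skip_values && kind == EntryKind::Value {
            return;
        }
        self.recorded.entry(hash).or_insert_with(|| bytes.to_vec());
    }

    /// Recorded entries in ascending hash order.
    fn proof(&self) -> Vec<Vec<u8>> {
        self.recorded.values().cloned().collect()
    }
}
```

With `skip_values` set, the proof contains only nodes, which matches the case where values are already returned separately to the caller.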


pub struct MemTrieIterator<'a, M: ArenaMemory> {
root: Option<MemTrieNodePtr<'a, M>>,
/// Tiny wrapper around `MemTries` and `Trie` to provide `TrieIteratorStorageInterface` implementation.
Member

Suggested change
- /// Tiny wrapper around `MemTries` and `Trie` to provide `TrieIteratorStorageInterface` implementation.
+ /// Tiny wrapper around `MemTries` and `Trie` to provide `GenericTrieInternalStorage` implementation.

Contributor Author

oops

@@ -716,39 +343,4 @@ mod tests {
.collect();
assert_eq!(got, want);
}

#[test]
fn test_has_value() {
Member

Can be migrated to TrieIteratorImpl for DiskTrieIteratorInner. Maybe there is no value in keeping it, but there is still a need to get all state KV pairs for state part AFAIU.

Contributor Author

@shreyan-gupta shreyan-gupta Jan 28, 2025

This test didn't make much sense to me. It basically checks that if an internal implementation detail, IterStep, is of type value, then a value should exist, and otherwise it shouldn't?

I think this test was originally in place as we had the function has_value() exposed from the iterator which no longer exists, hence the test is also irrelevant.

If we care about this for state parts, the better way of testing this is just doing the actual iteration and checking whether all data that we want exists, which I believe we already do in the state parts tests.

Member

It checked whether two different ways to check value existence match each other. However, your argument makes sense.

@shreyan-gupta shreyan-gupta added this pull request to the merge queue Jan 28, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 28, 2025
@shreyan-gupta shreyan-gupta added this pull request to the merge queue Jan 28, 2025
Merged via the queue into master with commit 57b9b81 Jan 28, 2025
28 of 29 checks passed
@shreyan-gupta shreyan-gupta deleted the shreyan/trie/iter branch January 28, 2025 19:40