Rethink state storage #3646

dimartiro · 2023-12-13T15:53:03Z

dimartiro
Dec 13, 2023
Collaborator

Summary

We need to rethink our current state storage implementation, not only to improve it but also to comply with the standards required by the protocol. This issue presents our current solution in contrast with Parity's solution, highlighting the changes we should make along with the benefits we would gain.

Current solution

Currently, our state storage solution, used among other things by the runtime, has various responsibilities:

Apply state changes for both tries and child tries (put, set, delete, clear).
Handle transactions (begin transaction, commit, rollback).
Get tries from memory.
Get tries from the database.

The protocol specifies that the state is a collection of keys and values Polkadot spec. This state must be re-arranged in a trie to calculate the merkle hash and to generate and verify merkle proofs. But internally, we always store the state in the form of a trie when this is only necessary when calculating the hash or generating proofs.

In our case, our state storage implementation get tries from memory and, if not found, loads them by looking up in the DB.
In order to do that we need to feed the storage structure with a DB connection and the in memory tries structure.

To handle the state transitions with transactions we use our current TrieState structure that contains 2 in memory tries used as transient tries to handle the changes and apply them if it is needed.

Here is a simplified diagram of our architecture:

Parity's solution

Parity built a trie crate that simplifies the basic tasks of a tree. The difference with our solution is that they do not store the data in the form of a trie but rather store it in a key-value database, and they have a TrieDB wrapper with trie functions over this database but do not have a trie built in memory at all times.

Each TrieDB represents a trie along with its root node and all its nodes, allowing the same database to be shared for all the tries that need to be represented.

Benefits

Simpler and more scalable

Our current solution looks more like spaghetti code relating different components in a way that is very difficult to follow and understand. It makes the solution complex to understand and could raise issues in the future if we do the wrong changes.

Parity's solution is more straightforward, easy to understand and scalable.

It allows us to implement lazy loading

With our current solution the only lazy loading behavior we have is when the state storage is loading tries from the database and putting them into memory.
We have many problems with that:

It works for tries but not for nodes, when we are loading a trie from the DB we are decoding the entire trie with all its nodes when we only need to use only one node value.
It stores the decoded trie into memory "forever" because we don't have a caching mechanism
Even if we implement the cache it will work for tries but not for nodes so the saved memory will be not as much as we can gain lazy loading nodes

If we want to implement lazy loading of nodes with our current solution we will have to change the way a node is decoded to prevent decoding its childs and we will going to have some repeated code with different behaviors for tries and nodes.

If we implement a solution similar than Parity's trie crate we will have Tries with a similar behavior than Parity's TrieDB and we are going to load nodes from database and decoding them only if we need their values, with the ability to cache the most frequently used in a simple manner

Reduce memory usage

Since we are going to have everything in database and only the most frequently used nodes / values in memory we are going to reduce our memory usage significantly.

A way to solve this using Parity's crate is having a local nodes / values cache for every trie so any trie could be able to cache its nodes independently solving this issue #3623

We stop thinking about trees and start thinking about the state as keys and values.

If we use this solution we are going to start saving a collection of key value pairs instead of saving tries in memory, the "Trie" will be an abstract structure on top of a database that will be re arranging this key value pairs collection into a trie only when it is needed.

In the past, was difficult to implement things like trie V1 because our internal implementation is too couple with the Trie structure, this change of mindset will help us to decouple the data from the representation and we could be free to change the Trie structure in the future with less risks of breaking something.

We can implement nested transactions more easily

We are going to stop using our current TrieState implementation to handle transactions and we could do it in the same TrieDB simplifying the transaction management as well as our architecture.

That will help us to solve this issue #3508

Necessary Changes

I think the best way to implement this is like we did with Grandpa. We have to translate the parity create to golang and then start using it in our client.

The main part we have to translate is the trie-db implementation

We already have a PR that we can use as starting point #3390

Related issues

dimartiro · 2023-12-13T20:25:00Z

dimartiro
Dec 13, 2023
Collaborator Author

We were talking about it with @cheme and he added his thoughts here

0 replies

EclesioMeloJunior · 2023-12-14T15:03:06Z

EclesioMeloJunior
Dec 14, 2023
Collaborator

@dimartiro given the discussion with Cheme, he said that the trie-db and the crate are very generic to address rocks-db and ethereum use cases, so could you describe what components will think should be translated? So, for example, we can have an overview of what is abstract to the trie and what parts are related to eth

3 replies

dimartiro Dec 14, 2023
Collaborator Author

I think the main component that can benefit us is the trie-db that will allow us to have tries with a shared DB, nodes and values caching and simplify our logic.

What do you think?

EclesioMeloJunior Dec 14, 2023
Collaborator

Makes sense for me, AFAIK these are the main parts of it, right?

Codec, then we can parse a node to the raw byte representation
Key-Value Store (DB), which will persist the data
Cache, then we don't need to load data from DB which is an expensive op
Recorder?

dimartiro Dec 14, 2023
Collaborator Author

Exactly! The recorder is useful to create merkle proofs.

Example:

pub fn generate_proof<L>(
    db: &dyn HashDBRef<L::Hash, DBValue>,
    root: &TrieHash<L>,
    key: &[u8],
) -> TrieResult<(Vec<Vec<u8>>, Option<Vec<u8>>), TrieHash<L>, CError<L>>
where
    L: TrieLayout,
{
    let mut recorder = Recorder::<L>::new();

    let item = {
        let trie = TrieDBBuilder::<L>::new(db, root).with_recorder(&mut recorder).build();
        trie.get(key)?
    };

    let proof: Vec<Vec<u8>> = recorder.drain().into_iter().map(|r| r.data).collect();
    Ok((proof, item))
}

dimartiro · 2024-02-14T13:28:50Z

dimartiro
Feb 14, 2024
Collaborator Author

Based on our last conversations I'll put some key points from my last proposal here:

Proposed changes

Since replicating Parity's trie implementation would necessitate almost replacing our entire codebase with their logic, and considering that all these changes are interconnected, this would require creating a single, large PR encompassing all the modifications. This approach could demand a significant amount of time not only from the implementer but also from the reviewers. Moreover, after developing the library, we would need to modify other parts of our code to start utilizing it. Therefore, I propose reusing as many parts of our existing code as possible, such as the encode and decode functions for nodes, merkle proof generation, and verification, while replacing the necessary components to add lazy loading. For example: loading the trie from the database, modifying trie and node structures, and introducing new structures.

Furthermore, to streamline the review process and deliver value in a short period, I propose implementing these changes incrementally, following this plan:

Phase 0

In order to be ready for the further changes, I believe we should do a little refactor first:

PR # 1:

Move all trie-related code to a single package in /pkg/trie.

PR # 2:

Create a Trie interface with the minimum necessary methods for a trie. Such as: get, insert, remove, contains, root
Rename the current Trie struct to InMemoryTrie.

Phase 1

PR # 3

To improve node handling, I propose replacing our current Node structure with one oriented towards enums to more easily and structurally differentiate nodes.

Along with this, we would replace the current StorageValue []byte and IsHashedValue bool fields with an enum NodeValue as follows:

NodeValue:

type (
    // Value bytes as stored in a trie node
    Inline struct {
        Bytes []byte
    }
    // Containing a hash used to lookup in db for real value
    Hashed struct {
        Hash []byte
    }
)

Node:

type (
    // Leaf always contains values
    Leaf struct {
        PartialKey []byte
        Value NodeValue
    }
    // Use pointer to describe optional child / value
    Branch struct {
        PartialKey []byte
        Children [16]*Node
        Value *NodeValue
    }
)

Phase 2

In this phase, we would add lazy loading, so we need to introduce the concept of TrieDB that implements the Trie and uses a db as a data source.

Consider the following naive implementation assuming that:

Every write writes directly to the database.
Every read reads directly from the database.

In this case, what we would need to do is the following:

Stop building the in-memory trie

We must modify our node enum so that the Children list of a branch are not another node but since it contains scale encoded hashes of encoded nodes we can replace it with a list of node hashes instead.

Branch struct {    
    PartialKey []byte
    Children [16]*Hash //Hash == Hash(encoded_node)
    Value *NodeValue
}

Where Hash is the type of data returned by the hash algorithm we are using.

Lookup

We can create a Lookup structure responsible for traversing the tree looking for a node given its fullKey and returning the value following the algorithm showed before:

TrieDB

Given that now our concept of Trie is not in fact a 100% pre built trie but instead it refers to the way we read, traverse, and store the keys and values of our DB, we need to introduce the concept of TrieDB which is a wrapper over a DB based on a rootHash and using a trieLayout as configuration holder similar to Parity's implementation.

type TrieLayout interface {
    MaxInlineValue() *int
}

type TrieDB struct {
    db            DB
    root          Hash
    trieLayout    TrieLayout
}

This structure will implement all the methods defined in the Trie interface introduced in Phase 0 so we could easily replace the version used in our code from InMemoryTrie to TrieDB and all the tests should still working.

Note: Depending on the performance of this phase, the changes of phase 2 and phase 3 may be made in the same PR.

Phase 3

It's well known that writing and reading from the database (disk) is a slow process, so to optimize throughput, we must add a cache for readings and accumulate writings to write them all together when we know we no longer have more changes to add.

Caching

We can easily add a cache as part of the TrieDB structure and use it when necessary, mainly modifying the Lookup structure but potentially also using it in methods where we need to know the trie structure to add, modify, or delete nodes.

type TrieDB struct {
    db             DB
    root           Hash
    trieLayout     TrieLayout
    cache          TrieCache
}

In memory changes

To prevent executing writes to the database every time a new node is inserted we are going to have a list of "in memory changes".

Those changes will only be applied with the commit() function is called

Commit

We can execute all trie modifications in one place to avoid overhead in the DB if we wrote all the changes in real-time.

For this, and to continue complying with the Trie interface, we must call this function in a method that is already defined in that interface.

I propose using the same strategy as Substrate and executing the commit() in the root() method.

func (tdb *TrieDB) Root() Hash {
    tdb.commit()
    return tdb.root
}

Note: This TrieDB is not responsible for handling the state transactions, we should solve that in a different layer on top of this one. That layer will accumulate and apply temporal "change sets" and when a transaction commit is executed the resulting change set will by applied using this TrieDB API.

Phase 4

We could make our Trie more generic.
This is not a must but is a nice to have if we want to add it

Use generic hash type

We can change our TrieDB and our structure to receive a generic type representing the hash algorithm output type

TrieDB[H]

type TrieDB[H Hash] struct {
    db             DB[H]
    root           H
    trieLayout     TrieLayout[H]
    cache          TrieCache[H]
}

NodeValue[H]:

type (
    // Value bytes as stored in a trie node
    Inline struct {
        Bytes []byte
    }
    // Containing a hash used to lookup in db for real value
    Hashed[H Hash] struct {
        Hash H
    }
)

Node[H]:

type (
    // Leaf always contains values
    Leaf[H] struct {
        PartialKey []byte
        Value NodeValue[H]
    }
    // Use pointer to describe optional child / value
    Branch[H] struct {
        PartialKey []byte
        Children [16]*H
        Value *NodeValue[H]
    }
)

Hasher interface

Would be great if we can add a hasher interface to create different hashers for the keys in our database. This will allow use to easily change the hash algorithm in the future.

type Hasher[H Hash] interface {
	Length() int
	Hash(value []byte) H
	FromBytes(value []byte) H
}

NodeCodec interface

We can implement an interface similar to Parity's NodeCodec trait to be able to use different encode and decode strategies in the future

type NodeCodec[H hashdb.HashOut] interface {
	HashedNullNode() H
	Hasher() hashdb.Hasher[H]
	EmptyNode() []byte
	LeafNode(partialKey nibble.NibbleSlice, numberNibble int, value Value) []byte
	BranchNodeNibbled(
		partialKey nibble.NibbleSlice,
		numberNibble int,
		children [16]ChildReference[H],
		value *Value,
	) []byte
	Decode(data []byte) (Node, error)
}

TrieLayout changes

Then we can add the generic Codec in our TrieLayout and use it in our TrieDB with a few changes in the codebase.

type TrieLayout[H Hash] interface {
    MaxInlineValue() *int
    Codec() NodeCodec[H]
}

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rethink state storage #3646

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 3 comments 3 replies

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

Rethink state storage #3646

dimartiro Dec 13, 2023 Collaborator

Summary

Current solution

Parity's solution

Benefits

Simpler and more scalable

It allows us to implement lazy loading

Reduce memory usage

We stop thinking about trees and start thinking about the state as keys and values.

We can implement nested transactions more easily

Necessary Changes

Other information and links

Parity's Trie Crate

TrieDB

Substrate Cache

Related issues

Replies: 3 comments · 3 replies

dimartiro Dec 13, 2023 Collaborator Author

EclesioMeloJunior Dec 14, 2023 Collaborator

dimartiro Dec 14, 2023 Collaborator Author

EclesioMeloJunior Dec 14, 2023 Collaborator

dimartiro Dec 14, 2023 Collaborator Author

dimartiro Feb 14, 2024 Collaborator Author

Proposed changes

Phase 0

PR # 1:

PR # 2:

Phase 1

PR # 3

Phase 2

Stop building the in-memory trie

Lookup

TrieDB

Phase 3

Caching

In memory changes

Commit

Phase 4

Use generic hash type

Hasher interface

NodeCodec interface

TrieLayout changes

dimartiro
Dec 13, 2023
Collaborator

Replies: 3 comments 3 replies

dimartiro
Dec 13, 2023
Collaborator Author

EclesioMeloJunior
Dec 14, 2023
Collaborator

dimartiro Dec 14, 2023
Collaborator Author

EclesioMeloJunior Dec 14, 2023
Collaborator

dimartiro Dec 14, 2023
Collaborator Author

dimartiro
Feb 14, 2024
Collaborator Author