Database pruning for Gossamer #1591
dutterbutter started this conversation in Research
Implementations of the pruning solutions discussed above can be viewed here:
Great work! @arijitAD
Database pruning for Gossamer
Overview
This document presents two possible state trie pruning algorithms to be included in Gossamer; these are the current production algorithms used by Geth and Substrate.
To describe both algorithms the following terminology will be used:
- State window: the set of most recent blocks whose state tries are retained in the database.
- Look back window of size $X$: the number of blocks kept in the state window; state older than this is eligible for pruning.
Bloom Filtering (Geth)
A Bloom filter is a data structure for efficiently testing set inclusion. Elements can be added to the filter in $O(1)$ and tested for inclusion in $O(1)$. It operates probabilistically: it can determine that an element is definitely not in the set, or that it probably is in the set, with some fixed false-positive probability.
The algorithm works as follows:
1. Construct the Bloom filter by traversing the state trie of each block in the state window and adding every trie node to the filter.
2. Iterate over every trie node in the database and delete any node that is not in the filter.
A small percentage of nodes that should be deleted will be kept due to the error rate of the Bloom filter, but this rate can be made very small (<0.01%). The algorithm scales linearly in the number of trie nodes in the DB and in the size of the state tries. A Bloom filter has constant memory requirements.
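As a concrete illustration, here is a minimal Go sketch of this two-phase mark-and-sweep, assuming an in-memory map as a stand-in for the node database and the `bits-and-blooms/bloom` package for the filter. The names and sizes are illustrative, not Gossamer or Geth code.

```go
// Minimal sketch of offline Bloom-filter pruning over an in-memory "database"
// of trie nodes. The map-based DB and the node lists are stand-ins for the
// real on-disk trie store; only the two-phase mark-and-sweep shape matters.
package main

import (
	"fmt"

	"github.com/bits-and-blooms/bloom/v3"
)

func main() {
	// Hypothetical node database: hash -> encoded node.
	db := map[string][]byte{
		"node-a": []byte("..."), // reachable from the state window
		"node-b": []byte("..."), // reachable from the state window
		"node-c": []byte("..."), // only reachable from pruned history
	}

	// Node hashes reachable from the tries inside the state window
	// (in practice these come from walking each retained trie).
	window := [][]string{
		{"node-a", "node-b"},
		{"node-a"},
	}

	// Phase 1: mark. Size the filter for the expected number of live nodes
	// and a low false-positive rate (the numbers here are arbitrary).
	filter := bloom.NewWithEstimates(1_000_000, 0.0001)
	for _, trieNodes := range window {
		for _, hash := range trieNodes {
			filter.Add([]byte(hash))
		}
	}

	// Phase 2: sweep. Delete every node the filter has definitely not seen.
	// False positives only keep a few stale nodes; live nodes are never lost.
	for hash := range db {
		if !filter.Test([]byte(hash)) {
			delete(db, hash)
		}
	}

	fmt.Println("remaining nodes:", len(db)) // node-c has been pruned
}
```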
The main disadvantage of this approach is that it must be run offline: the node must be stopped and the pruning process run over the database for a fixed window. Despite this, it is a solution that is simple to understand and implement, and it can be implemented externally to the main block processing loop. This would be a good short-term solution.
This solution was initially discounted in a Geth PR but was later implemented and is now Geth's current pruning solution.
Initial development work can be found here: feat(cmd): Add offline pruning of state trie
In-memory Deathlists (Substrate)
Substrate uses an online algorithm to remove trie nodes for blocks when they leave the state window.
For a state look back window of size $X$, it maintains $X$ deathlists in memory, indexed by block hash. Each of these stores the list of hashes of nodes to be deleted when the corresponding block leaves the look back window.
The main consideration for this algorithm is that a trie node can be deleted (added to a deathlist) and then re-added to a trie later. When adding a new node, all existing deathlists must be checked to make sure it isn't already marked for deletion.
This works as follows:
1. When a block is finalized, compute the set of trie nodes inserted and the set of trie nodes deleted by its state changes.
2. Remove any newly inserted node from the existing deathlists, since it is live again and must not be deleted.
3. Add the deleted nodes to the deathlist for the finalized block.
4. When a block leaves the look back window, delete every node remaining in its deathlist from the database.
This solves the pruning issue assuming blocks cannot be applied and then rolled back. Supporting rollbacks requires an additional journaling step that allows the modifications to the deathlists to be rolled back as well.
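The Go sketch below shows the shape of this per-block bookkeeping, assuming finalisation only (no rollbacks). The `deathList` type and its methods are hypothetical, and the re-insertion check scans every pending deathlist, which is exactly the cost that an inverted index (the `DeathIndex` in the proposal below) removes.

```go
// Sketch of the per-block deathlist update, assuming finalisation only.
// Node hashes are plain strings; names are illustrative, not Substrate or
// Gossamer APIs.
package main

import "fmt"

// deathList holds, per retained block, the node hashes to delete when that
// block falls out of the look back window (oldest block first).
type deathList [][]string

// addBlock records a newly finalised block's inserted and deleted node hashes.
func (dl deathList) addBlock(inserted, deleted []string) deathList {
	// A node that is re-inserted must not be deleted later, so strip it
	// from every pending deathlist. This is the O(X * I) scan that an
	// inverted index (DeathIndex) avoids.
	for _, ins := range inserted {
		for i, row := range dl {
			kept := row[:0]
			for _, h := range row {
				if h != ins {
					kept = append(kept, h)
				}
			}
			dl[i] = kept
		}
	}
	// Queue this block's deleted nodes for future pruning.
	return append(dl, deleted)
}

// pruneOldest pops the oldest block's deathlist and returns the hashes that
// can now be removed from the node database.
func (dl deathList) pruneOldest() (deathList, []string) {
	if len(dl) == 0 {
		return dl, nil
	}
	return dl[1:], dl[0]
}

func main() {
	var dl deathList
	dl = dl.addBlock([]string{"a"}, []string{"x", "y"}) // block N
	dl = dl.addBlock([]string{"x"}, []string{"z"})      // block N+1 re-inserts "x"
	dl, prunable := dl.pruneOldest()                    // block N leaves the window
	fmt.Println(prunable, len(dl)) // only "y" is prunable: "x" was rescued
}
```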
Let $I$ and $O$ be the average number of incoming and outgoing trie nodes per block respectively. The in-memory state (deathlists, index and journal) scales roughly with $X \cdot (I + O)$ hashes, and maintaining it adds on the order of $I + O$ extra work per block.
This approach is simple to understand and works online; however, it requires modifications to the block creation cycle and adds a constant memory and work overhead.
The Substrate implementation is available here
Proposed Implementation | In-memory Deathlists (Substrate)
Gossamer Online State Trie Pruning
Step 1: State Trie Diff
Get the diff (inserted and deleted keys) between the state trie of the currently finalized block and that of the previously finalized block; the deleted keys will be used in the `JournalRecord`.
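A minimal sketch of such a diff, assuming the two tries' key sets are already available as Go maps (a stand-in for walking the real tries):

```go
// Sketch of the Step 1 diff: compare the key sets of two tries and return
// the inserted and deleted keys.
package main

import "fmt"

// trieDiff returns keys present in curr but not prev (inserted) and keys
// present in prev but not curr (deleted).
func trieDiff(prev, curr map[string]struct{}) (inserted, deleted []string) {
	for k := range curr {
		if _, ok := prev[k]; !ok {
			inserted = append(inserted, k)
		}
	}
	for k := range prev {
		if _, ok := curr[k]; !ok {
			deleted = append(deleted, k)
		}
	}
	return inserted, deleted
}

func main() {
	prev := map[string]struct{}{"a": {}, "b": {}}
	curr := map[string]struct{}{"b": {}, "c": {}}
	ins, del := trieDiff(prev, curr)
	fmt.Println(ins, del) // [c] [a] -- the deleted keys feed the JournalRecord
}
```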
Step 2: Initialize Journal Record
Initialize the journal record using the state trie diff, and store the record in the DB under the `pruning_journal` key prefix.
Journal Record
When a block is finalized, create a journal record with the state trie diff and persist that record as the metadata for the block.
Journal Key
We need to persist the `JournalRecord` to recover it in case of a crash. It is persisted in the DB with the journal key as the entry key:
`journalKey = pruning_journal (prefix) + block number`
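A sketch of what the record and key could look like, assuming a JSON encoding and illustrative field names rather than Gossamer's actual types:

```go
// Sketch of a JournalRecord and its journal key, following the encoding
// described above (pruning_journal prefix + block number). The struct fields
// and the JSON encoding are assumptions for illustration only.
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
)

const pruningJournalPrefix = "pruning_journal"

// JournalRecord captures the state trie diff of one finalised block so the
// pruner can be rebuilt after a crash.
type JournalRecord struct {
	BlockHash    []byte   `json:"blockHash"`
	InsertedKeys [][]byte `json:"insertedKeys"`
	DeletedKeys  [][]byte `json:"deletedKeys"`
}

// journalKey builds the DB key: pruning_journal (prefix) + block number.
func journalKey(blockNumber uint64) []byte {
	num := make([]byte, 8)
	binary.BigEndian.PutUint64(num, blockNumber)
	return append([]byte(pruningJournalPrefix), num...)
}

func main() {
	rec := JournalRecord{
		BlockHash:    []byte{0xaa},
		InsertedKeys: [][]byte{{0x01}},
		DeletedKeys:  [][]byte{{0x02}, {0x03}},
	}
	value, _ := json.Marshal(rec) // stored under journalKey(n) in the DB
	fmt.Printf("key=%x value=%s\n", journalKey(42), value)
}
```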
Step 3: Window
- `DeathList` is a queue of `DeathRow`s.
- `DeathRow` contains the keys that got deleted for each block in the pruning window.
- `DeathIndex` stores all the keys held in the entire `DeathList`. It serves as an inverted index to check whether a key is already in the `DeathRow` of some block.
Death Row
Initialize Death Row
- If a key is re-inserted into the state trie of the finalized block, remove it from the `DeathList` and from its `DeathRow` using the `DeathIndex`.
- Add the newly deleted keys to the `DeathIndex`.
- Create a `DeathRow` for the block and add it to the `DeathList`.
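A sketch of these structures and of the "Initialize Death Row" step, with illustrative names; the `DeathIndex` gives an $O(1)$ lookup from a re-inserted key to the `DeathRow` that holds it:

```go
// Sketch of the Step 3 structures: a DeathList (queue of DeathRow per
// finalised block) plus a DeathIndex used as an inverted index over every
// key in the window. Names are illustrative, not Gossamer's final API.
package main

import "fmt"

// deathRow holds the keys deleted by one finalised block.
type deathRow struct {
	blockNum    uint64
	deletedKeys map[string]struct{}
}

// pruningWindow is the in-memory pruning state.
type pruningWindow struct {
	deathList  []*deathRow          // oldest block first
	deathIndex map[string]*deathRow // key -> DeathRow currently holding it
}

// addJournalRecord processes the diff of a newly finalised block
// (the "Initialize Death Row" step).
func (w *pruningWindow) addJournalRecord(blockNum uint64, inserted, deleted []string) {
	// A re-inserted key is live again: the DeathIndex finds its DeathRow in
	// O(1) so the key can be removed from both structures.
	for _, key := range inserted {
		if row, ok := w.deathIndex[key]; ok {
			delete(row.deletedKeys, key)
			delete(w.deathIndex, key)
		}
	}
	// Record this block's deleted keys and queue its DeathRow.
	row := &deathRow{blockNum: blockNum, deletedKeys: map[string]struct{}{}}
	for _, key := range deleted {
		row.deletedKeys[key] = struct{}{}
		w.deathIndex[key] = row
	}
	w.deathList = append(w.deathList, row)
}

func main() {
	w := &pruningWindow{deathIndex: map[string]*deathRow{}}
	w.addJournalRecord(1, nil, []string{"x", "y"})
	w.addJournalRecord(2, []string{"x"}, []string{"z"}) // "x" is rescued
	fmt.Println(len(w.deathList[0].deletedKeys))        // 1: only "y" stays queued
}
```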
Step 4: Background Pruning
Pruning
The pruner will have a `maxBlock` parameter, which is the number of blocks behind the head after which blocks will be pruned.
- Pop the oldest entry (`DeathRow`) from the `DeathList`, and remove all the keys in its deleted field from the DB and from the `DeathIndex`.
- Store the last pruned block number under the `last_pruned` key prefix in the DB.
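A sketch of this background step, assuming the `DeathList`/`DeathIndex` shape from the previous sketch, a generic key-value store interface, and illustrative names for the `maxBlock` and `last_pruned` bookkeeping:

```go
// Sketch of the Step 4 background pruning. The store interface, the pruner
// fields and the last_pruned handling are assumptions for illustration.
package pruning

import "encoding/binary"

// store is a hypothetical key-value interface over the node database.
type store interface {
	Delete(key []byte) error
	Put(key, value []byte) error
}

// deathRow holds the keys deleted by one finalised block.
type deathRow struct {
	blockNum    uint64
	deletedKeys map[string]struct{}
}

// pruner keeps the pruning window plus its configuration.
type pruner struct {
	deathList  []*deathRow          // oldest block first
	deathIndex map[string]*deathRow // key -> DeathRow currently holding it
	maxBlocks  int                  // blocks retained behind the head
}

const lastPrunedKey = "last_pruned"

// pruneOne pops the oldest DeathRow once the window exceeds maxBlocks,
// deletes its keys from the DB and the DeathIndex, and records the pruned
// block number under the last_pruned key so pruning can resume after a
// restart. A background goroutine would call this in a loop.
func (p *pruner) pruneOne(db store) error {
	if len(p.deathList) <= p.maxBlocks {
		return nil // nothing old enough to prune yet
	}
	row := p.deathList[0]
	p.deathList = p.deathList[1:]

	for key := range row.deletedKeys {
		if err := db.Delete([]byte(key)); err != nil {
			return err
		}
		delete(p.deathIndex, key)
	}

	num := make([]byte, 8)
	binary.BigEndian.PutUint64(num, row.blockNum)
	return db.Put([]byte(lastPrunedKey), num)
}
```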