NEP-568: Resharding V3 #568

Open
wants to merge 30 commits into
base: master
Commits
7ef6857
template
wacban Oct 24, 2024
55bf0b3
rename
wacban Oct 24, 2024
2ac4b05
metadata
wacban Oct 24, 2024
5ef6b9d
summary
wacban Oct 24, 2024
62f127c
specification
wacban Oct 24, 2024
bec1ae6
formatting template texts
wacban Oct 24, 2024
d72d509
fix lints
wacban Oct 28, 2024
47f22f9
Resharding V3 - added future possibilities (#569)
wacban Oct 29, 2024
8d8e761
Add flat state specs to resharding NEP (#570)
Trisfald Oct 29, 2024
1ee5a74
fix lints
wacban Oct 29, 2024
6c98441
State Storage - State (#571)
staffik Nov 6, 2024
08b1834
fix lint
staffik Nov 6, 2024
6e431ff
fix lint
staffik Nov 6, 2024
5368857
ShardId Semantics (#572)
wacban Nov 7, 2024
a7bbee7
ReshardingV3 memtrie (#574)
shreyan-gupta Nov 14, 2024
65ece27
lint
shreyan-gupta Nov 14, 2024
615a92f
Add reference implementation for Flat state to resharding NEP (#575)
Trisfald Nov 15, 2024
e488308
fix lint
Trisfald Nov 15, 2024
12fd994
motivation (#576)
wacban Nov 18, 2024
03eeeea
move some summary to motivation
wacban Nov 19, 2024
080ac3c
Resharding V3 - add a few state sync details (#573)
marcelo-gonzalez Nov 20, 2024
910cf6c
Resharding V3 - state witness, implementation (#577)
Longarithm Nov 22, 2024
8ab6927
[resharding] Add sections for receipt handling (#578)
shreyan-gupta Nov 25, 2024
09dc287
cross shard traffic
wacban Dec 4, 2024
d889993
state sync: reword to make clear it's not just for convenience (#579)
marcelo-gonzalez Dec 13, 2024
0ae4721
resharding: cold storage (#580)
staffik Dec 13, 2024
cf3174f
lint
wacban Dec 13, 2024
c4f5e30
Add section on buffered receipt handling
shreyan-gupta Dec 16, 2024
1efd6cf
[resharding] ChatGPT modulation (#581)
shreyan-gupta Dec 16, 2024
74301d0
fix(resharding): update pseudocode (#583)
staffik Jan 3, 2025
State Storage - State (#571)
staffik authored Nov 6, 2024
commit 6c984415ce0399429ae2f6bcc516acc901ce0220
106 changes: 106 additions & 0 deletions neps/nep-0568.md
@@ -106,7 +106,29 @@
5) Children shards Flat States are now ready and can be used to take State Sync
snapshots and to reload Mem Tries.

### State Storage - State

// TODO Describe integration with cold storage once design is ready

Each shard’s Trie is stored in the `State` column of the database, with keys prefixed by `ShardUId`, followed by a node's hash.
This structure uniquely identifies each shard’s data. To avoid copying all entries under a new `ShardUId` during resharding,
a mapping strategy allows child shards to access ancestor shard data without directly creating new entries.

A naive approach to resharding would involve copying all `State` entries with a new `ShardUId` for a child shard, effectively duplicating the state.
This method, while straightforward, is not feasible because copying a large state would take too much time.
Resharding needs to appear complete between two blocks, so a direct copy would not allow the process to occur quickly enough.

To address this, Resharding V3 uses a mapping stored in the `DBCol::StateShardUIdMapping` column,
which links each child shard’s `ShardUId` to the `ShardUId` of the closest ancestor holding the relevant data.
Child shards can then read and update state under the ancestor shard’s prefix without duplicating entries.

Initially, `DBCol::StateShardUIdMapping` is empty: a shard without an entry maps to itself. During resharding, an entry is added for each child shard,
pointing the child’s `ShardUId` to the appropriate ancestor. Mappings persist as long as any descendant shard references the ancestor’s data.
Once a node stops tracking all children and descendants of a shard, the entry for that shard can be removed, allowing its data to be garbage collected.
For archival nodes, mappings are retained indefinitely to maintain access to the full historical state.

With this mapping, a resharding event only adds entries that redirect key prefixes;
existing `State` entries are neither copied nor rewritten.
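
The mapping idea can be illustrated with a small in-memory model. This is a sketch only: `ToyStore`, its fields, and the simplified `ShardUId` struct are hypothetical stand-ins rather than nearcore types, and the choice to point grandchildren directly at the data-holding ancestor is an assumption made to match the "closest ancestor holding the relevant data" wording above.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical stand-in for a shard identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ShardUId {
    version: u32,
    shard_id: u32,
}

/// Toy model of the `State` column plus the shard mapping column.
#[derive(Default)]
struct ToyStore {
    /// (prefix `ShardUId`, node hash) -> node bytes
    state: HashMap<(ShardUId, [u8; 32]), Vec<u8>>,
    /// child `ShardUId` -> ancestor `ShardUId` whose prefix holds the data
    mapping: HashMap<ShardUId, ShardUId>,
}

impl ToyStore {
    /// A shard without a mapping entry maps to itself.
    fn resolve(&self, shard_uid: ShardUId) -> ShardUId {
        self.mapping.get(&shard_uid).copied().unwrap_or(shard_uid)
    }

    /// Reads go through the resolved prefix.
    fn get(&self, shard_uid: ShardUId, hash: &[u8; 32]) -> Option<&Vec<u8>> {
        self.state.get(&(self.resolve(shard_uid), *hash))
    }

    /// Writes also land under the resolved prefix.
    fn put(&mut self, shard_uid: ShardUId, hash: [u8; 32], value: Vec<u8>) {
        let prefix = self.resolve(shard_uid);
        self.state.insert((prefix, hash), value);
    }

    /// Resharding adds two mapping entries and copies no state.
    /// Resolving the parent first keeps every entry pointing at the
    /// prefix that actually holds the data (an assumption of this sketch).
    fn split(&mut self, parent: ShardUId, children: [ShardUId; 2]) {
        let ancestor = self.resolve(parent);
        for child in children {
            self.mapping.insert(child, ancestor);
        }
    }
}
```

In this model, writing a node under the parent, calling `split`, and then reading the same hash through either child returns the original bytes while `state` still holds a single entry.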


### Stateless Validation

@@ -134,6 +156,90 @@
The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work.]
```

### State Storage - State mapping

To avoid copying state during resharding, Resharding V3 uses the `DBCol::StateShardUIdMapping` column.
This mapping lets child shards reference ancestor shard data, so state entries do not need to be duplicated.

#### Mapping application in adapters

The core of the mapping logic is applied in `TrieStoreAdapter` and `TrieStoreUpdateAdapter`, which act as layers over the general `Store` interface.
Here’s a breakdown of the key functions involved:

* **Key resolution**:

The `get_key_from_shard_uid_and_hash` function determines the correct `ShardUId` prefix for state access.
Callers pass the child shard's `ShardUId`; inside this function,
the `DBCol::StateShardUIdMapping` column is checked to determine whether an ancestor `ShardUId` should be used instead.

```rust
fn get_key_from_shard_uid_and_hash(
    store: &Store,
    shard_uid: ShardUId,
    hash: &CryptoHash,
) -> [u8; 40] {
    // Use the mapped ancestor `ShardUId` if an entry exists, otherwise the shard's own.
    let mapped_shard_uid = store
        .get_ser::<ShardUId>(DBCol::StateShardUIdMapping, &shard_uid.to_bytes())
        .expect("get_key_from_shard_uid_and_hash() failed")
        .unwrap_or(shard_uid);
    // Key layout: 8-byte `ShardUId` prefix followed by the 32-byte node hash.
    let mut key = [0; 40];
    key[0..8].copy_from_slice(&mapped_shard_uid.to_bytes());
    key[8..].copy_from_slice(hash.as_ref());
    key
}
```

This function first attempts to retrieve a mapped ancestor `ShardUId` from `DBCol::StateShardUIdMapping`.
If no mapping exists, it defaults to the provided child `ShardUId`.
The resolved `ShardUId` is then combined with the node's `hash` to form the final 40-byte key used in `State` column operations.

* **State access operations**:

The `TrieStoreAdapter` and `TrieStoreUpdateAdapter` use `get_key_from_shard_uid_and_hash` to correctly resolve the key for both reads and writes.
Example methods include:

```rust
// In TrieStoreAdapter
pub fn get(&self, shard_uid: ShardUId, hash: &CryptoHash) -> Result<Arc<[u8]>, StorageError> {
    // Resolve the (possibly ancestor-prefixed) key before reading.
    let key = get_key_from_shard_uid_and_hash(&self.store, shard_uid, hash);
    self.store.get(DBCol::State, &key)
}

// In TrieStoreUpdateAdapter
pub fn increment_refcount_by(
    &mut self,
    shard_uid: ShardUId,
    hash: &CryptoHash,
    data: &[u8],
    increment: NonZero<u32>,
) {
    // Writes use the same key resolution as reads.
    let key = get_key_from_shard_uid_and_hash(&self.store, shard_uid, hash);
    self.store_update.increment_refcount_by(DBCol::State, key.as_ref(), data, increment);
}
```

The `get` function retrieves data using the resolved `ShardUId` and key, while `increment_refcount_by` manages reference counts,
ensuring correct tracking even when accessing data under an ancestor shard.
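
As a self-contained illustration of the two points above, the sketch below models the 40-byte key layout and shows reference-count updates from two sibling shards landing on the same ancestor-prefixed entry. `ToyTrieStore` and the simplified `ShardUId` are hypothetical; the little-endian `version` then `shard_id` byte order and the plain `(bytes, refcount)` value are assumptions of the sketch, not a description of how nearcore encodes them.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical stand-in for a shard identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ShardUId {
    version: u32,
    shard_id: u32,
}

impl ShardUId {
    /// Assumed encoding: `version` then `shard_id`, both little-endian.
    fn to_bytes(&self) -> [u8; 8] {
        let mut out = [0u8; 8];
        out[0..4].copy_from_slice(&self.version.to_le_bytes());
        out[4..8].copy_from_slice(&self.shard_id.to_le_bytes());
        out
    }
}

/// Toy store: a mapping table plus a refcounted `State` table.
#[derive(Default)]
struct ToyTrieStore {
    /// child `ShardUId` -> ancestor `ShardUId`
    mapping: HashMap<ShardUId, ShardUId>,
    /// 40-byte key -> (node bytes, reference count)
    state: HashMap<[u8; 40], (Vec<u8>, u32)>,
}

impl ToyTrieStore {
    /// Builds the key from the resolved 8-byte prefix and the 32-byte hash.
    fn key(&self, shard_uid: ShardUId, hash: &[u8; 32]) -> [u8; 40] {
        let mapped = self.mapping.get(&shard_uid).copied().unwrap_or(shard_uid);
        let mut key = [0u8; 40];
        key[0..8].copy_from_slice(&mapped.to_bytes());
        key[8..].copy_from_slice(hash);
        key
    }

    /// Adds `increment` to the entry under the resolved key,
    /// creating it with `data` if it does not exist yet.
    fn increment_refcount_by(
        &mut self,
        shard_uid: ShardUId,
        hash: &[u8; 32],
        data: &[u8],
        increment: u32,
    ) {
        let key = self.key(shard_uid, hash);
        let entry = self.state.entry(key).or_insert_with(|| (data.to_vec(), 0));
        entry.1 += increment;
    }

    /// Current reference count as seen through `shard_uid` (0 if absent).
    fn refcount(&self, shard_uid: ShardUId, hash: &[u8; 32]) -> u32 {
        self.state.get(&self.key(shard_uid, hash)).map_or(0, |(_, rc)| *rc)
    }
}
```

If both children are mapped to the same parent, increments issued through either child accumulate on one entry, and reading the count through the parent or either child gives the same total.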

#### Mapping retention and cleanup

Mappings in `DBCol::StateShardUIdMapping` persist as long as any descendant relies on an ancestor’s data.
To manage this, the `set_shard_uid_mapping` function in `TrieStoreUpdateAdapter` adds a new mapping during resharding:

```rust
fn set_shard_uid_mapping(&mut self, child_shard_uid: ShardUId, parent_shard_uid: ShardUId) {
    self.store_update.set(
        DBCol::StateShardUIdMapping,
        child_shard_uid.to_bytes().as_ref(),
        &borsh::to_vec(&parent_shard_uid).expect("Borsh serialize cannot fail"),
    )
}
```

When a node stops tracking all descendants of a shard, the associated mapping entry can be removed, allowing RocksDB to perform garbage collection.
For archival nodes, mappings are retained permanently to ensure access to the historical state of all shards.
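
One way to picture the cleanup rule is as a set computation over the mapping and the shards a node currently tracks. The sketch below is an assumption-laden illustration: `live_prefixes`, `collectable_prefixes`, the `is_archival` flag, and the simplified `ShardUId` are all hypothetical names, and the actual cleanup logic may differ.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified, hypothetical stand-in for a shard identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ShardUId {
    version: u32,
    shard_id: u32,
}

/// Prefixes that tracked shards still resolve to.
/// A shard without a mapping entry resolves to itself.
fn live_prefixes(
    mapping: &HashMap<ShardUId, ShardUId>,
    tracked: &HashSet<ShardUId>,
) -> HashSet<ShardUId> {
    tracked
        .iter()
        .map(|shard| mapping.get(shard).copied().unwrap_or(*shard))
        .collect()
}

/// Ancestor prefixes that appear in the mapping but that no tracked
/// shard resolves to. In this sketch an archival node never collects.
fn collectable_prefixes(
    mapping: &HashMap<ShardUId, ShardUId>,
    tracked: &HashSet<ShardUId>,
    is_archival: bool,
) -> HashSet<ShardUId> {
    if is_archival {
        return HashSet::new();
    }
    let live = live_prefixes(mapping, tracked);
    mapping
        .values()
        .copied()
        .filter(|ancestor| !live.contains(ancestor))
        .collect()
}
```

With two children mapped to one parent, the parent's prefix stays live while either child is tracked and becomes collectable only once neither is.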

This implementation ensures efficient and scalable shard state transitions,
allowing child shards to use ancestor data without creating redundant entries.



## Security Implications

```text