From ea63dbf806e85e525c92324dda8bd02d9e0da423 Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 10:25:12 -0400
Subject: [PATCH 01/21] Fix binary Merkle tree.
---
 specs/data_structures.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/specs/data_structures.md b/specs/data_structures.md
index 38d0924..00d7014 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -228,20 +228,20 @@ Merkle trees are used to authenticate various pieces of data across the LazyLedg
 ## Binary Merkle Tree
-Binary Merkle trees are constructed in the usual fashion, with leaves being hashed once to get leaf node values and internal node values being the hash of the concatenation of their children. The specific mechanism for hashing leaves for leaf nodes and children for internal nodes may be different (see: [annotated Merkle trees](#annotated-merkle-tree)), but for plain binary Merkle trees are the same.
+Binary Merkle trees are constructed in the same fashion as described in [Certificate Transparency (RFC-6962)](https://tools.ietf.org/html/rfc6962). Leaves are hashed once to get leaf node values and internal node values are the hash of the concatenation of their children (either leaf nodes or other internal nodes).
-For leaf node of leaf message `m`, its value `v` is:
+For leaf node of leaf data `d`, its value `v` is:
 ```C++
-v = h(serialize(m))
+v = h(0x00, serialize(d))
 ```
-An exception is made, in the case of empty leaves: the value of a leaf node with an empty leaf is 32-byte zero, i.e. `0x0000000000000000000000000000000000000000000000000000000000000000`. This is used rather than duplicating the last node if there are an odd number of nodes (the [Bitcoin design](https://github.com/bitcoin/bitcoin/blob/5961b23898ee7c0af2626c46d5d70e80136578d3/src/consensus/merkle.cpp#L9-L43)) to avoid the complexities in that design, which resulted in e.g. [CVE-2012-2459](https://nvd.nist.gov/vuln/detail/CVE-2012-2459).
By constructions, trees are implicitly padded with empty leaves up to the smallest enclosing power of 2.
-
 For internal node with children `l` and `r`, its value `v` is:
 ```C++
-v = h(l.v, r.v)
+v = h(0x01, l.v, r.v)
 ```
+Note that rather than duplicating the last node if there are an odd number of nodes (the [Bitcoin design](https://github.com/bitcoin/bitcoin/blob/5961b23898ee7c0af2626c46d5d70e80136578d3/src/consensus/merkle.cpp#L9-L43)), trees are allowed to be imbalanced. In other words, the height of each leaf may be different. For an example, see Section 2.1.3. of [Certificate Transparency (RFC-6962)](https://tools.ietf.org/html/rfc6962).
+
 ## Annotated Merkle Tree
 Merkle trees can be augmented as generic annotated Merkle trees, where additional fields can be contained in each node. One of the early annotated Merkle trees is the [Merkle Sum Tree](https://bitcointalk.org/index.php?topic=845978.0), which allows for compact fraud proofs to be made of fees collected in a block.
From a3ec7c0a347a0c9fd7d70c1ac4ac027f9bcb2b06 Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 10:30:02 -0400
Subject: [PATCH 02/21] Add base case for binary Merkle tree.
---
 specs/data_structures.md | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/specs/data_structures.md b/specs/data_structures.md
index 00d7014..3019068 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -230,6 +230,11 @@ Merkle trees are used to authenticate various pieces of data across the LazyLedg
 Binary Merkle trees are constructed in the same fashion as described in [Certificate Transparency (RFC-6962)](https://tools.ietf.org/html/rfc6962). Leaves are hashed once to get leaf node values and internal node values are the hash of the concatenation of their children (either leaf nodes or other internal nodes).
+The base case (an empty tree) is defined as zero:
+```C++
+v = 0x0000000000000000000000000000000000000000000000000000000000000000
+```
+
 For leaf node of leaf data `d`, its value `v` is:
 ```C++
 v = h(0x00, serialize(d))
From a72a4a3f1494608f62428e22e2da0e96bcbc6569 Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 10:38:09 -0400
Subject: [PATCH 03/21] Remove annotated Merkle tree abstraction and clean up NMT.
---
 specs/data_structures.md | 54 ++++++++--------------------------------
 1 file changed, 10 insertions(+), 44 deletions(-)
diff --git a/specs/data_structures.md b/specs/data_structures.md
index 3019068..4595995 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -21,8 +21,6 @@ Data Structures
 - [Public-Key Cryptography](#public-key-cryptography)
 - [Merkle Trees](#merkle-trees)
 - [Binary Merkle Tree](#binary-merkle-tree)
- - [Annotated Merkle Tree](#annotated-merkle-tree)
- - [Verifying Annotated Merkle Proofs](#verifying-annotated-merkle-proofs)
 - [Namespace Merkle Tree](#namespace-merkle-tree)
 - [Sparse Merkle Tree](#sparse-merkle-tree)
 - [Erasure Coding](#erasure-coding)
@@ -247,58 +245,26 @@ v = h(0x01, l.v, r.v)
 Note that rather than duplicating the last node if there are an odd number of nodes (the [Bitcoin design](https://github.com/bitcoin/bitcoin/blob/5961b23898ee7c0af2626c46d5d70e80136578d3/src/consensus/merkle.cpp#L9-L43)), trees are allowed to be imbalanced. In other words, the height of each leaf may be different. For an example, see Section 2.1.3. of [Certificate Transparency (RFC-6962)](https://tools.ietf.org/html/rfc6962).
-## Annotated Merkle Tree
-
-Merkle trees can be augmented as generic annotated Merkle trees, where additional fields can be contained in each node. One of the early annotated Merkle trees is the [Merkle Sum Tree](https://bitcointalk.org/index.php?topic=845978.0), which allows for compact fraud proofs to be made of fees collected in a block.
-
-Annotated Merkle trees have extra fields and methods to compute values for those fields, i.e. `f_1, ..., f_n, v` for `n` fields (note that if `n=0`, the annotated Merkle tree is a plain [binary Merkle tree](#binary-merkle-tree)). The value of field `f_i` is computed with the method `m_i_i(height, left_child_field, right_child_field)` for internal nodes and `m_i_l(message)` for leaf nodes.
-
-For leaf node of leaf message `m`, its value `v` and fields `f_1, ..., f_n` are:
-```C++
-f_1 = m_1_l(m)
-...
-f_n = m_n_l(m)
-v = h(serialize(m))
-```
-
-For internal node at height `height` with children `l` and `r`, its value `v` and fields `f_1, ..., f_n` are:
-```C++
-f_1 = m_1_i(height, l.f_1, r.f_1)
-...
-f_n = m_n_i(height, l.f_n, r.f_n)
-v = h(l.f_1, ..., l.f_n, l.v, r.f_1, ..., r.f_n, r.v)
-```
-
-If a compact Merkle root is needed, the root level (which consists of root fields and a root value) can be hashed once.
-
-As an example of annotation, when hashing leaves, `0x00` can be prepended, and when hashing internal nodes, `0x01` can be prepended (i.e. `m_1_l() = 0x00` and `m_1_i() = 0x01`). This avoids a second-preimage attack [where internal nodes are presented as leaves](https://en.wikipedia.org/wiki/Merkle_tree#Second_preimage_attack) for incomplete trees.
-
-### Verifying Annotated Merkle Proofs
-
-In addition to the root, leaf, index, and sibling values of a Merkle proof for a plain [binary Merkle tree](#binary-merkle-tree), Merkle proofs for annotated Mekle trees have the sibling field values. Proofs are verified by using the appropriate methods to compute field values.
+Leaves and internal nodes are hashed differently: the one-byte `0x00` is prepended for leaf nodes while `0x01` is prepended for internal nodes. This avoids a second-preimage attack [where internal nodes are presented as leaves](https://en.wikipedia.org/wiki/Merkle_tree#Second_preimage_attack) trees with leaves at different heights.
 ## Namespace Merkle Tree
-[Messages](#message) in LazyLedger are associated with a provided _namespace ID_, which identifies the application (or applications) that will read these messages when parsing blocks. The Namespace Merkle Tree (NMT) is a variation of the [Merkle Interval Tree](https://eprint.iacr.org/2018/642).
-
-The NMT is an annotated Merkle tree with two additional fields and methods that indicate the range of namespace IDs in each node's subtree.
+[Shares](#share) in LazyLedger are associated with a provided _namespace ID_. The Namespace Merkle Tree (NMT) is a variation of the [Merkle Interval Tree](https://eprint.iacr.org/2018/642), which is itself an extension of the [Merkle Sum Tree](https://bitcointalk.org/index.php?topic=845978.0). It allows for compact proofs around the inclusion of exclusion of shares with particular namespace IDs.
-For leaf node of message `m`:
+For leaf node of data `d`:
 ```C++
-n_min = m_1_l(m) = m.namespaceID
-n_max = m_2_l(m) = m.namespaceID
-v = h(serialize(m))
+n_min = d.namespaceID
+n_max = d.namespaceID
+v = h(0x00, serialize(m))
 ```
-The `namespaceID` message field here is the namespace ID of the message, which is a [`NAMESPACE_ID_BYTES`](consensus.md#system-parameters)-long byte array.
-
-Before being hashed, the [messages](#message) are [serialized](#serialization).
+The `namespaceID` message field here is the namespace ID of the leaf, which is a [`NAMESPACE_ID_BYTES`](consensus.md#system-parameters)-long byte array.
 For internal node with children `l` and `r`:
 ```C++
-n_min = m_1_i(height, l, r) = min(l.n_min, r.n_min)
-n_max = m_2_i(height, l, r) = max(l.n_max, r.n_max)
-v = h(l, r) = h(l.n_min, l.n_max, l.v, r.n_min, r.n_max, r.v)
+n_min = min(l.n_min, r.n_min)
+n_max = max(l.n_max, r.n_max)
+v = h(l, r) = h(0x01, l.n_min, l.n_max, l.v, r.n_min, r.n_max, r.v)
 ```
 ## Sparse Merkle Tree
From 05d1252739da5b5b762fce8a5dfdf08dcaaa2145 Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 10:39:32 -0400
Subject: [PATCH 04/21] Clean up SMT a bit.
---
 specs/data_structures.md | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/specs/data_structures.md b/specs/data_structures.md
index 4595995..f7c0b43 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -279,18 +279,16 @@ For a Merkle branch of height `h`, an `h`-bit value is appended to the proof. Th
 Finally, the number of hashing operations can be reduced to be logarithmic in the number of non-empty leaves on average. An internal node that is the root of a subtree that contains exactly one non-empty leaf is replaced by that leaf's leaf node.
-This creates an imbalanced tree with leaf nodes at different heights, so leaves and nodes must be hashed differently to avoid a second-preimage attack [where internal nodes are presented as leaf nodes](https://en.wikipedia.org/wiki/Merkle_tree#Second_preimage_attack). When hashing leaves, the `uint8` value `0x00` is prepended to the leaf value, and when hashing nodes, `0x01` is prepended to the hash value.
-
 Additionally, the key of leaf nodes must be prepended, since the index of a leaf node that is not at the base of the tree cannot be determined without this information.
-For leaf node of leaf message `m` with key `k`, its value `v` is:
+For leaf node of leaf data `d` with key `k`, its value `v` is:
 ```C++
-v = h(`0x00`, k, serialize(m))
+v = h(0x00, k, serialize(d))
 ```
 For internal node with children `l` and `r`, its value `v` is:
 ```C++
-v = h(`0x01`, l.v, r.v)
+v = h(0x01, l.v, r.v)
 ```
 A proof into an SMT is structured as:
From 3c56a295a6189855bf35bee8c30e9fb934f0038c Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 10:44:36 -0400
Subject: [PATCH 05/21] Add base case for NMT.
---
 specs/data_structures.md | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/specs/data_structures.md b/specs/data_structures.md
index f7c0b43..22bc82a 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -251,6 +251,13 @@ Leaves and internal nodes are hashed differently: the one-byte `0x00` is prepend
 [Shares](#share) in LazyLedger are associated with a provided _namespace ID_. The Namespace Merkle Tree (NMT) is a variation of the [Merkle Interval Tree](https://eprint.iacr.org/2018/642), which is itself an extension of the [Merkle Sum Tree](https://bitcointalk.org/index.php?topic=845978.0). It allows for compact proofs around the inclusion of exclusion of shares with particular namespace IDs.
+The base case (an empty tree) is defined as:
+```C++
+n_min = 0x0000000000000000000000000000000000000000000000000000000000000000
+n_max = 0x0000000000000000000000000000000000000000000000000000000000000000
+v = 0x0000000000000000000000000000000000000000000000000000000000000000
+```
+
 For leaf node of data `d`:
 ```C++
 n_min = d.namespaceID
From 6e3635d08549929225c963548a22bf4b82e14317 Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 10:45:34 -0400
Subject: [PATCH 06/21] Add base case for SMT.
---
 specs/data_structures.md | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/specs/data_structures.md b/specs/data_structures.md
index 22bc82a..39bc4aa 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -288,6 +288,11 @@ Finally, the number of hashing operations can be reduced to be logarithmic in th
 Additionally, the key of leaf nodes must be prepended, since the index of a leaf node that is not at the base of the tree cannot be determined without this information.
+The base case (an empty tree) is defined as:
+```C++
+v = 0x0000000000000000000000000000000000000000000000000000000000000000
+```
+
 For leaf node of leaf data `d` with key `k`, its value `v` is:
 ```C++
 v = h(0x00, k, serialize(d))
From 5ee2f9718fbd9e9e0b40ebf619b3c03490fe281e Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 10:51:24 -0400
Subject: [PATCH 07/21] Refactor and clean up SMT.
---
 specs/data_structures.md | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/specs/data_structures.md b/specs/data_structures.md
index 39bc4aa..e31f661 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -23,6 +23,7 @@ Data Structures
 - [Binary Merkle Tree](#binary-merkle-tree)
 - [Namespace Merkle Tree](#namespace-merkle-tree)
 - [Sparse Merkle Tree](#sparse-merkle-tree)
+ - [Sparse Merkle Tree Proofs](#sparse-merkle-tree-proofs)
 - [Erasure Coding](#erasure-coding)
 - [Reed-Solomon Erasure Coding](#reed-solomon-erasure-coding)
 - [2D Reed-Solomon Encoding Scheme](#2d-reed-solomon-encoding-scheme)
@@ -278,17 +279,12 @@ v = h(l, r) = h(0x01, l.n_min, l.n_max, l.v, r.n_min, r.n_max, r.v)
 Sparse Merkle Trees (SMTs) are _sparse_, i.e. they contain mostly empty leaves. They can be used as key-value stores for arbitrary data, as each leaf is keyed by its index in the tree. Storage efficiency is achieved through clever use of implicit defaults, avoiding the need to store empty leaves.
-Default values are given to leaf nodes with empty leaves. While this is sufficient to pre-compute the values of intermediate nodes that are roots of empty subtrees, a further simplification is to extend this default value to all nodes that are roots of empty subtrees. The 32-byte zero, i.e. `0x0000000000000000000000000000000000000000000000000000000000000000`, is used as the default value.
+Additional rules are added on top of plain [binary Merkle trees](#binary-merkle-tree):
+1. Default values are given to leaf nodes with empty leaves.
+1. While the above rule is sufficient to pre-compute the values of intermediate nodes that are roots of empty subtrees, a further simplification is to extend this default value to all nodes that are roots of empty subtrees. The 32-byte zero, i.e. `0x0000000000000000000000000000000000000000000000000000000000000000`, is used as the default value. This rule takes precedence over the above one.
+1. The number of hashing operations can be reduced to be logarithmic in the number of non-empty leaves on average, assuming a uniform distribution of non-empty leaf keys. An internal node that is the root of a subtree that contains exactly one non-empty leaf is replaced by that leaf's leaf node.
-SMTs can further be extended with _compact_ proofs. [Merkle proofs](#verifying-annotated-merkle-proofs) are composed, among other things, of a list of sibling node values. We note that, since nodes that are roots of empty subtrees have known values (the default value), these values do not need to be provided explicitly; it is sufficient to simply identify which siblings in the Merkle branch are roots of empty subtrees, which can be done with one bit per sibling.
-
-For a Merkle branch of height `h`, an `h`-bit value is appended to the proof. The lowest bit corresponds to the sibling of the leaf node, and each higher bit corresponds to the next parent.
A value of `1` indicates that the next value in the list of values provided explicitly in the proof should be used, and a value of `0` indicates that the default value should be used.
-
-Finally, the number of hashing operations can be reduced to be logarithmic in the number of non-empty leaves on average. An internal node that is the root of a subtree that contains exactly one non-empty leaf is replaced by that leaf's leaf node.
-
-Additionally, the key of leaf nodes must be prepended, since the index of a leaf node that is not at the base of the tree cannot be determined without this information.
-
-The base case (an empty tree) is defined as:
+The base case (an empty tree) is defined as the default value:
 ```C++
 v = 0x0000000000000000000000000000000000000000000000000000000000000000
 ```
@@ -298,11 +294,19 @@ For leaf node of leaf data `d` with key `k`, its value `v` is:
 v = h(0x00, k, serialize(d))
 ```
+The key of leaf nodes must be prepended, since the index of a leaf node that is not at the base of the tree cannot be determined without this information.
+
 For internal node with children `l` and `r`, its value `v` is:
 ```C++
 v = h(0x01, l.v, r.v)
 ```
+### Sparse Merkle Tree Proofs
+
+SMTs can further be extended with _compact_ proofs. [Merkle proofs](#verifying-annotated-merkle-proofs) are composed, among other things, of a list of sibling node values. We note that, since nodes that are roots of empty subtrees have known values (the default value), these values do not need to be provided explicitly; it is sufficient to simply identify which siblings in the Merkle branch are roots of empty subtrees, which can be done with one bit per sibling.
+
+For a Merkle branch of height `h`, an `h`-bit value is appended to the proof. The lowest bit corresponds to the sibling of the leaf node, and each higher bit corresponds to the next parent.
A value of `1` indicates that the next value in the list of values provided explicitly in the proof should be used, and a value of `0` indicates that the default value should be used.
+
 A proof into an SMT is structured as:
 | name | type | description |
From e0900b1165077e53bd7e284f0e69d447417e131e Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 10:53:31 -0400
Subject: [PATCH 08/21] Add merkle proofs for BMT and NMT.
---
 specs/data_structures.md | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)
diff --git a/specs/data_structures.md b/specs/data_structures.md
index e31f661..cec48e8 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -21,7 +21,9 @@ Data Structures
 - [Public-Key Cryptography](#public-key-cryptography)
 - [Merkle Trees](#merkle-trees)
 - [Binary Merkle Tree](#binary-merkle-tree)
+ - [Binary Merkle Tree Proofs](#binary-merkle-tree-proofs)
 - [Namespace Merkle Tree](#namespace-merkle-tree)
+ - [Namespace Merkle Tree Proofs](#namespace-merkle-tree-proofs)
 - [Sparse Merkle Tree](#sparse-merkle-tree)
 - [Sparse Merkle Tree Proofs](#sparse-merkle-tree-proofs)
 - [Erasure Coding](#erasure-coding)
@@ -248,6 +250,16 @@ Note that rather than duplicating the last node if there are an odd number of no
 Leaves and internal nodes are hashed differently: the one-byte `0x00` is prepended for leaf nodes while `0x01` is prepended for internal nodes. This avoids a second-preimage attack [where internal nodes are presented as leaves](https://en.wikipedia.org/wiki/Merkle_tree#Second_preimage_attack) trees with leaves at different heights.
+### Binary Merkle Tree Proofs
+
+| name | type | description |
+| ---------- | ----------------------------- | ------------------------------------------------------------------------ |
+| `root` | [HashDigest](#hashdigest) | Merkle root. |
+| `key` | `byte[32]` | Key (i.e. index) of the leaf. |
+| `depth` | `uint16` | Depth of the leaf node. The root node is at depth `0`. Must be `<= 256`.
|
+| `siblings` | [HashDigest](#hashdigest)`[]` | Sibling hash values. |
+| `leaf` | `byte[]` | Leaf value. |
+
 ## Namespace Merkle Tree
 [Shares](#share) in LazyLedger are associated with a provided _namespace ID_. The Namespace Merkle Tree (NMT) is a variation of the [Merkle Interval Tree](https://eprint.iacr.org/2018/642), which is itself an extension of the [Merkle Sum Tree](https://bitcointalk.org/index.php?topic=845978.0). It allows for compact proofs around the inclusion of exclusion of shares with particular namespace IDs.
@@ -275,6 +287,18 @@ n_max = max(l.n_max, r.n_max)
 v = h(l, r) = h(0x01, l.n_min, l.n_max, l.v, r.n_min, r.n_max, r.v)
 ```
+### Namespace Merkle Tree Proofs
+
+| name | type | description |
+| --------------- | -------------------------------- | ------------------------------------------------------------------------ |
+| `root` | [HashDigest](#hashdigest) | Merkle root. |
+| `key` | `byte[32]` | Key (i.e. index) of the leaf. |
+| `depth` | `uint16` | Depth of the leaf node. The root node is at depth `0`. Must be `<= 256`. |
+| `siblingValues` | [HashDigest](#hashdigest)`[]` | Sibling hash values. |
+| `siblingMins` | [NamespaceID](#type-aliases)`[]` | Sibling min namespaceIDs. |
+| `siblingMaxs` | [NamespaceID](#type-aliases)`[]` | Sibling max namespaceIDs. |
+| `leaf` | `byte[]` | Leaf value. |
+
 ## Sparse Merkle Tree
 Sparse Merkle Trees (SMTs) are _sparse_, i.e. they contain mostly empty leaves. They can be used as key-value stores for arbitrary data, as each leaf is keyed by its index in the tree. Storage efficiency is achieved through clever use of implicit defaults, avoiding the need to store empty leaves.
From 8adacbd03e5ba595b6a953364c5265cd7ab000e0 Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 10:57:00 -0400
Subject: [PATCH 09/21] Clean up.
---
 specs/data_structures.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/specs/data_structures.md b/specs/data_structures.md
index cec48e8..3fd5810 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -295,8 +295,8 @@ v = h(l, r) = h(0x01, l.n_min, l.n_max, l.v, r.n_min, r.n_max, r.v)
 | `key` | `byte[32]` | Key (i.e. index) of the leaf. |
 | `depth` | `uint16` | Depth of the leaf node. The root node is at depth `0`. Must be `<= 256`. |
 | `siblingValues` | [HashDigest](#hashdigest)`[]` | Sibling hash values. |
-| `siblingMins` | [NamespaceID](#type-aliases)`[]` | Sibling min namespaceIDs. |
-| `siblingMaxs` | [NamespaceID](#type-aliases)`[]` | Sibling max namespaceIDs. |
+| `siblingMins` | [NamespaceID](#type-aliases)`[]` | Sibling min namespace IDs. |
+| `siblingMaxes` | [NamespaceID](#type-aliases)`[]` | Sibling max namespace IDs. |
 | `leaf` | `byte[]` | Leaf value. |
 ## Sparse Merkle Tree
From 22cbaaa10b3ad23fe43bbe5c0b6d42764e588c26 Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 10:58:56 -0400
Subject: [PATCH 10/21] Remove unneeded depth field from BMT proof.
---
 specs/data_structures.md | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/specs/data_structures.md b/specs/data_structures.md
index 3fd5810..c634fd2 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -252,13 +252,12 @@ Leaves and internal nodes are hashed differently: the one-byte `0x00` is prepend
 ### Binary Merkle Tree Proofs
-| name | type | description |
-| ---------- | ----------------------------- | ------------------------------------------------------------------------ |
-| `root` | [HashDigest](#hashdigest) | Merkle root. |
-| `key` | `byte[32]` | Key (i.e. index) of the leaf. |
-| `depth` | `uint16` | Depth of the leaf node. The root node is at depth `0`. Must be `<= 256`. |
-| `siblings` | [HashDigest](#hashdigest)`[]` | Sibling hash values. |
-| `leaf` | `byte[]` | Leaf value.
|
+| name | type | description |
+| ---------- | ----------------------------- | ----------------------------- |
+| `root` | [HashDigest](#hashdigest) | Merkle root. |
+| `key` | `byte[32]` | Key (i.e. index) of the leaf. |
+| `siblings` | [HashDigest](#hashdigest)`[]` | Sibling hash values. |
+| `leaf` | `byte[]` | Leaf value. |
 ## Namespace Merkle Tree
From eeca2f6915d53af553de6f46e64c63e8054bdcc9 Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 11:12:35 -0400
Subject: [PATCH 11/21] Fix typo.
---
 specs/data_structures.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/specs/data_structures.md b/specs/data_structures.md
index c634fd2..bfa057b 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -246,7 +246,7 @@ For internal node with children `l` and `r`, its value `v` is:
 v = h(0x01, l.v, r.v)
 ```
-Note that rather than duplicating the last node if there are an odd number of nodes (the [Bitcoin design](https://github.com/bitcoin/bitcoin/blob/5961b23898ee7c0af2626c46d5d70e80136578d3/src/consensus/merkle.cpp#L9-L43)), trees are allowed to be imbalanced. In other words, the height of each leaf may be different. For an example, see Section 2.1.3. of [Certificate Transparency (RFC-6962)](https://tools.ietf.org/html/rfc6962).
+Note that rather than duplicating the last node if there are an odd number of nodes (the [Bitcoin design](https://github.com/bitcoin/bitcoin/blob/5961b23898ee7c0af2626c46d5d70e80136578d3/src/consensus/merkle.cpp#L9-L43)), trees are allowed to be imbalanced. In other words, the height of each leaf may be different. For an example, see Section 2.1.3 of [Certificate Transparency (RFC-6962)](https://tools.ietf.org/html/rfc6962).
 Leaves and internal nodes are hashed differently: the one-byte `0x00` is prepended for leaf nodes while `0x01` is prepended for internal nodes.
This avoids a second-preimage attack [where internal nodes are presented as leaves](https://en.wikipedia.org/wiki/Merkle_tree#Second_preimage_attack) trees with leaves at different heights.
From 418585201e1b92c0d840c4d06b1af6e11ef9ef1e Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 11:14:28 -0400
Subject: [PATCH 12/21] Remove unneeded depth field from NMT.
---
 specs/data_structures.md | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/specs/data_structures.md b/specs/data_structures.md
index bfa057b..19a01eb 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -288,15 +288,14 @@ v = h(l, r) = h(0x01, l.n_min, l.n_max, l.v, r.n_min, r.n_max, r.v)
 ### Namespace Merkle Tree Proofs
-| name | type | description |
-| --------------- | -------------------------------- | ------------------------------------------------------------------------ |
-| `root` | [HashDigest](#hashdigest) | Merkle root. |
-| `key` | `byte[32]` | Key (i.e. index) of the leaf. |
-| `depth` | `uint16` | Depth of the leaf node. The root node is at depth `0`. Must be `<= 256`. |
-| `siblingValues` | [HashDigest](#hashdigest)`[]` | Sibling hash values. |
-| `siblingMins` | [NamespaceID](#type-aliases)`[]` | Sibling min namespace IDs. |
-| `siblingMaxes` | [NamespaceID](#type-aliases)`[]` | Sibling max namespace IDs. |
-| `leaf` | `byte[]` | Leaf value. |
+| name | type | description |
+| --------------- | -------------------------------- | ----------------------------- |
+| `root` | [HashDigest](#hashdigest) | Merkle root. |
+| `key` | `byte[32]` | Key (i.e. index) of the leaf. |
+| `siblingValues` | [HashDigest](#hashdigest)`[]` | Sibling hash values. |
+| `siblingMins` | [NamespaceID](#type-aliases)`[]` | Sibling min namespace IDs. |
+| `siblingMaxes` | [NamespaceID](#type-aliases)`[]` | Sibling max namespace IDs. |
+| `leaf` | `byte[]` | Leaf value.
|
 ## Sparse Merkle Tree
From cb65f75751cd9165213fa1aa571cff524e1f7498 Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 11:14:41 -0400
Subject: [PATCH 13/21] Fix typo.
---
 specs/data_structures.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/specs/data_structures.md b/specs/data_structures.md
index 19a01eb..e7e1e69 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -261,7 +261,7 @@ Leaves and internal nodes are hashed differently: the one-byte `0x00` is prepend
 ## Namespace Merkle Tree
-[Shares](#share) in LazyLedger are associated with a provided _namespace ID_. The Namespace Merkle Tree (NMT) is a variation of the [Merkle Interval Tree](https://eprint.iacr.org/2018/642), which is itself an extension of the [Merkle Sum Tree](https://bitcointalk.org/index.php?topic=845978.0). It allows for compact proofs around the inclusion of exclusion of shares with particular namespace IDs.
+[Shares](#share) in LazyLedger are associated with a provided _namespace ID_. The Namespace Merkle Tree (NMT) is a variation of the [Merkle Interval Tree](https://eprint.iacr.org/2018/642), which is itself an extension of the [Merkle Sum Tree](https://bitcointalk.org/index.php?topic=845978.0). It allows for compact proofs around the inclusion or exclusion of shares with particular namespace IDs.
 The base case (an empty tree) is defined as:
 ```C++
From dcb31597ff41a2a5a5a99747c627eba728601d70 Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 11:31:35 -0400
Subject: [PATCH 14/21] Update specs/data_structures.md
Co-authored-by: Ismail Khoffi
---
 specs/data_structures.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/specs/data_structures.md b/specs/data_structures.md
index e7e1e69..1787e1c 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -274,7 +274,7 @@ For leaf node of data `d`:
 ```C++
 n_min = d.namespaceID
 n_max = d.namespaceID
-v = h(0x00, serialize(m))
+v = h(0x00, serialize(d)
 ```
 The `namespaceID` message field here is the namespace ID of the leaf, which is a [`NAMESPACE_ID_BYTES`](consensus.md#system-parameters)-long byte array.
From e3854c3ca5eccdefb92c09bae480210f0975a95f Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 11:42:18 -0400
Subject: [PATCH 15/21] Clarify that nodes have fields.
---
 specs/data_structures.md | 42 ++++++++++++++++++++--------------------
 1 file changed, 21 insertions(+), 21 deletions(-)
diff --git a/specs/data_structures.md b/specs/data_structures.md
index 1787e1c..ae6ec83 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -233,17 +233,17 @@ Binary Merkle trees are constructed in the same fashion as described in [Certifi
 The base case (an empty tree) is defined as zero:
 ```C++
-v = 0x0000000000000000000000000000000000000000000000000000000000000000
+node.v = 0x0000000000000000000000000000000000000000000000000000000000000000
 ```
-For leaf node of leaf data `d`, its value `v` is:
+For leaf node `node` of leaf data `d`, its value `v` is:
 ```C++
-v = h(0x00, serialize(d))
+node.v = h(0x00, serialize(d))
 ```
-For internal node with children `l` and `r`, its value `v` is:
+For internal node `node` with children `l` and `r`, its value `v` is:
 ```C++
-v = h(0x01, l.v, r.v)
+node.v = h(0x01, l.v, r.v)
 ```
 Note that rather than duplicating the
last node if there are an odd number of nodes (the [Bitcoin design](https://github.com/bitcoin/bitcoin/blob/5961b23898ee7c0af2626c46d5d70e80136578d3/src/consensus/merkle.cpp#L9-L43)), trees are allowed to be imbalanced. In other words, the height of each leaf may be different. For an example, see Section 2.1.3 of [Certificate Transparency (RFC-6962)](https://tools.ietf.org/html/rfc6962).
@@ -265,25 +265,25 @@ Leaves and internal nodes are hashed differently: the one-byte `0x00` is prepend
 The base case (an empty tree) is defined as:
 ```C++
-n_min = 0x0000000000000000000000000000000000000000000000000000000000000000
-n_max = 0x0000000000000000000000000000000000000000000000000000000000000000
-v = 0x0000000000000000000000000000000000000000000000000000000000000000
+node.n_min = 0x0000000000000000000000000000000000000000000000000000000000000000
+node.n_max = 0x0000000000000000000000000000000000000000000000000000000000000000
+node.v = 0x0000000000000000000000000000000000000000000000000000000000000000
 ```
-For leaf node of data `d`:
+For leaf node `node` of data `d`:
 ```C++
-n_min = d.namespaceID
-n_max = d.namespaceID
-v = h(0x00, serialize(d)
+node.n_min = d.namespaceID
+node.n_max = d.namespaceID
+node.v = h(0x00, serialize(d)
 ```
 The `namespaceID` message field here is the namespace ID of the leaf, which is a [`NAMESPACE_ID_BYTES`](consensus.md#system-parameters)-long byte array.
 
-For internal node with children `l` and `r`:
+For internal node `node` with children `l` and `r`:
 ```C++
-n_min = min(l.n_min, r.n_min)
-n_max = max(l.n_max, r.n_max)
-v = h(l, r) = h(0x01, l.n_min, l.n_max, l.v, r.n_min, r.n_max, r.v)
+node.n_min = min(l.n_min, r.n_min)
+node.n_max = max(l.n_max, r.n_max)
+node.v = h(l, r) = h(0x01, l.n_min, l.n_max, l.v, r.n_min, r.n_max, r.v)
 ```
 
 ### Namespace Merkle Tree Proofs
@@ -308,19 +308,19 @@ Additional rules are added on top of plain [binary Merkle trees](#binary-merkle-
 
 The base case (an empty tree) is defined as the default value:
 ```C++
-v = 0x0000000000000000000000000000000000000000000000000000000000000000
+node.v = 0x0000000000000000000000000000000000000000000000000000000000000000
 ```
 
-For leaf node of leaf data `d` with key `k`, its value `v` is:
+For leaf node `node` of leaf data `d` with key `k`, its value `v` is:
 ```C++
-v = h(0x00, k, serialize(d))
+node.v = h(0x00, k, serialize(d))
 ```
 
 The key of leaf nodes must be prepended, since the index of a leaf node that is not at the base of the tree cannot be determined without this information.
 
-For internal node with children `l` and `r`, its value `v` is:
+For internal node `node` with children `l` and `r`, its value `v` is:
 ```C++
-v = h(0x01, l.v, r.v)
+node.v = h(0x01, l.v, r.v)
 ```
 
 ### Sparse Merkle Tree Proofs

From 73378ae18d8ee1ee4f2c1d6c6cb9a805c2ce4194 Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 11:44:26 -0400
Subject: [PATCH 16/21] Add node data structures.

---
 specs/data_structures.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/specs/data_structures.md b/specs/data_structures.md
index ae6ec83..e2cab81 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -231,6 +231,11 @@ Merkle trees are used to authenticate various pieces of data across the LazyLedg
 Binary Merkle trees are constructed in the same fashion as described in [Certificate Transparency (RFC-6962)](https://tools.ietf.org/html/rfc6962). Leaves are hashed once to get leaf node values and internal node values are the hash of the concatenation of their children (either leaf nodes or other internal nodes).
 
+Nodes contain a single field:
+| name | type       | description |
+| ---- | ---------- | ----------- |
+| `v`  | `byte[32]` | Node value. |
+
 The base case (an empty tree) is defined as zero:
 ```C++
 node.v = 0x0000000000000000000000000000000000000000000000000000000000000000
 ```
@@ -263,6 +268,13 @@ Leaves and internal nodes are hashed differently: the one-byte `0x00` is prepend
 
 [Shares](#share) in LazyLedger are associated with a provided _namespace ID_. The Namespace Merkle Tree (NMT) is a variation of the [Merkle Interval Tree](https://eprint.iacr.org/2018/642), which is itself an extension of the [Merkle Sum Tree](https://bitcointalk.org/index.php?topic=845978.0). It allows for compact proofs around the inclusion or exclusion of shares with particular namespace IDs.
 
+Nodes contain three fields:
+| name    | type                         | description                                      |
+| ------- | ---------------------------- | ------------------------------------------------ |
+| `n_min` | [NamespaceID](#type-aliases) | Min namespace ID in subtree rooted at this node. |
+| `n_max` | [NamespaceID](#type-aliases) | Max namespace ID in subtree rooted at this node. |
+| `v`     | `byte[32]`                   | Node value.                                      |
+
 The base case (an empty tree) is defined as:
 ```C++
 node.n_min = 0x0000000000000000000000000000000000000000000000000000000000000000
@@ -306,6 +318,11 @@ Additional rules are added on top of plain [binary Merkle trees](#binary-merkle-
 1. While the above rule is sufficient to pre-compute the values of intermediate nodes that are roots of empty subtrees, a further simplification is to extend this default value to all nodes that are roots of empty subtrees. The 32-byte zero, i.e. `0x0000000000000000000000000000000000000000000000000000000000000000`, is used as the default value. This rule takes precedence over the above one.
 1. The number of hashing operations can be reduced to be logarithmic in the number of non-empty leaves on average, assuming a uniform distribution of non-empty leaf keys. An internal node that is the root of a subtree that contains exactly one non-empty leaf is replaced by that leaf's leaf node.
 
+Nodes contain a single field:
+| name | type       | description |
+| ---- | ---------- | ----------- |
+| `v`  | `byte[32]` | Node value. |
+
 The base case (an empty tree) is defined as the default value:
 ```C++
 node.v = 0x0000000000000000000000000000000000000000000000000000000000000000

From 0f1f2386770bb21875115f1b3fd29ced7c9a6860 Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 11:44:44 -0400
Subject: [PATCH 17/21] Fix typo.

---
 specs/data_structures.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/specs/data_structures.md b/specs/data_structures.md
index e2cab81..cfdd048 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -286,7 +286,7 @@ For leaf node `node` of data `d`:
 ```C++
 node.n_min = d.namespaceID
 node.n_max = d.namespaceID
-node.v = h(0x00, serialize(d)
+node.v = h(0x00, serialize(d))
 ```
 
 The `namespaceID` message field here is the namespace ID of the leaf, which is a [`NAMESPACE_ID_BYTES`](consensus.md#system-parameters)-long byte array.
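
The plain binary Merkle tree rules touched by the patches above (zero value for the empty tree, `0x00` prefix for leaves, `0x01` prefix for internal nodes, imbalanced trees as in RFC 6962) can be sketched as follows. This is an illustrative sketch only, not part of the patch series: SHA-256 as `h`, the RFC 6962 split point, and the names `EMPTY_ROOT` and `merkle_root` are assumptions, and leaves are taken to be already-serialized byte strings.

```python
import hashlib
from typing import Sequence

# Base case from the spec text: an empty tree has the 32-byte zero value.
EMPTY_ROOT = bytes(32)


def h(*parts: bytes) -> bytes:
    """Hash the concatenation of the parts (SHA-256 is an assumption)."""
    return hashlib.sha256(b"".join(parts)).digest()


def merkle_root(leaves: Sequence[bytes]) -> bytes:
    """Root value of an RFC 6962-style tree over serialized leaf data."""
    n = len(leaves)
    if n == 0:
        return EMPTY_ROOT
    if n == 1:
        # Leaf node: node.v = h(0x00, serialize(d))
        return h(b"\x00", leaves[0])
    # Split at the largest power of two strictly less than n (RFC 6962),
    # so no node is duplicated and leaves may sit at different heights.
    k = 1 << ((n - 1).bit_length() - 1)
    left = merkle_root(leaves[:k])
    right = merkle_root(leaves[k:])
    # Internal node: node.v = h(0x01, l.v, r.v)
    return h(b"\x01", left, right)
```

With three leaves, the left subtree holds the first two and the right subtree is the third leaf alone, so the third leaf sits one level higher than the others.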

From 16e468da5c7f27c94231c57bbe8454f54719317c Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 11:45:54 -0400
Subject: [PATCH 18/21] Fix type of node value.

---
 specs/data_structures.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/specs/data_structures.md b/specs/data_structures.md
index cfdd048..d7f0800 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -232,9 +232,9 @@ Merkle trees are used to authenticate various pieces of data across the LazyLedg
 Binary Merkle trees are constructed in the same fashion as described in [Certificate Transparency (RFC-6962)](https://tools.ietf.org/html/rfc6962). Leaves are hashed once to get leaf node values and internal node values are the hash of the concatenation of their children (either leaf nodes or other internal nodes).
 
 Nodes contain a single field:
-| name | type       | description |
-| ---- | ---------- | ----------- |
-| `v`  | `byte[32]` | Node value. |
+| name | type                      | description |
+| ---- | ------------------------- | ----------- |
+| `v`  | [HashDigest](#hashdigest) | Node value. |
 
 The base case (an empty tree) is defined as zero:
 ```C++
@@ -273,7 +273,7 @@ Nodes contain three fields:
 | ------- | ---------------------------- | ------------------------------------------------ |
 | `n_min` | [NamespaceID](#type-aliases) | Min namespace ID in subtree rooted at this node. |
 | `n_max` | [NamespaceID](#type-aliases) | Max namespace ID in subtree rooted at this node. |
-| `v`     | `byte[32]`                   | Node value.                                      |
+| `v`     | [HashDigest](#hashdigest)    | Node value.                                      |
 
 The base case (an empty tree) is defined as:
 ```C++
@@ -319,9 +319,9 @@ Additional rules are added on top of plain [binary Merkle trees](#binary-merkle-
 1. The number of hashing operations can be reduced to be logarithmic in the number of non-empty leaves on average, assuming a uniform distribution of non-empty leaf keys. An internal node that is the root of a subtree that contains exactly one non-empty leaf is replaced by that leaf's leaf node.
 
 Nodes contain a single field:
-| name | type       | description |
-| ---- | ---------- | ----------- |
-| `v`  | `byte[32]` | Node value. |
+| name | type                      | description |
+| ---- | ------------------------- | ----------- |
+| `v`  | [HashDigest](#hashdigest) | Node value. |
 
 The base case (an empty tree) is defined as the default value:
 ```C++

From d3d1688bcb1cf2e0554724218eea1c3f070d6bdc Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 11:48:22 -0400
Subject: [PATCH 19/21] Clean up.

---
 specs/data_structures.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/specs/data_structures.md b/specs/data_structures.md
index d7f0800..3a1ff91 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -241,12 +241,12 @@ The base case (an empty tree) is defined as zero:
 node.v = 0x0000000000000000000000000000000000000000000000000000000000000000
 ```
 
-For leaf node `node` of leaf data `d`, its value `v` is:
+For leaf node `node` of leaf data `d`:
 ```C++
 node.v = h(0x00, serialize(d))
 ```
 
-For internal node `node` with children `l` and `r`, its value `v` is:
+For internal node `node` with children `l` and `r`:
 ```C++
 node.v = h(0x01, l.v, r.v)
 ```
@@ -328,14 +328,14 @@ The base case (an empty tree) is defined as the default value:
 node.v = 0x0000000000000000000000000000000000000000000000000000000000000000
 ```
 
-For leaf node `node` of leaf data `d` with key `k`, its value `v` is:
+For leaf node `node` of leaf data `d` with key `k`:
 ```C++
 node.v = h(0x00, k, serialize(d))
 ```
 
 The key of leaf nodes must be prepended, since the index of a leaf node that is not at the base of the tree cannot be determined without this information.
 
-For internal node `node` with children `l` and `r`, its value `v` is:
+For internal node `node` with children `l` and `r`:
 ```C++
 node.v = h(0x01, l.v, r.v)
 ```

From a746d0db507403b62a9945c256856278a6a33aa3 Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 11:50:09 -0400
Subject: [PATCH 20/21] Clean up how child nodes are serialized.

---
 specs/data_structures.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/specs/data_structures.md b/specs/data_structures.md
index 3a1ff91..0d1d718 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -248,7 +248,7 @@ node.v = h(0x00, serialize(d))
 
 For internal node `node` with children `l` and `r`:
 ```C++
-node.v = h(0x01, l.v, r.v)
+node.v = h(0x01, serialize(l), serialize(r))
 ```
 
 Note that rather than duplicating the last node if there are an odd number of nodes (the [Bitcoin design](https://github.com/bitcoin/bitcoin/blob/5961b23898ee7c0af2626c46d5d70e80136578d3/src/consensus/merkle.cpp#L9-L43)), trees are allowed to be imbalanced. In other words, the height of each leaf may be different. For an example, see Section 2.1.3 of [Certificate Transparency (RFC-6962)](https://tools.ietf.org/html/rfc6962).
@@ -295,7 +295,7 @@ For internal node `node` with children `l` and `r`:
 ```C++
 node.n_min = min(l.n_min, r.n_min)
 node.n_max = max(l.n_max, r.n_max)
-node.v = h(l, r) = h(0x01, l.n_min, l.n_max, l.v, r.n_min, r.n_max, r.v)
+node.v = h(l, r) = h(0x01, serialize(l), serialize(r))
 ```
 
 ### Namespace Merkle Tree Proofs
@@ -337,7 +337,7 @@ The key of leaf nodes must be prepended, since the index of a leaf node that is
 
 For internal node `node` with children `l` and `r`:
 ```C++
-node.v = h(0x01, l.v, r.v)
+node.v = h(0x01, serialize(l), serialize(r))
 ```
 
 ### Sparse Merkle Tree Proofs

From 37d911cb1ededb401d9d6a7cea7cf3a784d26bd3 Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 2 Jul 2020 11:57:27 -0400
Subject: [PATCH 21/21] Fix Merkle proof for NMT.

---
 specs/data_structures.md | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/specs/data_structures.md b/specs/data_structures.md
index 0d1d718..54e5baa 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -298,16 +298,22 @@ node.n_max = max(l.n_max, r.n_max)
 node.v = h(l, r) = h(0x01, serialize(l), serialize(r))
 ```
 
+A root hash can be computed by taking the [hash](#hashing) of the [serialized](#serialization) root node.
+
 ### Namespace Merkle Tree Proofs
 
-| name            | type                             | description                   |
-| --------------- | -------------------------------- | ----------------------------- |
-| `root`          | [HashDigest](#hashdigest)        | Merkle root.                  |
-| `key`           | `byte[32]`                       | Key (i.e. index) of the leaf. |
-| `siblingValues` | [HashDigest](#hashdigest)`[]`    | Sibling hash values.          |
-| `siblingMins`   | [NamespaceID](#type-aliases)`[]` | Sibling min namespace IDs.    |
-| `siblingMaxes`  | [NamespaceID](#type-aliases)`[]` | Sibling max namespace IDs.    |
-| `leaf`          | `byte[]`                         | Leaf value.                   |
+| name                 | type                             | description                   |
+| -------------------- | -------------------------------- | ----------------------------- |
+| `rootHash`           | [HashDigest](#hashdigest)        | Root hash.                    |
+| `rootNamespaceIDMin` | [NamespaceID](#type-aliases)     | Root minimum namespace ID.    |
+| `rootNamespaceIDMax` | [NamespaceID](#type-aliases)     | Root maximum namespace ID.    |
+| `key`                | `byte[32]`                       | Key (i.e. index) of the leaf. |
+| `siblingValues`      | [HashDigest](#hashdigest)`[]`    | Sibling hash values.          |
+| `siblingMins`        | [NamespaceID](#type-aliases)`[]` | Sibling min namespace IDs.    |
+| `siblingMaxes`       | [NamespaceID](#type-aliases)`[]` | Sibling max namespace IDs.    |
+| `leaf`               | `byte[]`                         | Leaf value.                   |
+
+When verifying a NMT proof, the root hash is checked by reconstructing the root node `root_node` with the computed `root_node.v` (computed as with a [plain Merkle proof](#binary-merkle-tree-proofs)) and the provided `rootNamespaceIDMin` and `rootNamespaceIDMax` as the `root_node.n_min` and `root_node.n_max`, respectively.
 
 ## Sparse Merkle Tree
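
The NMT node rules and the root check described in the last patch can be sketched as follows. This is an illustrative sketch only, not part of the patch series. It assumes SHA-256 as `h`, and it assumes `serialize(node)` is the concatenation `n_min || n_max || v`, which matches the argument order that patch 20 replaced. The names `Node`, `leaf_node`, `internal_node`, `root_hash`, and `verify_root` are hypothetical, as is the 8-byte namespace ID used in the usage example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    n_min: bytes  # min namespace ID in the subtree rooted at this node
    n_max: bytes  # max namespace ID in the subtree rooted at this node
    v: bytes      # node value (hash digest)


def h(*parts: bytes) -> bytes:
    """Hash the concatenation of the parts (SHA-256 is an assumption)."""
    return hashlib.sha256(b"".join(parts)).digest()


def serialize(node: Node) -> bytes:
    """Assumed node serialization: n_min || n_max || v."""
    return node.n_min + node.n_max + node.v


def leaf_node(namespace_id: bytes, serialized_data: bytes) -> Node:
    # node.n_min = node.n_max = d.namespaceID; node.v = h(0x00, serialize(d))
    return Node(namespace_id, namespace_id, h(b"\x00", serialized_data))


def internal_node(l: Node, r: Node) -> Node:
    # Equal-length byte strings compare lexicographically, so min/max
    # order namespace IDs as big-endian integers.
    return Node(
        min(l.n_min, r.n_min),
        max(l.n_max, r.n_max),
        h(b"\x01", serialize(l), serialize(r)),
    )


def root_hash(root: Node) -> bytes:
    """Root hash: the hash of the serialized root node."""
    return h(serialize(root))


def verify_root(computed_v: bytes, expected_root_hash: bytes,
                root_ns_min: bytes, root_ns_max: bytes) -> bool:
    """Rebuild the root node from the computed value and the namespace
    range supplied in the proof, then compare its hash to the root hash."""
    root_node = Node(root_ns_min, root_ns_max, computed_v)
    return root_hash(root_node) == expected_root_hash
```

A short usage example under the same assumptions:

```python
a = leaf_node(bytes(7) + b"\x01", b"share A")
b = leaf_node(bytes(7) + b"\x02", b"share B")
root = internal_node(a, b)
ok = verify_root(root.v, root_hash(root), root.n_min, root.n_max)
```

Because the namespace range is part of the hashed root node, a proof that supplies the right `v` but a wrong `rootNamespaceIDMin` or `rootNamespaceIDMax` does not reproduce the root hash.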