Skip to content
This repository has been archived by the owner on Mar 24, 2023. It is now read-only.

Merkle trees misc cleanup and fixes #41

Closed
liamsi opened this issue Jun 24, 2020 · 7 comments · Fixed by #47
Closed

Merkle trees misc cleanup and fixes #41

liamsi opened this issue Jun 24, 2020 · 7 comments · Fixed by #47
Assignees
Labels
bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request

Comments

@liamsi
Copy link
Member

liamsi commented Jun 24, 2020

https://github.com/lazyledger/lazyledger-specs/blob/master/specs/data_structures.md#sparse-merkle-tree

For leaf node of leaf message m with key k, its value v is:

Better to use a different word than message here (otherwise this might be conflated with the other use of the word messages)

Also, it would help getting a quick understanding, if there was 2-3 simple to follow examples how an smt and its will look like (including trivial edge-case like all-empty tree, only one entry, and one realistic example with a few entries).

@liamsi liamsi added documentation Improvements or additions to documentation enhancement New feature or request labels Jun 24, 2020
@adlerjohn adlerjohn self-assigned this Jun 24, 2020
@adlerjohn adlerjohn added this to the Pre-implementation draft milestone Jun 24, 2020
@adlerjohn
Copy link
Member

adlerjohn commented Jun 24, 2020

Things to add:

  • Split up computing the root and Merkle proofs into different subsections
  • Example diagrams (especially one that shows that internal nodes w/single child are possible)
  • Define zero tree base case
  • Fix wording on leaf messages
  • Merkle multiproof format
  • The annotated Merkle tree abstraction isn't suitable for prefixing to differentiate between internal and leaf nodes. Need to make that an explicit feature.
  • Maybe remove the annotated Merkle tree abstraction since it isn't used elsewhere in the spec except for NMT. Define the domain separation prefixing for base Merkle trees.
  • Change binary Merkle tree to use scheme in RFC 6962, i.e. unbalanced tree.

@adlerjohn adlerjohn added the bug Something isn't working label Jun 24, 2020
@adlerjohn adlerjohn changed the title SMT section followups Merkle trees misc cleanup and fixes Jun 24, 2020
@liamsi
Copy link
Member Author

liamsi commented Jun 25, 2020

The annotated Merkle tree abstraction isn't suitable for prefixing to differentiate between internal and leaf nodes. Need to make that an explicit feature.

Independent of the prefixing, I'm not sure the annotated and the namespaced merkle tree hashes the correct thing.

Looking at how @musalbas defined the hashing in his paper:

We define a wrapper function nsHash(x) that produces hashes prefixed with namespace identifiers. A namespaced hash has the format minNs||maxNs||hash(x), where minNs is the lowest namespace identifier found in all the children of the node that the hash represents, and maxNs is the highest.

OK, and:

If x is a leaf, then minNs = maxNs = ns(x), as the hash contains only one leaf with a single namespace

Using the notation in the spec, my understanding is that this would result in:

n_min = m_1_l(m) = m.namespace_id
n_max = m_2_l(m) = m.namespace_id
v = n_min || n_max || h(serialize(m))

but we have:

n_min = m_1_l(m) = m.namespace_id
n_max = m_2_l(m) = m.namespace_id
v = h(serialize(m))

https://github.com/lazyledger/lazyledger-specs/blob/master/specs/data_structures.md#namespace-merkle-tree

Similarly, for the inner nodes, I think the hash of should be something like:
v = nsh(l, r) = n_min || n_max || h( l.v, r.v )
instead of currently:
v = h(l, r) = h(l.n_min, l.n_max, l.v, r.n_min, r.n_max, r.v)

I know the spec's version is just a logical abstraction but at least it isn't obvious how both are compatible.

@liamsi
Copy link
Member Author

liamsi commented Jun 25, 2020

Ah, while writing this up, I think I see how they can be seen as equivalent.

@adlerjohn
Copy link
Member

The two ways of computing node values aren't identical but should provide the same guarantees maybe? I need to double-check, but if not the annotated Merkle tree abstraction can be changed to reflect the way NMTs are described in the paper.

@liamsi
Copy link
Member Author

liamsi commented Jun 25, 2020

I think it's actually closer to the paper as it seem on 1st sight, with the difference that leaf nodes are "namespace-hashed" as n_min || n_max || h(m) in the paper and as h(serialize(m)) in the spec.

Something else I've noted: the spec makes the assumption that each leaf message m, somehow contains the namespace id via: m.namespace_id. Later, we speak of shares that are committed to the NMT, which do not explicitly have such a field: https://github.com/lazyledger/lazyledger-specs/blob/master/specs/data_structures.md#share

My understanding is that a share implicitly belongs to a namespace but reading through it again, the connection between share and namespace doesn't seem clear enough to me.

@liamsi
Copy link
Member Author

liamsi commented Jun 25, 2020

I would suggest to remove the "Annotated Merkle Tree" section as it is not really needed as part of the LL specification and directly introduce the concrete and simpler special case we need, namely the NMT. Unless we foresee any use for the generalisation (the annotated tree), which I currently don't see.

@musalbas
Copy link
Member

The min and max values of a node should be outside of the hash of the children, so that a client that wants to verify that they have received all the messages for their namespace should not need to download more data than they need to (specifically, the values of nodes that they aren't interested in).

@adlerjohn adlerjohn linked a pull request Jul 2, 2020 that will close this issue
7 tasks
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants