doc: revises the namespace specifications and includes some clarifications #2124

Merged
merged 22 commits into from
Jul 19, 2023
3 changes: 1 addition & 2 deletions specs/src/specs/data_structures.md
Original file line number Diff line number Diff line change
Expand Up @@ -213,8 +213,7 @@ A proof for a leaf in a [binary Merkle tree](#binary-merkle-tree), as per Sectio

### Namespace Merkle Tree


<!-- disable markdown link check for bitcointalk.org because it frequently fails -->
<!-- disable markdown link check for bitcointalk.org because it frequently fails -->
<!-- markdown-link-check-disable -->
[Shares](./shares.md) in Celestia are associated with a provided _namespace_. The Namespace Merkle Tree (NMT) is a variation of the [Merkle Interval Tree](https://eprint.iacr.org/2018/642), which is itself an extension of the [Merkle Sum Tree](https://bitcointalk.org/index.php?topic=845978.0). It allows for compact proofs around the inclusion or exclusion of shares with particular namespace IDs.
<!-- markdown-link-check-enable -->
2 changes: 1 addition & 1 deletion specs/src/specs/fraud_proofs.md
Expand Up @@ -22,4 +22,4 @@ State fraud proofs allow light clients to avoid making an honest majority assump
state validity. While these are not incorporated into the protocol as of v1.0.0,
there are example implementations that can be found in
[Rollkit](https://github.com/rollkit/rollkit). More info in
[rollkit-ADR009](https://github.com/rollkit/rollkit/blob/4fd97ba8b8352771f2e66454099785d06fd0c31b/docs/lazy-adr/adr-009-state-fraud-proofs.md).
[rollkit-ADR009](https://github.com/rollkit/rollkit/blob/4fd97ba8b8352771f2e66454099785d06fd0c31b/docs/lazy-adr/adr-009-state-fraud-proofs.md).
52 changes: 43 additions & 9 deletions specs/src/specs/namespace.md
Expand Up @@ -4,27 +4,40 @@

## Abstract

One of Celestia's core data structures is the namespace. When a user submits a `MsgPayForBlobs` transaction to Celestia they MUST associate each blob with exactly one namespace. After their transaction has been included in a block, the namespace enables users to take an interest in a subset of the blobs published to Celestia by allowing the user to query for blobs by namespace.
One of Celestia's core data structures is the namespace.
When a user submits a transaction encapsulating a `MsgPayForBlobs` message to Celestia, they MUST associate each blob with exactly one namespace.
After their transaction has been included in a block, the namespace enables users to take an interest in a subset of the blobs published to Celestia by allowing the user to query for blobs by namespace.

In order to enable efficient retrieval of blobs by namespace, Celestia makes use of a [Namespaced Merkle Tree](https://github.com/celestiaorg/nmt). See section 5.2 of the [LazyLedger whitepaper](https://arxiv.org/pdf/1905.09274.pdf) for more details.
In order to enable efficient retrieval of blobs by namespace, Celestia makes use of a [Namespaced Merkle Tree](https://github.com/celestiaorg/nmt).
See section 5.2 of the [LazyLedger whitepaper](https://arxiv.org/pdf/1905.09274.pdf) for more details.

## Overview

A namespace is composed of two fields: [version](#version) and [id](#id). A namespace is encoded as a byte slice with the version and id concatenated. Each [share](./shares.md) is prefixed with exactly one namespace.
A namespace is composed of two fields: [version](#version) and [id](#id).
A namespace is encoded as a byte slice with the version and id concatenated.

![namespace](./figures/namespace.svg)
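As a rough illustration of the layout described in the Overview, the sketch below encodes a namespace as the version byte followed by the 28-byte id, and decodes it again. The `Namespace` type, `Bytes`, and `FromBytes` are hypothetical names chosen for this example; they are not taken from `pkg/namespace`.

```go
package main

import (
	"errors"
	"fmt"
)

// Sizes taken from the spec text: a 1-byte version followed by a 28-byte id.
const (
	VersionSize   = 1
	IDSize        = 28
	NamespaceSize = VersionSize + IDSize
)

// Namespace is a hypothetical type used only for illustration.
type Namespace struct {
	Version uint8
	ID      [IDSize]byte
}

// Bytes encodes the namespace as version || id.
func (n Namespace) Bytes() []byte {
	out := make([]byte, 0, NamespaceSize)
	out = append(out, n.Version)
	return append(out, n.ID[:]...)
}

// FromBytes splits an encoded namespace back into its two fields.
func FromBytes(b []byte) (Namespace, error) {
	if len(b) != NamespaceSize {
		return Namespace{}, errors.New("encoded namespace must be 29 bytes")
	}
	var n Namespace
	n.Version = b[0]
	copy(n.ID[:], b[VersionSize:])
	return n, nil
}

func main() {
	var n Namespace // version 0, id all zeros
	n.ID[IDSize-1] = 0x01
	encoded := n.Bytes()
	fmt.Printf("0x%x (%d bytes)\n", encoded, len(encoded))
}
```

Using a fixed-size array for the id keeps the 28-byte length a compile-time property, so only the decode path needs a length check.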

### Version

The namespace version is an 8-bit unsigned integer that indicates the version of the namespace. The version is used to determine the format of the namespace. The only supported user-specifiable namespace version is `0`. The version is encoded as a single byte.
The namespace version is an 8-bit unsigned integer that indicates the version of the namespace.
The version is used to determine the format of the namespace and
is encoded as a single byte.
Comment on lines +24 to +25
Collaborator

[question] semantic line breaks are new for me so I don't understand why a line break was introduced in the middle of this sentence.

Was it added to conform to maximum line length character requirements?

Contributor Author

The line break is added here because the second part, "is encoded as a single byte," conveys a separate and independent message from the first part. By moving it to the next line, we ensure clarity and emphasize the distinction between the two ideas.

Collaborator

thanks for clarifying! TBH it's not immediately obvious to me when to introduce a semantic line break but I think it's safe to keep one here :)

I guess we could also rewrite this as two sentences:

The version is used to determine the format of the namespace.
The version is encoded as a single byte.

Contributor Author

> TBH it's not immediately obvious to me when to introduce a semantic line break but I think it's safe to keep one here :)

As a general guideline, it is usually recommended to add a line break after each sentence-ending period. However, there are other cases where we may want to break a line mid-sentence. A helpful mental model for those cases is to treat lines like commit messages: just as each commit is kept short and self-contained, the semantically independent parts of a sentence can be placed on separate lines.

> I guess we could also rewrite this as two sentences:

Yes, your suggested approach also works.

A new namespace version MUST be introduced if the namespace format changes in a backwards incompatible way.

Note: The `PARITY_SHARE_NAMESPACE` uses the namespace version `255` so that it can be ignored via the `IgnoreMaxNamespace` feature from [nmt](https://github.com/celestiaorg/nmt). The `TAIL_PADDING_NAMESPACE` uses the namespace version `255` so that it remains ordered after all blob namespaces even in the case a new namespace version is introduced.
Below we explain the supported user-specifiable namespace versions.
Note, however, that Celestia MAY utilize other namespace versions for internal use.
For more details, see the [Reserved Namespaces](#reserved-namespaces) section.

A namespace with version `0` must contain an id with a prefix of 18 leading `0` bytes. The remaining 10 bytes of the id are user-specified.
#### Version 0

The only supported user-specifiable namespace version is `0`.
A namespace with version `0` MUST contain an id with a prefix of 18 leading `0` bytes.
The remaining 10 bytes of the id are user-specified.
Below, we provide examples of valid and invalid encoded user-supplied namespaces with version `0`.

```go
// Valid encoded namespaces
0x0000000000000000000000000000000000000000000000000000000001 // transaction namespace
0x0000000000000000000000000000000000000001010101010101010101 // valid blob namespace
0x0000000000000000000000000000000000000011111111111111111111 // valid blob namespace

Expand All @@ -34,14 +47,26 @@ A namespace with version `0` must contain an id with a prefix of 18 leading `0`
0x1111111111111111111111111111111111111111111111111111111111 // invalid because it does not have version 0
```

A new namespace version MUST be introduced if the namespace format changes in a backwards incompatible way (i.e. the number of leading `0` bytes in the id prefix is reduced).
Any change in the number of leading `0` bytes in the id of a namespace with version `0` is considered a backwards incompatible change and MUST be introduced as a new namespace version.
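The version `0` rules above can be sketched as a small format check. This is a minimal illustration under stated assumptions, not the project's implementation: `validateUserNamespace` and the constant names are hypothetical, and the check covers only the format (length, version, 18 leading zero bytes in the id). It does not reject reserved namespaces, which applications must also avoid.

```go
package main

import (
	"bytes"
	"fmt"
)

const (
	namespaceSize        = 29 // 1-byte version + 28-byte id
	versionZero          = 0
	versionZeroPrefixLen = 18 // leading zero bytes required in the id
)

// validateUserNamespace is a hypothetical helper that checks the version 0
// format rules on an encoded namespace.
func validateUserNamespace(ns []byte) error {
	if len(ns) != namespaceSize {
		return fmt.Errorf("namespace must be %d bytes, got %d", namespaceSize, len(ns))
	}
	if ns[0] != versionZero {
		return fmt.Errorf("unsupported namespace version %d", ns[0])
	}
	id := ns[1:]
	for i, b := range id[:versionZeroPrefixLen] {
		if b != 0 {
			return fmt.Errorf("id byte %d must be 0, got %#x", i, b)
		}
	}
	return nil
}

func main() {
	// Version 0, 18 zero bytes, then 10 user-specified bytes of 0x01.
	valid := append(make([]byte, 19), bytes.Repeat([]byte{0x01}, 10)...)
	// Every byte is 0x11, so the version is not 0.
	invalid := bytes.Repeat([]byte{0x11}, namespaceSize)

	fmt.Println(validateUserNamespace(valid))
	fmt.Println(validateUserNamespace(invalid))
}
```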

### ID

The namespace ID is a 28 byte identifier that uniquely identifies a namespace. The ID is encoded as a byte slice of length 28.
The namespace ID is a 28 byte identifier that uniquely identifies a namespace.
The ID is encoded as a byte slice of length 28.
<!-- It may be useful to indicate the endianness of the encoding -->
Collaborator

[question] is endianness applicable for a byte slice that doesn't inherently represent some value?

Contributor Author

> that doesn't inherently represent some value?

Can you provide further clarification on this specific part?
The interpretation of the namespace matters in various contexts, particularly for ordering in NMTs. Consequently, how the namespace is encoded and decoded plays a crucial role when transmitting data.

Collaborator

AFAIK https://en.wikipedia.org/wiki/Endianness is important to clarify for types like an int64 but I don't understand how it applies to this `[]byte`

Screenshot 2023-07-19 at 3 05 51 PM

Contributor Author

> You can use the encoding/binary packages in the Go std to read or write multi-byte values from byte slices either in big endian or little endian format.

Exactly: as soon as we want to read and use that byte slice, endianness becomes important. Imagine a namespace query whose namespace is encoded in little-endian order (in its Protobuf format) while the receiver expects namespaces in big-endian order. The query won't be resolved correctly, because the mismatched endianness means no namespace in the tree will match.
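To make the point in this thread concrete, here is a minimal sketch that writes the same integer into the user-specified part of a version `0` id in both byte orders. It assumes namespaces are compared lexicographically, byte by byte; the helper names (`namespaceFromUint64`, `bigEndianNamespace`, `littleEndianNamespace`, `less`) are hypothetical and exist only for this illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

const namespaceSize = 29 // 1-byte version + 28-byte id

// namespaceFromUint64 is a hypothetical helper: it returns a version 0
// namespace whose last 8 id bytes hold v in the given byte order.
func namespaceFromUint64(v uint64, order binary.ByteOrder) []byte {
	ns := make([]byte, namespaceSize)
	order.PutUint64(ns[namespaceSize-8:], v)
	return ns
}

func bigEndianNamespace(v uint64) []byte {
	return namespaceFromUint64(v, binary.BigEndian)
}

func littleEndianNamespace(v uint64) []byte {
	return namespaceFromUint64(v, binary.LittleEndian)
}

// less reports whether a sorts before b under byte-wise comparison.
func less(a, b []byte) bool {
	return bytes.Compare(a, b) < 0
}

func main() {
	// The same value yields different bytes, so a lookup would not match.
	fmt.Println(bytes.Equal(bigEndianNamespace(1), littleEndianNamespace(1))) // false

	// Big-endian bytes preserve numeric order under byte-wise comparison.
	fmt.Println(less(bigEndianNamespace(1), bigEndianNamespace(256))) // true

	// Little-endian bytes do not: 1 sorts after 256.
	fmt.Println(less(littleEndianNamespace(1), littleEndianNamespace(256))) // false
}
```

The byte slice itself has no endianness; the question only arises once a multi-byte value is written into it or read out of it, which is where sender and receiver need to agree.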


## Reserved Namespaces

Celestia reserves certain namespaces with specific meanings.
Celestia makes use of the reserved namespaces to properly organize and order transactions and blobs inside the [data square](./data_square_layout.md).
Applications MUST NOT use these reserved namespaces for their blob data.

Below is a list of reserved namespaces, along with a brief description of each.
In the table, you will notice that the `PARITY_SHARE_NAMESPACE` and `TAIL_PADDING_NAMESPACE` utilize the namespace version `255`, which differs from the supported user-specified versions.
The reason for employing version `255` for the `PARITY_SHARE_NAMESPACE` is to enable more efficient proof generation within the context of [nmt](https://github.com/celestiaorg/nmt), where it is used in conjunction with the `IgnoreMaxNamespace` feature.
Similarly, the `TAIL_PADDING_NAMESPACE` utilizes the namespace version `255` to ensure that padding shares are always properly ordered and placed at the end of the Celestia data square even if a new namespace version is introduced.
For additional information on the significance and application of the reserved namespaces, please refer to the [Data Square Layout](./data_square_layout.md) specifications.
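The ordering argument for version `255` can be sketched as follows. Because the version is the first byte of the encoding, a version `255` namespace sorts after any namespace with a lower version under byte-wise comparison, whatever the id contains. The helper names and the example values below are hypothetical; in particular, they are not the actual reserved namespace values from the table.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

const namespaceSize = 29 // 1-byte version + 28-byte id

// filledNamespace is a hypothetical helper that builds an encoded namespace
// with the given version and every id byte set to fill.
func filledNamespace(version, fill byte) []byte {
	ns := bytes.Repeat([]byte{fill}, namespaceSize)
	ns[0] = version
	return ns
}

// sortNamespaces orders encoded namespaces lexicographically, byte by byte.
func sortNamespaces(nss [][]byte) {
	sort.Slice(nss, func(i, j int) bool {
		return bytes.Compare(nss[i], nss[j]) < 0
	})
}

func main() {
	nss := [][]byte{
		filledNamespace(255, 0x00), // smallest possible version 255 encoding
		filledNamespace(1, 0xFF),   // largest encoding of a hypothetical future version 1
		filledNamespace(0, 0x00),   // all-zero version 0 encoding
	}
	sortNamespaces(nss)
	for _, ns := range nss {
		fmt.Printf("version %d\n", ns[0]) // prints versions 0, 1, 255 in that order
	}
}
```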

| name | type | value | description |
|-------------------------------------|-------------|----------------------------------------------------------------|------------------------------------------------------------------------------------------------------|
| `TRANSACTION_NAMESPACE` | `Namespace` | `0x0000000000000000000000000000000000000000000000000000000001` | Transactions: requests that modify the state. |
Expand All @@ -54,11 +79,20 @@ The namespace ID is a 28 byte identifier that uniquely identifies a namespace. T

## Assumptions and Considerations

Applications MUST refrain from using the [reserved namespaces](#reserved-namespaces) for their blob data.

## Implementation

See [pkg/namespace](../../../pkg/namespace).

## Protobuf Definition

<!-- TODO: Add protobuf definition for namespace -->
Comment on lines +88 to +90
Contributor

Does namespace actually have protobuf definitions?

Contributor Author

Not sure either; it is a placeholder in case we have, or want to have, a protobuf definition. Please see #2128.

Contributor

Personally I don't see the use. I think it's adequate to have it represented as an array of bytes.


## References

1. [ADR-014](../../../docs/architecture/adr-014-versioned-namespaces.md)
1. [ADR-015](../../../docs/architecture/adr-015-namespace-id-size.md)
1. [Namespaced Merkle Tree](https://github.com/celestiaorg/nmt)
1. [LazyLedger whitepaper](https://arxiv.org/pdf/1905.09274.pdf)
1. [Data Square Layout](./data_square_layout.md)
6 changes: 3 additions & 3 deletions tools/blocktime/readme.md
Expand Up @@ -8,13 +8,13 @@ To read up on starting a node and exposing the RPC endpoint go to the docs [here

To compile the binary, run either `go install` or `go build`. The binary can then be used as follows:

```
$ ./blocktime <node_rpc> [query_range]
```bash
./blocktime <node_rpc> [query_range]
```

As an example

```
```bash
$ ./blocktime http://localhost:26657 1000

Chain: mocha-3