docs: add block validity rules specs #1966

Merged: 24 commits into main from evan/block-validity-rules, Jul 6, 2023.
Changes shown are from the first 18 of the 24 commits.

Commits
134b2b4
docs: init block validaity rules
evan-forbes Jun 7, 2023
0555466
docs: flesh out some of the rules
evan-forbes Jun 11, 2023
f3a5470
Merge branch 'main' into evan/block-validity-rules
evan-forbes Jun 20, 2023
51411fa
docs: add general summary of block validity rules
evan-forbes Jun 20, 2023
07c2678
chore: revert linter change
evan-forbes Jun 20, 2023
8173c9c
chore: clean up
evan-forbes Jun 20, 2023
b449608
fix: typo
evan-forbes Jun 20, 2023
d0329fd
fix: add feedback
evan-forbes Jun 23, 2023
069bbd7
fix: clarify multiple blobs per blobtx
evan-forbes Jun 23, 2023
8a5a7d1
chore: add link to data square layout
evan-forbes Jun 23, 2023
f8660a7
Merge branch 'main' into evan/block-validity-rules
evan-forbes Jun 23, 2023
b6ebd26
Merge branch 'main' into evan/block-validity-rules
evan-forbes Jun 23, 2023
4cac7a4
fix: typo
evan-forbes Jun 24, 2023
c3e19fe
refactor: move fraud proofs to their own section
evan-forbes Jun 26, 2023
57380e7
chore: add fraud proofs section to readme and summary
evan-forbes Jun 26, 2023
f72be81
fix: typo
evan-forbes Jun 26, 2023
550e4f5
Merge branch 'main' into evan/block-validity-rules
evan-forbes Jun 26, 2023
b36dcc4
Merge branch 'main' into evan/block-validity-rules
evan-forbes Jun 26, 2023
89c03ac
docs: clarifications
evan-forbes Jun 29, 2023
88a3ac7
fix: better wording around light clients
evan-forbes Jul 5, 2023
27209c8
docs: remove duplicate specs and just link to things in the validity …
evan-forbes Jul 5, 2023
9dca098
docs: simplyfy further
evan-forbes Jul 5, 2023
71cac1e
docs: minor refactor
evan-forbes Jul 5, 2023
7ac6ec3
chore: typo
evan-forbes Jul 5, 2023
1 change: 1 addition & 0 deletions specs/src/README.md
@@ -6,6 +6,7 @@
- [Consensus](./specs/consensus.md)
- [Block Proposer](./specs/block_proposer.md)
- [Block Validity Rules](./specs/block_validity_rules.md)
- [Fraud Proofs](./specs/fraud_proofs.md)
- [Networking](./specs/networking.md)
- [Public-Key Cryptography](./specs/public_key_cryptography.md)
- [Data Square Layout](./specs/data_square_layout.md)
1 change: 1 addition & 0 deletions specs/src/SUMMARY.md
@@ -8,6 +8,7 @@
- [Consensus](./specs/consensus.md)
- [Block Proposer](./specs/block_proposer.md)
- [Block Validity Rules](./specs/block_validity_rules.md)
- [Fraud Proofs](./specs/fraud_proofs.md)
- [Networking](./specs/networking.md)
- [Public-Key Cryptography](./specs/public_key_cryptography.md)
- [Data Square Layout](./specs/data_square_layout.md)
76 changes: 76 additions & 0 deletions specs/src/specs/block_validity_rules.md
@@ -1 +1,77 @@
# Block Validity Rules

Unlike most blockchains, Celestia derives most of its functionality from
stateless commitments to data rather than stateful transitions. This means that
the protocol relies heavily on block validity rules. Notably, resource
constrained light clients must be able to detect when these validity rules have
not been followed in order to avoid making an honest majority assumption. This
has a significant impact on their design. More information on how light clients
verify block validity rules can be found in the [Fraud Proofs](./fraud_proofs.md) spec.
Contributor:

[Question] Do light nodes actually verify all the block validity rules, or just a subset? To me, it is more like the light client can only check the "invalidity" of blocks, but not their validity. And they even do so with the aid of full nodes.


> **Note** Celestia relies on CometBFT (formerly Tendermint) for consensus,
> meaning that it has single slot finality and is fork-free. Therefore, in order
> to ensure that an invalid block is never committed to, each validator must
> check that each block follows all validity rules before voting. If over two
> thirds of the voting power colludes to break a validity rule, then fraud
> proofs are created for light clients. After light clients verify fraud proofs,
Comment on lines +18 to +19

Collaborator:
[question][no change needed]

  1. Do the "fraud proofs" referenced exist yet? I think no.
  2. Is there a more granular term used to describe these fraud proofs? If no such term exists, perhaps: "block validity fraud proofs"?

Member Author:
BEFPs do, and blob inclusion will sooner or later. Not sure when we'll get state fraud proofs

this does bring up a good point, where I feel like all fraud proofs are actually block validity fraud proofs, and perhaps I should try to change the first paragraph to emphasize that

> they halt.

Before any Celestia-specific validation is performed, all CometBFT [block
validation
rules](https://github.com/cometbft/cometbft/blob/v0.34.28/spec/core/data_structures.md#block)
must be followed. The only deviation from these rules is how the data root
([DataHash](https://github.com/cometbft/cometbft/blob/v0.34.28/spec/core/data_structures.md#header))
is generated. Almost all of Celestia's functionality is derived from this
change, including how it proves data availability to light clients.
Contributor (@staheri14, Jun 26, 2023):
[Suggestion] I see there is a missing link between the block validity rules and the supplied subsections. My suggestion revolves around making this link a bit clearer (feel free to rephrase it as you see fit; I got inspiration from the subsections you already included in the specs). Below is my suggested addition to the text:

In Celestia, the block validity rules encompass two main aspects: the rules governing the validity of individual transactions and the rules dictating how to construct the data hash.

- Transaction validity rules: Similar to any normal block, Celestia blocks are composed of a set of transactions. Therefore, the transaction validity rules defined by CometBFT (I guess part of the rules should be in the app, not just CometBFT, so please revise it the way you think is correct) form an integral part of the block validity rules in Celestia. (A reference to the appropriate documentation for the specific transaction validity rules.)

- Blob transaction: The blob transaction is unique to Celestia, and its validity rules are covered in the corresponding [specifications](either link the specs or a subsection in this spec). (I am assuming this is something we need to explain separately, which is why I have put it as a separate item.)

- Data hash calculation steps: The following steps are specific to the calculation of the data hash:

  - Encoding/laying out transactions into a data square: Transactions within the block are serialized and organized into a square format. This process involves reorganizing the transactions, converting them into a series of bytes, splitting them into fixed-size shares, arranging them in a square structure, and erasure coding them. All these steps adhere to specific rules, which are also considered part of the block validity rules. (Links to the respective specifications, or subsections of the current doc, should be provided for further details.)

  - Constructing the data hash: The data hash is computed from the data square mentioned earlier. Detailed information on the construction of the data hash can be found in the provided specifications or a subsection in this doc. The correct construction of the data hash is also within the block validity rules.

Member Author:
> In Celestia, the block validity rules encompass two main aspects: the rules governing the validity of individual transactions and the rules dictating how to construct the data hash.

I like this a lot for the celestia specific rules and have updated accordingly. Note that I have simplified further by deleting a lot of the original descriptions and instead relying on the other portions of the spec that already cover it


## Data Availability
Contributor:
[Question] Can you please elaborate on the reason behind including this section as part of the "Block validity rule" specs? It didn't seem to touch on any validation rule? Thanks 🙏

Member Author:
hmm good point. I guess this was just a section for block validity rules that were directly related to data availability

this point is usually glossed over in most blockchains, as it is assumed that the data is downloaded entirely in order to verify it

this section describes the difference between consensus nodes and light clients to emphasize that point

> The data for each block must be considered available before a given block can be
> considered valid. For consensus nodes, this is done via an identical mechanism
> to a normal CometBFT node, which involves downloading the entire block by each
> node before considering that block valid.
>
> Light clients however do not download the entire block.

and then uses this preface for the reasoning behind the rest of the rules

do you think we should remove this header? While I'm less confident about the header, I do think the three paragraphs after the header are a useful preface

Member Author:
I will attempt to address this when I address the above comment

Contributor:
I understand your point, and it does make sense. I would recommend consolidating all the relevant background into a dedicated section, such as "Background" or "Introduction".


The data for each block must be considered available before a given block can be
considered valid. For consensus nodes, this is done via the same mechanism as a
normal CometBFT node: each node downloads the entire block before considering
that block valid.

Light clients, however, do not download the entire block; they only sample a
fraction of it. More details on how sampling actually works can be found
in the seminal ["Fraud and Data Availability Proofs: Maximising Light Client
Security and Scaling Blockchains with Dishonest
Majorities"](https://arxiv.org/abs/1809.09044) and in the
[`celestia-node`](https://github.com/celestiaorg/celestia-node) repo.
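To give a feel for the numbers involved, below is a minimal sketch of the detection bound from that paper. The square size `k` and the sample count are illustrative, and real light nodes sample without replacement, so this is only a lower bound:

```go
package main

import (
	"fmt"
	"math"
)

// Per arXiv:1809.09044, to make a k*k square (extended to 2k*2k) unrecoverable,
// an adversary must withhold at least (k+1)^2 of the (2k)^2 extended shares.
// detectionProbability lower-bounds the chance that at least one of `samples`
// uniformly random samples (with replacement) lands on withheld data.
func detectionProbability(k, samples int) float64 {
	total := float64(4 * k * k)             // shares in the extended square
	withheld := float64((k + 1) * (k + 1))  // minimum an adversary must hide
	missOne := 1 - withheld/total           // one sample misses the withheld set
	return 1 - math.Pow(missOne, float64(samples))
}

func main() {
	// e.g. a 64x64 original square and 16 samples: detection > 99%
	fmt.Printf("detection probability: %.4f\n", detectionProbability(64, 16))
}
```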

Per the [LazyLedger white paper](https://arxiv.org/pdf/1905.09274.pdf), Celestia
uses a 2D Reed-Solomon coding scheme
([rsmt2d](https://github.com/celestiaorg/rsmt2d)) to accommodate data
availability sampling. This involves "splitting" the CometBFT block data into
shares. Along with the 2D scheme, Celestia also makes use of [Namespaced Merkle
Trees (NMTs)](https://github.com/celestiaorg/nmt). These are combined to create
the commitment over block data instead of the typical Merkle tree used by
CometBFT.

<img src="./figures/data_root.svg" alt="Figure 1: Data Root" width="400"/> <img
src="./figures/rs2d_quadrants.svg" alt="Figure 2: rsmt2d" width="400"/>

### Square Construction

The construction of the square is critical in providing additional guarantees to
light clients. Since the data root is a commitment to the square, correctly
constructing that square is also vital to correctly computing the data root.

TODO: see [data square layout](./data_square_layout.md).

#### Share Encoding

Each chunk of block data is split into equally sized shares for sampling
purposes. The encoding was designed to allow light clients to decode these
shares to retrieve relevant data and to be future-proof yet backwards
compatible. The share encoding is deeply integrated into square construction,
and is therefore critical to calculating the data root.

See [shares spec](./shares.md)
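A rough sketch of the splitting step follows. The share and namespace sizes mirror celestia-app's v1 constants but should be treated as illustrative here; real shares additionally carry an info byte and a sequence length, as the shares spec describes authoritatively.

```go
package shares

// Sizes mirror celestia-app v1 but are illustrative in this sketch; real
// shares also carry an info byte and, in a blob's first share, the total
// sequence length.
const (
	shareSize     = 512 // bytes per share
	namespaceSize = 29  // version byte + 28-byte namespace ID
	payloadSize   = shareSize - namespaceSize
)

// SplitIntoShares prefixes each fixed-size chunk of data with its namespace
// so that namespaced Merkle tree proofs can later isolate the blob's shares.
func SplitIntoShares(namespace, data []byte) [][]byte {
	var shares [][]byte
	for len(data) > 0 {
		n := payloadSize
		if len(data) < n {
			n = len(data)
		}
		share := make([]byte, 0, shareSize)
		share = append(share, namespace...)
		share = append(share, data[:n]...)
		for len(share) < shareSize {
			share = append(share, 0) // zero-pad the final share
		}
		shares = append(shares, share)
		data = data[n:]
	}
	return shares
}
```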

## `BlobTx` Validity Rules

Each `BlobTx` consists of a transaction to pay for one or more blobs, and the
blobs themselves. Each `BlobTx` that is included in the block must be valid.
Those rules are described in the [`x/blob` module
specs](../../../x/blob/README.md#validity-rules).
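As a hedged sketch of the flavor of those rules (the types and helpers below are hypothetical stand-ins, not the real `x/blob` API; the linked module specs are authoritative):

```go
package blobtx

import (
	"bytes"
	"errors"
)

// Blob is a hypothetical stand-in for the real x/blob types; the
// authoritative rule set lives in x/blob/README.md#validity-rules.
type Blob struct {
	Namespace  []byte
	Data       []byte
	Commitment []byte // share commitment from the paired MsgPayForBlobs
}

// Hypothetical helpers, stubbed for the sketch.
func isReservedNamespace(ns []byte) bool   { return len(ns) > 0 && ns[0] == 0 }
func computeShareCommitment(b Blob) []byte { return nil }

// ValidateBlobTx sketches the flavor of the x/blob validity rules: every blob
// paid for must be non-empty, live in a non-reserved namespace, and match the
// share commitment signed over in the accompanying PayForBlobs transaction.
func ValidateBlobTx(blobs []Blob) error {
	if len(blobs) == 0 {
		return errors.New("BlobTx must contain at least one blob")
	}
	for _, b := range blobs {
		if len(b.Data) == 0 {
			return errors.New("blob data must not be empty")
		}
		if isReservedNamespace(b.Namespace) {
			return errors.New("blobs may not use reserved namespaces")
		}
		if !bytes.Equal(b.Commitment, computeShareCommitment(b)) {
			return errors.New("share commitment does not match blob")
		}
	}
	return nil
}
```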
25 changes: 25 additions & 0 deletions specs/src/specs/fraud_proofs.md
@@ -0,0 +1,25 @@
# Fraud Proofs

## Bad Encoding Fraud Proofs
Contributor:
[Question] I think the story of light clients and how they consider a block valid actually deserves separate specs, and I guess it is not even part of the core/app implementation, right? which means it can live in the node repo. I think here it's better to focus on the entities that actually exist in the core/app layer e.g., full nodes, validators, consensus nodes, etc. While the content of this part adds additional great info I think is not directly relevant. Due to this, and in favor of brevity and conciseness, I'd suggest leaving this subsection outside of the current specs. Though, I'll leave it up to you.

Member Author:
That is a good point that fraud proofs are not directly part of core/app atm, so including them in the spec might not make a lot of sense. I do think we should at least have a link to the relevant specs, since that will preserve our capability to send a newcomer a single link where they can find all the important portions.

The initial reason for including something here is that I feel that they have a large impact on the design and implementation. For example, the sole reason we have square layout/blob commitment rules is to accommodate blob inclusion proofs. If we eventually have at least a high level description of it, then we can see the end result, which I think makes the rest of the concepts a lot clearer.

Contributor:
> I do think we should at least have a link to the relevant specs, since that will preserve our capability to send a newcomer a single link where they can find all the important portions.

Good idea and agree!

> The initial reason for including something here is that I feel that they have a large impact on the design and implementation. For example, the sole reason we have square layout/blob commitment rules is to accommodate blob inclusion proofs. If we eventually have at least a high level description of it, then we can see the end result, which I think makes the rest of the concepts a lot clearer.

Agree that such information, at a high level, will provide readers with a clear understanding of the motivations behind the design choices and the purpose of the validity rules.
Following my previous comment, I recommend incorporating all the motivational information into a single section. Additionally, it would be beneficial to explicitly mention that the design of the Celestia block is influenced by these reasons, resulting in an extensive list of validity rules.

Contributor:
I do think that wherever the validity rules are described is where the fraud proofs, which define failures to comply with those validity rules, should also reside. Therefore I think fraud proofs should be described in this specification.


In order for data availability sampling to work, light clients must be convinced
that erasure-encoded parity data was encoded correctly. For light clients, this
is ultimately enforced via [bad encoding fraud proofs
(BEFPs)](https://github.com/celestiaorg/celestia-node/blob/v0.11.0-rc3/docs/adr/adr-006-fraud-service.md#detailed-design).
Consensus nodes must verify this themselves before considering a block valid.
This is done automatically when verifying the data root of the header, since
that requires reconstructing the square from the block data, performing the
erasure encoding, calculating the data root using that representation, and then
comparing it against the data root found in the header.
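A sketch of that consensus-node check is below; `buildSquare`, `extend`, and `computeDataRoot` are hypothetical stand-ins for the real celestia-app and rsmt2d functionality, not their actual APIs.

```go
package befp

import "bytes"

// Hypothetical helpers standing in for celestia-app square construction,
// rsmt2d erasure extension, and DataAvailabilityHeader root computation.
func buildSquare(blockData [][]byte) [][]byte  { return blockData } // lay out txs/blobs into shares
func extend(square [][]byte) [][]byte          { return square }    // 2D Reed-Solomon extension
func computeDataRoot(extended [][]byte) []byte { return nil }       // NMT row/col roots -> Merkle root

// A consensus node needs no fraud proof: it recomputes the data root from the
// full block data and rejects the block on any mismatch with the header.
func validBlockData(blockData [][]byte, headerDataHash []byte) bool {
	extended := extend(buildSquare(blockData))
	return bytes.Equal(computeDataRoot(extended), headerDataHash)
}
```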

## Blob Inclusion

TODO

## State

State fraud proofs allow light clients to avoid making an honest majority
assumption for state validity. While these are not incorporated into the
protocol as of v1.0.0, there are example implementations that can be found in
[Rollkit](https://github.com/rollkit/rollkit). More info in
[rollkit-ADR009](https://github.com/rollkit/rollkit/blob/4fd97ba8b8352771f2e66454099785d06fd0c31b/docs/lazy-adr/adr-009-state-fraud-proofs.md).