Write and run test for testing consensus reactor for 134MB (512x512 ODS) blocks across 100 validators #225

Open
musalbas opened this issue Sep 4, 2023 · 2 comments


musalbas commented Sep 4, 2023

The purpose of this test is to (1) verify that CometBFT's consensus reactor can handle 134MB blocks across 100 validators, and (2) establish the minimum timeout_commit value that can be used for such block sizes. We do not need to run any celestia-node nodes for this test. We should also disable mempool tx gossiping and generate transactions locally, since the scope of this test is the consensus reactor only.

Setup

  • Create 100 validators using this branch that has a maximum ODS size of 512x512: https://github.com/musalbas/celestia-app/tree/musalbas/ods1024
  • Validators should have a realistic network latency setup
  • Set max_bytes in genesis.json to 1073741824 (1GiB)
  • Set broadcast = false in config.toml to disable mempool tx gossiping
  • Set RecvRate and SendRate in config.toml to 10000000 (10MB/s); we can adjust this later if it causes issues
  • Set timeout_commit in config.toml to 3s; we should experiment with this to find the lowest value we can get away with (see the config sketch after this list)
  • Each validator should have 16 vCPUs, to ensure that constructing the erasure code isn't a bottleneck
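
For reference, a minimal Go sketch of how these settings could be applied per validator, using upstream CometBFT's config and types packages (celestia-core may use different module paths, and homeDir is a placeholder for the validator's home directory):

```go
package testsetup

import (
	"time"

	cmtcfg "github.com/cometbft/cometbft/config"
	cmttypes "github.com/cometbft/cometbft/types"
)

// applyConsensusTestSettings writes the config.toml and genesis.json changes
// listed above for a single validator home directory.
func applyConsensusTestSettings(homeDir string) error {
	cfg := cmtcfg.DefaultConfig()
	cfg.SetRoot(homeDir)

	// Disable mempool tx gossiping; transactions are generated locally by txsim.
	cfg.Mempool.Broadcast = false

	// Cap p2p send/receive rates at ~10MB/s.
	cfg.P2P.SendRate = 10_000_000
	cfg.P2P.RecvRate = 10_000_000

	// Start with timeout_commit = 3s and lower it in later runs.
	cfg.Consensus.TimeoutCommit = 3 * time.Second

	cmtcfg.WriteConfigFile(homeDir+"/config/config.toml", cfg)

	// Raise the block size limit in genesis to 1GiB.
	genDoc, err := cmttypes.GenesisDocFromFile(homeDir + "/config/genesis.json")
	if err != nil {
		return err
	}
	if genDoc.ConsensusParams == nil {
		genDoc.ConsensusParams = cmttypes.DefaultConsensusParams()
	}
	genDoc.ConsensusParams.Block.MaxBytes = 1_073_741_824
	return genDoc.SaveAs(homeDir + "/config/genesis.json")
}
```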

Test

  • On each validator, run the txsim utility locally in a separate process with the following options: --blob 500 --blob-sizes 1000000-1000000 --blob-amounts 1-1 --feegrant true. This creates 500 routines on each validator, each sending PFBs with 1MB blobs. Validators will need an account with sufficient funds to do this.
  • Record (1) the block size of each block, to verify that blocks are getting filled up; (2) the number of signatures on each block, to verify that all validators are able to sign and commit each block; and (3) network bandwidth statistics. A sketch of one way to collect (1) and (2) over RPC follows below.
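
One way to collect (1) and (2) is to poll each validator's RPC endpoint after each height; a minimal sketch using CometBFT's HTTP RPC client (the endpoint is a placeholder, and (3) would come from node/pod network metrics instead):

```go
package metrics

import (
	"context"
	"fmt"

	rpchttp "github.com/cometbft/cometbft/rpc/client/http"
	cmttypes "github.com/cometbft/cometbft/types"
)

// recordBlock logs the encoded size of a block and how many validators signed it.
func recordBlock(ctx context.Context, rpcAddr string, height int64) error {
	c, err := rpchttp.New(rpcAddr, "/websocket")
	if err != nil {
		return err
	}

	block, err := c.Block(ctx, &height)
	if err != nil {
		return err
	}
	commit, err := c.Commit(ctx, &height)
	if err != nil {
		return err
	}

	// Count only the signatures that are actually present in the commit.
	signed := 0
	for _, sig := range commit.Commit.Signatures {
		if sig.BlockIDFlag != cmttypes.BlockIDFlagAbsent {
			signed++
		}
	}

	fmt.Printf("height=%d block_size_bytes=%d signatures=%d/%d\n",
		height, block.Block.Size(), signed, len(commit.Commit.Signatures))
	return nil
}
```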

Bidon15 commented Sep 5, 2023

To achieve this, we would need to keep two areas in mind:

Infrastructure for Setup

In order to provide 16 vCPUs per validator node, the node instances for the k8s cluster should be at least c5.4xlarge (16 vCPUs), or c5.9xlarge, which gives 36 vCPUs per instance. I'd rather start with c5.4xlarge (we have them by default right now) and initially request 14-15 vCPUs for the validator container that each pod will serve, leaving some headroom for the node's system processes. This ensures we scale to 100 AWS node instances at a 1:1 ratio of validators to instances.

According to recent test runs with a full validator set and QGB, the current infrastructure implementation should not be the bottleneck.

Testing Environment

Code base

  1. Based on the @celestiaorg/celestia-core team's experience so far, it's much better to either branch test-infra to remove the celestia-node part of the test code base, or to fork it into a canonical test-infra-consensus repo
  2. Most of the validator setup is already complete
  3. The config.toml configuration should be straightforward to adapt to the changes we need; the same goes for genesis.json if we need to modify anything

Network setup

Validators should have a realistic network latency setup

We already have the necessary prerequisites to start playing with 0/100/200/300+ ms latencies, and we can apply them per validator to make the network more 'realistic'. Unfortunately, the latency is not dynamically changeable during test execution.

Still, I would recommend kicking off with no bandwidth or latency limitations, and using the validators' monitoring to see the unrestricted per-pod/per-validator network figures in the Grafana dashboards.
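
For when we do apply latencies, a rough sketch of how a per-validator latency could be set via the Testground SDK's network client, assuming test-infra's Testground-based runner (the latency value and callback state name are illustrative):

```go
package netsetup

import (
	"context"
	"time"

	"github.com/testground/sdk-go/network"
	"github.com/testground/sdk-go/run"
)

// configureLatency applies a fixed latency to this instance's data network.
// The actual plan would pick a different value (0/100/200/300+ ms) per validator.
func configureLatency(ctx context.Context, initCtx *run.InitContext, latency time.Duration) error {
	return initCtx.NetClient.ConfigureNetwork(ctx, &network.Config{
		Network:       "default",
		Enable:        true,
		Default:       network.LinkShape{Latency: latency},
		CallbackState: "latency-configured",
	})
}
```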

Txsim

Currently, we already have a Docker image of txsim that we can pull into the test Dockerfile that Testground builds and runs.

This means that we can just add another CLI call in the Go test code and use the celestia address of each validator as the master account for txsim to produce big blob submissions:

--blob 500 --blob-sizes 1000000-1000000 --blob-amounts 1-1 --feegrant true
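
A rough sketch of what that extra call could look like from the Go test code, assuming the txsim binary is available in the test image; only the flags quoted above come from this issue, while the endpoint/keyring flags are illustrative placeholders for however txsim is actually pointed at the validator's funded account:

```go
package txsimrun

import (
	"context"
	"os"
	"os/exec"
)

// startTxsim launches txsim against a local validator and returns the running process.
func startTxsim(ctx context.Context, grpcEndpoint, keyringDir string) (*exec.Cmd, error) {
	cmd := exec.CommandContext(ctx, "txsim",
		"--blob=500",
		"--blob-sizes=1000000-1000000",
		"--blob-amounts=1-1",
		"--feegrant=true",
		// Illustrative only: wire in the validator's endpoint and master
		// account keyring using whatever flags the txsim build exposes.
		"--grpc-endpoint="+grpcEndpoint,
		"--key-path="+keyringDir,
	)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}
```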

@evan-forbes

ref celestiaorg/celestia-core#945 and celestiaorg/celestia-app#2033
