Showing 1 changed file with 104 additions and 0 deletions.
docs/practices/workflows/benchmarking_synthetic_workloads.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
# Benchmarking synthetic workloads | ||
|
||
Benchmarking a synthetic workload starts with a new network with empty state. State is then created, and afterwards transactions involving that state are generated. For example, the native token transfer workload creates `n` accounts with a NEAR balance and then generates transactions that transfer the native token between those accounts. | ||
|
||
This approach has the following benefits: | ||
|
||
- Relatively simple and quick setup, as there is no state from real-world networks involved. | ||
- Fine-grained control over traffic intensity. | ||
- Enables comparing `neard` performance at different points in time or with different features. | ||
- Might expose performance bottlenecks. | ||
|
||
The main drawbacks of synthetic benchmarks are: | ||
|
||
- The conclusions that can be drawn are limited, as real-world traffic is not homogeneous. | ||
- Calibrating traffic generation parameters can be cumbersome. | ||
|
||
The tooling for synthetic benchmarks is available in [`benchmarks/synth-bm`](~/benchmarks/synth-bm). | ||
|
||
## Common parameters | ||
|
||
The following parameters are common to multiple tasks: | ||
|
||
### `rpc-url` | ||
|
||
The RPC endpoint to which transactions are sent. | ||
|
||
Synthetic benchmarking may create thousands of transactions per second, which may hit network limitations if the RPC node is located on a separate machine. In particular, sending transactions to nodes running on GCP requires care, as it can cause temporary IP address bans. For that scenario, it is recommended to run a separate traffic generation VM located in the same GCP zone as the RPC node and to send transactions to the RPC node's `internal IP`. | ||
|
||
### `interval-duration-micros` | ||
|
||
Controls the rate at which transactions are sent. Assuming your hardware is able to send a request at every interval tick, the number of transactions sent per second equals `1_000_000 / interval-duration-micros`. The rate might be slowed down if `channel-buffer-size` becomes a bottleneck. | ||
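
The relation between the interval and the send rate can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated above; the function names are hypothetical and not part of the tooling.

```python
def transactions_per_second(interval_duration_micros: int) -> float:
    """Upper bound on the send rate for a given tick interval."""
    if interval_duration_micros <= 0:
        raise ValueError("interval_duration_micros must be positive")
    return 1_000_000 / interval_duration_micros


def interval_for_target_tps(target_tps: int) -> int:
    """Tick interval in whole microseconds (rounded down) for a target rate."""
    if target_tps <= 0:
        raise ValueError("target_tps must be positive")
    return 1_000_000 // target_tps


print(transactions_per_second(250))   # 4000.0
print(interval_for_target_tps(4000))  # 250
```

Because the interval is rounded down to whole microseconds, the resulting rate can be slightly above the target when the target does not divide 1,000,000 evenly.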
|
||
### `channel-buffer-size` | ||
|
||
Before an RPC request is sent, the tooling awaits capacity to send into a buffered channel. Thereby the number of outstanding RPC requests is limited by `channel-buffer-size`. This can slow down the rate at which transactions are sent in case the node is congested. To disable that behavior, set `channel-buffer-size` to a large value, e.g. the total number of transactions to be sent. | ||
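
The backpressure effect described above can be illustrated with a small simulation. This is a Python analogy built on `asyncio.Queue`, not the tooling's actual implementation; `peak_outstanding_requests` and the fake request are hypothetical stand-ins.

```python
import asyncio


async def peak_outstanding_requests(
    num_txs: int, channel_buffer_size: int, response_delay: float
) -> int:
    """Simulate sending and return the peak number of unanswered requests."""
    channel: asyncio.Queue = asyncio.Queue(maxsize=channel_buffer_size)
    outstanding = 0
    peak = 0

    async def fake_rpc_request() -> None:
        nonlocal outstanding
        await asyncio.sleep(response_delay)  # stand-in for the RPC round trip
        outstanding -= 1
        channel.get_nowait()  # response handled: free one slot in the buffer

    tasks = []
    for _ in range(num_txs):
        await channel.put(None)  # waits here while the buffer is full
        outstanding += 1
        peak = max(peak, outstanding)
        tasks.append(asyncio.create_task(fake_rpc_request()))
    await asyncio.gather(*tasks)
    return peak


print(asyncio.run(peak_outstanding_requests(20, 4, 0.001)))   # 4
print(asyncio.run(peak_outstanding_requests(20, 20, 0.001)))  # 20
```

With a buffer of 4 the sender stalls until a response frees a slot, whereas a buffer as large as the number of transactions never blocks the sender.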
|
||
## Workflows | ||
|
||
The tooling's [`justfile`](~/benchmarks/synth-bm/justfile) contains recipes for the most relevant workflows. | ||
|
||
### Create sub accounts | ||
|
||
Creating the state for synthetic benchmarks usually starts with creating accounts. We create sub accounts for the account specified by `--signer-key-path`. This avoids dealing with the registrar, which would be required for creating top level accounts. To view all options, run: | ||
|
||
```command | ||
cargo run --release -- create-sub-accounts --help | ||
``` | ||
|
||
### Benchmark native token transfers | ||
|
||
Generates a native token transfer workload involving the accounts provided in `--user-data-dir`. Transactions are generated by iterating through these accounts and sending native tokens to a randomly chosen receiver from the same set of accounts. To view all options, run: | ||
|
||
```command | ||
cargo run --release -- benchmark-native-transfers --help | ||
``` | ||
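
The generation pattern described above (iterate through senders, pick a random receiver from the same set) might look roughly like the following Python sketch. It is not the tooling's code: the function name and account names are hypothetical, and skipping self-transfers is an assumption that the description does not confirm.

```python
import random


def generate_transfers(accounts, num_transfers, amount, seed=None):
    """Return a list of (sender, receiver, amount) tuples."""
    if len(accounts) < 2:
        raise ValueError("need at least two accounts")
    rng = random.Random(seed)
    transfers = []
    for i in range(num_transfers):
        sender = accounts[i % len(accounts)]  # iterate through the accounts
        receiver = rng.choice(accounts)       # random receiver from the same set
        while receiver == sender:             # assumption: no self-transfers
            receiver = rng.choice(accounts)
        transfers.append((sender, receiver, amount))
    return transfers


# Hypothetical account names, for illustration only.
accounts = [f"user_{i}.node0" for i in range(4)]
for transfer in generate_transfers(accounts, 6, amount=1, seed=42):
    print(transfer)
```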
|
||
Automatic calculation of transactions per second (TPS) when RPC requests are sent with `wait_until: NONE` is not yet available but is coming up shortly. In the meantime, TPS can be calculated manually by querying the `near_transaction_processed_successfully_total` metric, e.g. with: | ||
|
||
```command | ||
http localhost:3030/metrics | grep transaction_processed | ||
``` | ||
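
One way to turn two readings of that counter into an average TPS is sketched below in Python. It assumes the metrics endpoint returns the Prometheus text format without trailing timestamps, and it sums the counter over all label sets; the helper names and the sample values are made up for illustration.

```python
METRIC = "near_transaction_processed_successfully_total"


def read_counter(metrics_text: str, metric_name: str = METRIC) -> float:
    """Sum all samples of a counter found in Prometheus text output."""
    total = 0.0
    found = False
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE lines
            continue
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0].strip()
        if name == metric_name:
            total += float(value)
            found = True
    if not found:
        raise KeyError(metric_name)
    return total


def average_tps(count_start: float, count_end: float, elapsed_seconds: float) -> float:
    """Average transactions per second between two counter readings."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (count_end - count_start) / elapsed_seconds


# Made-up samples taken 30 seconds apart.
before = "# TYPE near_transaction_processed_successfully_total counter\nnear_transaction_processed_successfully_total 12000\n"
after = "near_transaction_processed_successfully_total 72000\n"
print(average_tps(read_counter(before), read_counter(after), 30))  # 2000.0
```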
|
||
## Network setup and `neard` configuration | ||
|
||
Details of bringing up and configuring a network are out of scope for this document. Instead, we give a brief overview of the setup regularly used to benchmark TPS of common workloads in a single-node, single-shard setup. | ||
|
||
### Build `neard` | ||
|
||
Choose the git commit and cargo features corresponding to what you want to benchmark. Most likely you will want a `--release` build to measure TPS. | ||
|
||
### Initialize the network | ||
|
||
```command | ||
./neard --home .near init --chain-id localnet | ||
``` | ||
|
||
### Enable memtrie | ||
|
||
The configuration generated by the above command does not enable memtrie. However, most benchmarks should run against a node with memtrie enabled, which can be achieved by setting the following in `.near/config.json`: | ||
|
||
``` | ||
"load_mem_tries_for_tracked_shards": true | ||
``` | ||
|
||
### Un-limit configuration | ||
|
||
The steps so far create a configuration that throttles throughput due to various factors related to state witness size, gas/compute limits, and congestion control. If you want to benchmark a node that fully utilizes its hardware, you can make the following modifications to effectively run with an unlimited configuration: | ||
|
||
``` | ||
# Modifications in .near/genesis.json | ||
"chain_id": "benchmarknet" | ||
"gas_limit": 20000000000000000 # increase default by x20 | ||
# Modifications in .near/config.json | ||
"view_client_threads": 8 # increase default by x2 | ||
"load_mem_tries_for_tracked_shards": true | ||
"produce_chunk_add_transactions_time_limit": { | ||
"secs": 0, | ||
"nanos": 800000000 # increase default by x4 | ||
} | ||
``` | ||
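
Instead of editing the files by hand, the modifications listed above could be applied with a short script. The following Python sketch assumes that every listed key lives at the top level of its JSON file, which should be verified against the files generated by `neard init`; the helper names are hypothetical.

```python
import json
from pathlib import Path


def update_json(path: Path, changes: dict) -> None:
    """Overwrite top-level keys of a JSON file with the given values."""
    data = json.loads(path.read_text())
    data.update(changes)
    path.write_text(json.dumps(data, indent=2) + "\n")


def unlimit(home: Path) -> None:
    """Apply the modifications listed above (assumed to be top-level keys)."""
    update_json(home / "genesis.json", {
        "chain_id": "benchmarknet",
        "gas_limit": 20000000000000000,
    })
    update_json(home / "config.json", {
        "view_client_threads": 8,
        "load_mem_tries_for_tracked_shards": True,
        "produce_chunk_add_transactions_time_limit": {"secs": 0, "nanos": 800000000},
    })
```

Calling `unlimit(Path(".near"))` after initializing the network would rewrite both files in place, so keeping a copy of the originals makes it easy to compare against the default configuration.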
|
||
Note that as `nearcore` evolves, these steps and `BENCHMARKNET` adjustments might need to be updated to achieve the effect of unlimiting the configuration. |