# Benchmarking synthetic workloads

Benchmarking a synthetic workload starts a new network with empty state. Then state is created, and afterwards transactions involving that state are generated. For example, the native token transfer workload creates `n` accounts with NEAR balance and then generates transactions that transfer the native token between those accounts.

This approach has the following benefits:

- Relatively simple and quick setup, as no state from real-world networks is involved.
- Fine-grained control over traffic intensity.
- Enables comparing `neard` performance at different points in time or with different features.
- Might expose performance bottlenecks.

The main drawbacks of synthetic benchmarks are:

- The conclusions that can be drawn are limited, since real-world traffic is not as homogeneous as synthetic traffic.
- Calibrating traffic generation parameters can be cumbersome.

The tooling for synthetic benchmarks is available in [`benchmarks/synth-bm`](~/benchmarks/synth-bm).

## Common parameters

The following parameters are common to multiple tasks:

### `rpc-url`

The RPC endpoint to which transactions are sent.

Synthetic benchmarking may create thousands of transactions per second, which can run into network limitations if the RPC node is on a separate machine. In particular, sending transactions to nodes running on GCP requires care, as it can cause temporary IP address bans. For that scenario it is recommended to run a separate traffic generation VM in the same GCP zone as the RPC node and send transactions to the node's internal IP, as sketched below.
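
A minimal sketch of such an invocation; the internal IP below is a placeholder:

```command
# 10.128.0.5 stands in for the RPC node's internal IP in your GCP zone
cargo run --release -- benchmark-native-transfers --rpc-url http://10.128.0.5:3030
```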

### `interval-duration-micros`

Controls the rate at which transactions are sent. Assuming your hardware is able to send a request at every interval tick, the number of transactions sent per second equals `1_000_000 / interval-duration-micros`. The rate might be slowed down if `channel-buffer-size` becomes a bottleneck.
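
For illustration, two interval settings and the rates they target (values are examples, not recommendations):

```command
--interval-duration-micros 1000   # 1_000_000 / 1000 = 1_000 TPS
--interval-duration-micros 250    # 1_000_000 / 250  = 4_000 TPS
```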

### `channel-buffer-size`

Before an RPC request is sent, the tooling awaits capacity in a buffered channel, which limits the number of outstanding RPC requests to `channel-buffer-size`. This can slow down the rate at which transactions are sent in case the node is congested. To disable that behavior, set `channel-buffer-size` to a large value, e.g. the total number of transactions to be sent.
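
For example, for a run sending 100,000 transactions in total (a hypothetical figure), backpressure is effectively disabled with:

```command
--channel-buffer-size 100000
```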

## Workflows

The tooling's [`justfile`](~/benchmarks/synth-bm/justfile) contains recipes for the most relevant workflows.

### Create sub accounts

Creating the state for synthetic benchmarks usually starts with creating accounts. We create sub accounts of the account specified by `--signer-key-path`. This avoids dealing with the registrar, which would be required for creating top-level accounts. To view all options, run:

```command
cargo run --release -- create-sub-accounts --help
```
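
A hedged example invocation; apart from `--signer-key-path`, the flag names and values below are assumptions, so consult `--help` for the exact interface:

```command
cargo run --release -- create-sub-accounts \
    --rpc-url http://127.0.0.1:3030 \
    --signer-key-path .near/validator_key.json \
    --num-sub-accounts 100
```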

### Benchmark native token transfers

Generates a native token transfer workload involving the accounts provided in `--user-data-dir`. Transactions are generated by iterating through these accounts and sending native tokens to a randomly chosen receiver from the same set of accounts. To view all options, run:

```command
cargo run --release -- benchmark-native-transfers --help
```
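
A sketch of a full run tying the common parameters together (all values are hypothetical; check `--help` for the exact flag names):

```command
cargo run --release -- benchmark-native-transfers \
    --rpc-url http://127.0.0.1:3030 \
    --user-data-dir user-data/ \
    --interval-duration-micros 500 \
    --channel-buffer-size 30000
```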

Automatic calculation of transactions per second (TPS) when RPC requests are sent with `wait_until: NONE` is coming up shortly. In the meantime, TPS can be calculated manually by querying the `near_transaction_processed_successfully_total` metric, e.g. with:

```command
http localhost:3030/metrics | grep transaction_processed
```
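
A rough manual estimate, sketched under the assumption that the node exposes Prometheus metrics on `localhost:3030` and that the counter prints as a plain integer: sample the counter twice and divide the difference by the elapsed time.

```command
# sample the counter, wait, sample again
c0=$(curl -s localhost:3030/metrics | awk '/^near_transaction_processed_successfully_total/ {print $2}')
sleep 60
c1=$(curl -s localhost:3030/metrics | awk '/^near_transaction_processed_successfully_total/ {print $2}')
echo "TPS over the last 60s: $(( (c1 - c0) / 60 ))"
```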

## Network setup and `neard` configuration

Details of bringing up and configuring a network are out of scope for this document. Instead, we give a brief overview of the setup regularly used to benchmark the TPS of common workloads in a single-node, single-shard setup.

### Build `neard`

Choose the git commit and cargo features corresponding to what you want to benchmark. Most likely you will want a `--release` build to measure TPS.
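
A typical release build; `--features` flags would be added as the experiment requires:

```command
cargo build --release -p neard
```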

### Initialize the network

```command
./neard --home .near init --chain-id localnet
```

### Enable memtrie

The configuration generated by the above command does not enable memtrie. However, most benchmarks should run against a node with memtrie enabled, which can be achieved by setting the following in `.near/config.json`:

```
"load_mem_tries_for_tracked_shards": true
```
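
One way to set the flag without editing the file by hand, assuming `jq` is installed; verify the key's nesting against your version's `config.json` before relying on this:

```command
# write the updated config to a temp file first, since jq cannot edit in place
jq '.load_mem_tries_for_tracked_shards = true' .near/config.json > config.tmp \
    && mv config.tmp .near/config.json
```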

### Un-limit configuration

Following the steps so far creates a config that throttles throughput due to various factors related to state witness size, gas/compute limits, and congestion control. In case you want to benchmark a node that fully utilizes its hardware, you can make the following modifications to effectively run with an unlimited configuration:

```
# Modifications in .near/genesis.json
"chain_id": "benchmarknet"
"gas_limit": 20000000000000000  # increase default by x20

# Modifications in .near/config.json
"view_client_threads": 8  # increase default by x2
"load_mem_tries_for_tracked_shards": true
"produce_chunk_add_transactions_time_limit": {
  "secs": 0,
  "nanos": 800000000  # increase default by x4
}
```

Note that as `nearcore` evolves, these steps and the `BENCHMARKNET` adjustments might need to be updated to achieve the effect of unlimiting the configuration.
