validator: Add CLI args to control rocksdb threadpool sizes #4214

steviez · 2024-12-27T07:43:02Z

Problem

See #105 for more context

Summary of Changes

Commit 1: Set the threadpool sizes directly
- We previously used options.increase_parallelism() which would size the high priority (memtable flush) threadpool differently than what the comments would indicate in the code
Commit 2: Bubble threadpool size configuration up to validator CLI args

Threadpool sizes are unchanged from this PR, but will enable fine tuning in subsequent changes

Additional Context for setting threadpool size directly

The below section has some code snippets/links that explain why our current code configures num_cpus low priority threads and num_cpus / 4 high priority threads. Let N = num_cpus::get().

Snippets

We were previously calling increase_parallelism() with N which ends up here:

DBOptions* DBOptions::IncreaseParallelism(int total_threads) {
  max_background_jobs = total_threads;
  env->SetBackgroundThreads(total_threads, Env::LOW);
  env->SetBackgroundThreads(1, Env::HIGH);
  return this;
}

This immediately creates N low priority threads, and 1 high priority thread. But, note that max_background_jobs is also set to N. This is important because when the DB is opened, the options are checked for consistency and thread pools are grown if necessary. That happens here:

  auto bg_job_limits = DBImpl::GetBGJobLimits(
      result.max_background_flushes, result.max_background_compactions,
      result.max_background_jobs, true /* parallelize_compactions */);

with the implementation here:

DBImpl::BGJobLimits DBImpl::GetBGJobLimits(int max_background_flushes,
                                          int max_background_compactions,
                                          int max_background_jobs,
                                          bool parallelize_compactions) {
 BGJobLimits res;
 if (max_background_flushes == -1 && max_background_compactions == -1) {
   // for our first stab implementing max_background_jobs, simply allocate a
   // quarter of the threads to flushes.
   res.max_flushes = std::max(1, max_background_jobs / 4);
   res.max_compactions = std::max(1, max_background_jobs - res.max_flushes);
 } else {
   // compatibility code in case users haven't migrated to max_background_jobs,
   // which automatically computes flush/compaction limits
   res.max_flushes = std::max(1, max_background_flushes);
   res.max_compactions = std::max(1, max_background_compactions);
 }
 if (!parallelize_compactions) {
   // throttle background compactions until we deem necessary
   res.max_compactions = 1;
 }
 return res;
}

We hit the if case and then assuming N > 4 as it is for validators:

res.max_flushes = std::max(1, max_background_jobs / 4) 
                = std::max(1, N / 4) 
                = N / 4
res.max_compactions = std::max(1, max_background_jobs - res.max_flushes) 
                   = std::max(1, N - (N / 4))
                   = std::max(1, 3N / 4)
                   = 3N / 4

Then, we go grow these threadpools if necessary here:

  result.env->IncBackgroundThreadsIfNeeded(bg_job_limits.max_compactions,
                                           Env::Priority::LOW);
  result.env->IncBackgroundThreadsIfNeeded(bg_job_limits.max_flushes,
                                           Env::Priority::HIGH);

So, the value of 4 that we set here is coerced up on high core count nodes:

agave/ledger/src/blockstore_db.rs

Line 2009 in c7131f6

env.set_high_priority_background_threads(4);

Empirically, if I adjust the value in options.set_max_background_jobs() to 256, I see 192 (3N/4) low and 64 (N/4) high threads.

Testing out on a machine with 32 cores / 64 threads, I still see 64 low priority threads / 16 high priority threads with this PR. I also confirmed that setting --rocksdb-compaction-threads X and --rocksdb-flush-threads Y yields X low priority threads and Y high priority threads

We previously used options.increase_parallelism() which would size the high priority (memtable flush) threadpool differently than what the comments would indicate in the code. So, set the threadpool sizes directly. Threadpool sizes are unchanged from this PR, but will enable fine tuning in subsequent changes

alexpyattaev · 2024-12-30T07:16:14Z

I would suggest we change defaults to use at 4-8 threads in total. The work here is IO bound, and in my tests I have not observed rocksdb ever using more than 3 threads. If someone needs more, they would be able to change via CLI.

Even under max load I believe 8 threads would be enough to saturate the ssd, unless rocksdb needs serious optimization.

Even if it somehow can use more threads, it could then interfere with other validator operations that are already saturating their respective CPU cores (PoH, snapshot packaging, gossipRx). So I suggest the default pool size should never be set above n_cpu/2 for any thread pool at all.

steviez · 2024-12-30T08:17:36Z

I would suggest we change defaults to use at 4-8 threads in total...If someone needs more, they would be able to change via CLI.

I agree that the current pool sizes are overkill. For this PR, the goal is to introduce the CLI args without any change in behavior. Basically, introduce the mechanical changes in one PR (this one) and then the actual changes in functionality (shrink the pools) in another. Breaking these up has a handful of benefits, including smaller diff's in general (easier to review) as well as isolating the functionality changes (which deserve more scrutiny) to a much smaller diff

ledger/src/blockstore_db.rs

alexpyattaev

LGTM, one commented line should probably be deleted instead.

bw-solana

Few minor nits. 1 question around the default low prio pool thread count

ledger-tool/src/ledger_utils.rs

ledger/src/blockstore_db.rs

ledger/src/blockstore_options.rs

ledger/src/blockstore_db.rs

bw-solana

LGTM

steviez changed the title ~~blockstore: Set rocksdb threadpool sizes directly~~ validator: Add CLI args to control rocksdb threadpool sizes Dec 29, 2024

steviez force-pushed the bstore_tpool_sizes branch from 36f0b91 to 6598e6c Compare December 29, 2024 04:04

validator: Add CLI args to control rocksdb threadpool sizes

62a73f8

steviez force-pushed the bstore_tpool_sizes branch from 6598e6c to 62a73f8 Compare December 29, 2024 04:42

steviez mentioned this pull request Dec 30, 2024

Eliminate some obviously useless threading #4219

Open

steviez requested review from cpubot, alexpyattaev and bw-solana December 30, 2024 02:01

alexpyattaev reviewed Dec 30, 2024

View reviewed changes

ledger/src/blockstore_db.rs Outdated Show resolved Hide resolved

alexpyattaev previously approved these changes Dec 30, 2024

View reviewed changes

review feedback: remove line that I commented out

aed4738

steviez dismissed alexpyattaev’s stale review via aed4738 December 30, 2024 17:31

bw-solana reviewed Dec 30, 2024

View reviewed changes

ledger-tool/src/ledger_utils.rs Outdated Show resolved Hide resolved

ledger/src/blockstore_db.rs Show resolved Hide resolved

ledger/src/blockstore_options.rs Outdated Show resolved Hide resolved

ledger/src/blockstore_db.rs Show resolved Hide resolved

steviez added 3 commits December 30, 2024 22:42

review feedback: Use ..BlockstoreOptions::default() more

cbbbdee

review feedback: Adjust comment around set_max_background_jobs()

09bf071

clippy

969166f

bw-solana approved these changes Dec 31, 2024

View reviewed changes

steviez merged commit 225f899 into anza-xyz:master Dec 31, 2024
40 checks passed

steviez deleted the bstore_tpool_sizes branch December 31, 2024 18:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

validator: Add CLI args to control rocksdb threadpool sizes #4214

validator: Add CLI args to control rocksdb threadpool sizes #4214

steviez commented Dec 27, 2024 •

edited

Loading

alexpyattaev commented Dec 30, 2024

steviez commented Dec 30, 2024

alexpyattaev left a comment •

edited

Loading

bw-solana left a comment

bw-solana left a comment

validator: Add CLI args to control rocksdb threadpool sizes #4214

validator: Add CLI args to control rocksdb threadpool sizes #4214

Conversation

steviez commented Dec 27, 2024 • edited Loading

Problem

Summary of Changes

Additional Context for setting threadpool size directly

alexpyattaev commented Dec 30, 2024

steviez commented Dec 30, 2024

alexpyattaev left a comment • edited Loading

Choose a reason for hiding this comment

bw-solana left a comment

Choose a reason for hiding this comment

bw-solana left a comment

Choose a reason for hiding this comment

steviez commented Dec 27, 2024 •

edited

Loading

alexpyattaev left a comment •

edited

Loading