Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validator: Add CLI args to control rocksdb threadpool sizes #4214

Merged
merged 6 commits into from
Dec 31, 2024

Conversation

steviez
Copy link

@steviez steviez commented Dec 27, 2024

Problem

See #105 for more context

Summary of Changes

  • Commit 1: Set the threadpool sizes directly
    • We previously used options.increase_parallelism() which would size the high priority (memtable flush) threadpool differently than what the comments would indicate in the code
  • Commit 2: Bubble threadpool size configuration up to validator CLI args

Threadpool sizes are unchanged from this PR, but will enable fine tuning in subsequent changes

Additional Context for setting threadpool size directly

The below section has some code snippets/links that explain why our current code configures num_cpus low priority threads and num_cpus / 4 high priority threads. Let N = num_cpus::get().

Snippets

We were previously calling increase_parallelism() with N which ends up here:

DBOptions* DBOptions::IncreaseParallelism(int total_threads) {
  max_background_jobs = total_threads;
  env->SetBackgroundThreads(total_threads, Env::LOW);
  env->SetBackgroundThreads(1, Env::HIGH);
  return this;
}

This immediately creates N low priority threads, and 1 high priority thread. But, note that max_background_jobs is also set to N. This is important because when the DB is opened, the options are checked for consistency and thread pools are grown if necessary. That happens here:

  auto bg_job_limits = DBImpl::GetBGJobLimits(
      result.max_background_flushes, result.max_background_compactions,
      result.max_background_jobs, true /* parallelize_compactions */);

with the implementation here:

DBImpl::BGJobLimits DBImpl::GetBGJobLimits(int max_background_flushes,
                                          int max_background_compactions,
                                          int max_background_jobs,
                                          bool parallelize_compactions) {
 BGJobLimits res;
 if (max_background_flushes == -1 && max_background_compactions == -1) {
   // for our first stab implementing max_background_jobs, simply allocate a
   // quarter of the threads to flushes.
   res.max_flushes = std::max(1, max_background_jobs / 4);
   res.max_compactions = std::max(1, max_background_jobs - res.max_flushes);
 } else {
   // compatibility code in case users haven't migrated to max_background_jobs,
   // which automatically computes flush/compaction limits
   res.max_flushes = std::max(1, max_background_flushes);
   res.max_compactions = std::max(1, max_background_compactions);
 }
 if (!parallelize_compactions) {
   // throttle background compactions until we deem necessary
   res.max_compactions = 1;
 }
 return res;
}

We hit the if case and then assuming N > 4 as it is for validators:

res.max_flushes = std::max(1, max_background_jobs / 4) 
                = std::max(1, N / 4) 
                = N / 4
res.max_compactions = std::max(1, max_background_jobs - res.max_flushes) 
                   = std::max(1, N - (N / 4))
                   = std::max(1, 3N / 4)
                   = 3N / 4

Then, we go grow these threadpools if necessary here:

  result.env->IncBackgroundThreadsIfNeeded(bg_job_limits.max_compactions,
                                           Env::Priority::LOW);
  result.env->IncBackgroundThreadsIfNeeded(bg_job_limits.max_flushes,
                                           Env::Priority::HIGH);

So, the value of 4 that we set here is coerced up on high core count nodes:

env.set_high_priority_background_threads(4);

Empirically, if I adjust the value in options.set_max_background_jobs() to 256, I see 192 (3N/4) low and 64 (N/4) high threads.

Testing out on a machine with 32 cores / 64 threads, I still see 64 low priority threads / 16 high priority threads with this PR. I also confirmed that setting --rocksdb-compaction-threads X and --rocksdb-flush-threads Y yields X low priority threads and Y high priority threads

We previously used options.increase_parallelism() which would size the
high priority (memtable flush) threadpool differently than what the
comments would indicate in the code.

So, set the threadpool sizes directly. Threadpool sizes are unchanged
from this PR, but will enable fine tuning in subsequent changes
@steviez steviez changed the title blockstore: Set rocksdb threadpool sizes directly validator: Add CLI args to control rocksdb threadpool sizes Dec 29, 2024
@alexpyattaev
Copy link

I would suggest we change defaults to use at 4-8 threads in total. The work here is IO bound, and in my tests I have not observed rocksdb ever using more than 3 threads. If someone needs more, they would be able to change via CLI.

Even under max load I believe 8 threads would be enough to saturate the ssd, unless rocksdb needs serious optimization.

Even if it somehow can use more threads, it could then interfere with other validator operations that are already saturating their respective CPU cores (PoH, snapshot packaging, gossipRx). So I suggest the default pool size should never be set above n_cpu/2 for any thread pool at all.

@steviez
Copy link
Author

steviez commented Dec 30, 2024

I would suggest we change defaults to use at 4-8 threads in total...If someone needs more, they would be able to change via CLI.

I agree that the current pool sizes are overkill. For this PR, the goal is to introduce the CLI args without any change in behavior. Basically, introduce the mechanical changes in one PR (this one) and then the actual changes in functionality (shrink the pools) in another. Breaking these up has a handful of benefits, including smaller diff's in general (easier to review) as well as isolating the functionality changes (which deserve more scrutiny) to a much smaller diff

alexpyattaev
alexpyattaev previously approved these changes Dec 30, 2024
Copy link

@alexpyattaev alexpyattaev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one commented line should probably be deleted instead.

Copy link

@bw-solana bw-solana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor nits. 1 question around the default low prio pool thread count

ledger-tool/src/ledger_utils.rs Outdated Show resolved Hide resolved
ledger/src/blockstore_db.rs Show resolved Hide resolved
ledger/src/blockstore_options.rs Outdated Show resolved Hide resolved
ledger/src/blockstore_db.rs Show resolved Hide resolved
Copy link

@bw-solana bw-solana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@steviez steviez merged commit 225f899 into anza-xyz:master Dec 31, 2024
40 checks passed
@steviez steviez deleted the bstore_tpool_sizes branch December 31, 2024 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

3 participants