Improve Manager snapshot preparer helper #9837

Open
wants to merge 4 commits into
base: master
162 changes: 108 additions & 54 deletions defaults/manager_restore_benchmark_snapshots.yaml
@@ -1,110 +1,164 @@
bucket: "manager-backup-tests-permanent-snapshots-us-east-1"
cs_read_cmd_template: "cassandra-stress read cl=ONE n={num_of_rows} -schema 'keyspace={keyspace_name} replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -mode cql3 native -rate threads=500 -col 'size=FIXED(1024) n=FIXED(1)' -pop seq={sequence_start}..{sequence_end}"
cs_read_cmd_template: "cassandra-stress read cl={cl} n={num_of_rows} -schema 'keyspace={keyspace_name} replication(strategy={replication},replication_factor={rf}) compaction(strategy={compaction})' -mode cql3 native -rate threads=500 -col 'size=FIXED({col_size}) n=FIXED({col_n})' -pop seq={sequence_start}..{sequence_end}"
Contributor


Why threads=500?
The number of threads should depend on the number of loader CPU cores.
Assuming a typical loader has 8 CPU cores, I expect 500 threads to be inefficient because of context switching.

Contributor Author


What approximations/rules can be used to define the optimal number of threads depending on loader CPU cores?

Contributor


In the general case (not specific to C-S), it should work the way we do it for Scylla: n_cpus - 1 for every 8 CPU cores.
A 1-to-1 relationship between CPU cores and threads avoids context switches, which are heavy operations.
A 1-to-many relationship between a CPU core and threads guarantees CPU context switches.
It is quite possible for an app to run faster with a single thread than with two threads.

So answering your question is a matter of implementation and measurement.

Perhaps @roydahan or @soyacz can advise here.

Contributor Author


@roydahan @soyacz do you have any advice?
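
The rule of thumb raised in this thread can be sketched in Python as follows. This is only an illustration, not code from the PR: `suggest_stress_threads` and `threads_per_core` are hypothetical names, and reading "n_cpus - 1 per each 8 CPU cores" as "reserve one core for every full group of 8 cores, and at least one" is an assumption. Because cassandra-stress threads spend much of their time waiting on the network, a multiplier above 1 may still pay off, so the final value has to be measured.

```python
import os
from typing import Optional


def suggest_stress_threads(cpu_cores: Optional[int] = None, threads_per_core: int = 1) -> int:
    """Return a starting point for the stress thread count on one loader.

    Assumption: one core is reserved per full group of 8 cores (minimum one),
    and the remaining cores each get `threads_per_core` threads.
    """
    if cpu_cores is None:
        # os.cpu_count() may return None when the count cannot be determined.
        cpu_cores = os.cpu_count() or 1
    if cpu_cores < 1 or threads_per_core < 1:
        raise ValueError("cpu_cores and threads_per_core must be positive")

    reserved = max(1, cpu_cores // 8)
    usable = max(1, cpu_cores - reserved)  # never drop below one worker thread
    return usable * threads_per_core


if __name__ == "__main__":
    for cores in (1, 8, 16):
        print(cores, "cores ->", suggest_stress_threads(cores), "threads")
```

With `threads_per_core=1` this gives the strict 1-to-1 mapping described above; raising the multiplier step by step and comparing throughput is one way to find where context switching starts to hurt.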

sizes: # size of backed up dataset in GB
1gb_1t_ics:
tag: "sm_20240812100424UTC"
schema:
keyspace1:
- standard1: 1
number_of_rows: 1073760
exp_timeout: 1200 # 20 minutes (timeout for restore data operation)
scylla_version: "2024.2.0-rc1"
number_of_nodes: 3
compaction_strategy: "IncrementalCompactionStrategy"
prohibit_verification_read: false
dataset:
schema:
keyspace1:
- standard1: 1
num_of_rows: 1073760
compaction: "IncrementalCompactionStrategy"
cl: "ONE"
col_size: 1024
col_n: 1
replication: "NetworkTopologyStrategy"
rf: 3
500gb_1t_ics:
tag: "sm_20240813112034UTC"
schema:
keyspace1:
- standard1: 500
number_of_rows: 524288000
exp_timeout: 14400 # 4 hours
scylla_version: "2024.2.0-rc1"
number_of_nodes: 3
compaction_strategy: "IncrementalCompactionStrategy"
prohibit_verification_read: false
dataset:
schema:
keyspace1:
- standard1: 500
num_of_rows: 524288000
compaction: "IncrementalCompactionStrategy"
cl: "ONE"
col_size: 1024
col_n: 1
replication: "NetworkTopologyStrategy"
rf: 3
500gb_1t_ics_tablets:
tag: "sm_20240813114617UTC"
schema:
keyspace1:
- standard1: 500
number_of_rows: 524288000
exp_timeout: 14400 # 4 hours
scylla_version: "2024.2.0-rc1"
number_of_nodes: 3
compaction_strategy: "IncrementalCompactionStrategy"
prohibit_verification_read: false
dataset:
schema:
keyspace1:
- standard1: 500
num_of_rows: 524288000
compaction: "IncrementalCompactionStrategy"
cl: "ONE"
col_size: 1024
col_n: 1
replication: "NetworkTopologyStrategy"
rf: 3
500gb_2t_ics:
tag: "sm_20240819203428UTC"
schema:
keyspace1:
- standard1: 250
keyspace2:
- standard1: 250
number_of_rows: 524288000
exp_timeout: 14400 # 4 hours
scylla_version: "2024.2.0-rc1"
number_of_nodes: 3
compaction_strategy: "IncrementalCompactionStrategy"
prohibit_verification_read: true
dataset:
schema:
keyspace1:
- standard1: 250
keyspace2:
- standard1: 250
num_of_rows: 524288000
compaction: "IncrementalCompactionStrategy"
cl: "ONE"
col_size: 1024
col_n: 1
replication: "NetworkTopologyStrategy"
rf: 3
1tb_1t_ics:
tag: "sm_20240814180009UTC"
schema:
keyspace1:
- standard1: 1024
number_of_rows: 1073741824
exp_timeout: 28800 # 8 hours
scylla_version: "2024.2.0-rc1"
number_of_nodes: 3
compaction_strategy: "IncrementalCompactionStrategy"
prohibit_verification_read: false
dataset:
schema:
keyspace1:
- standard1: 1024
num_of_rows: 1073741824
compaction: "IncrementalCompactionStrategy"
cl: "ONE"
col_size: 1024
col_n: 1
replication: "NetworkTopologyStrategy"
rf: 3
1tb_4t_twcs:
tag: "sm_20240821145503UTC"
schema:
keyspace1:
- t_10gb: 10
- t_90gb: 90
- t_300gb: 300
- t_600gb: 600
number_of_rows: 428571429
exp_timeout: 28800 # 8 hours
scylla_version: "2024.2.0-rc1"
number_of_nodes: 3
compaction_strategy: "TimeWindowCompactionStrategy"
prohibit_verification_read: true
dataset:
schema:
keyspace1:
- t_10gb: 10
- t_90gb: 90
- t_300gb: 300
- t_600gb: 600
num_of_rows: 428571429
compaction: "TimeWindowCompactionStrategy"
cl:
col_size:
col_n:
replication: "NetworkTopologyStrategy"
rf: 3
1tb_2t_twcs:
tag: "sm_20240827191125UTC"
schema:
keyspace1:
- t_300gb: 300
- t_700gb: 700
number_of_rows: 428571429
exp_timeout: 28800 # 8 hours
scylla_version: "2024.2.0-rc1"
number_of_nodes: 9
compaction_strategy: "TimeWindowCompactionStrategy"
prohibit_verification_read: true
dataset:
schema:
keyspace1:
- t_300gb: 300
- t_700gb: 700
num_of_rows: 428571429
compaction: "TimeWindowCompactionStrategy"
cl:
col_size:
col_n:
replication: "NetworkTopologyStrategy"
rf: 3
1.5tb_2t_ics:
tag: "sm_20240820180152UTC"
schema:
keyspace1:
- standard1: 500
keyspace2:
- standard1: 1024
number_of_rows: 1598029824
exp_timeout: 43200 # 12 hours
scylla_version: "2024.2.0-rc1"
number_of_nodes: 3
compaction_strategy: "IncrementalCompactionStrategy"
prohibit_verification_read: true
dataset:
schema:
keyspace1:
- standard1: 500
keyspace2:
- standard1: 1024
num_of_rows: 1598029824
compaction: "IncrementalCompactionStrategy"
cl: "ONE"
col_size: 1024
col_n: 1
replication: "NetworkTopologyStrategy"
rf: 3
2tb_1t_ics:
tag: "sm_20240816185129UTC"
schema:
keyspace1:
- standard1: 2048
number_of_rows: 2147483648
exp_timeout: 57600 # 16 hours
scylla_version: "2024.2.0-rc1"
number_of_nodes: 3
compaction_strategy: "IncrementalCompactionStrategy"
prohibit_verification_read: false
dataset:
schema:
keyspace1:
- standard1: 2048
num_of_rows: 2147483648
compaction: "IncrementalCompactionStrategy"
cl: "ONE"
col_size: 1024
col_n: 1
replication: "NetworkTopologyStrategy"
rf: 3
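
The new `cs_read_cmd_template` takes `cl`, `col_size` and `col_n` from each `dataset` block, and the two TWCS entries above leave those keys empty. A minimal sketch of how a consumer could detect unset placeholders before rendering the command is shown below. The helper names `template_fields` and `missing_fields` are hypothetical and not part of the PR, and the runtime values in the usage part are made up for illustration.

```python
from string import Formatter

# Template copied from the new cs_read_cmd_template line in this diff.
CS_READ_CMD_TEMPLATE = (
    "cassandra-stress read cl={cl} n={num_of_rows} -schema 'keyspace={keyspace_name} "
    "replication(strategy={replication},replication_factor={rf}) "
    "compaction(strategy={compaction})' -mode cql3 native -rate threads=500 "
    "-col 'size=FIXED({col_size}) n=FIXED({col_n})' "
    "-pop seq={sequence_start}..{sequence_end}"
)


def template_fields(template: str) -> set:
    """Collect the placeholder names used in a str.format template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def missing_fields(template: str, values: dict) -> list:
    """List placeholders whose value is absent, None, or an empty string."""
    return sorted(f for f in template_fields(template) if values.get(f) in (None, ""))


if __name__ == "__main__":
    # Mirrors the 1tb_2t_twcs dataset; an empty YAML value loads as None.
    values = {
        "cl": None, "col_size": None, "col_n": None,
        "num_of_rows": 428571429, "compaction": "TimeWindowCompactionStrategy",
        "replication": "NetworkTopologyStrategy", "rf": 3,
        # Hypothetical runtime values, for illustration only.
        "keyspace_name": "keyspace1", "sequence_start": 1, "sequence_end": 428571429,
    }
    print(missing_fields(CS_READ_CMD_TEMPLATE, values))
```

Without such a check, `str.format` silently renders the empty values as the literal text `None`, for example `cl=None`, which would produce an invalid command. The TWCS snapshots set `prohibit_verification_read: true`, so the template may never be rendered for them, but an explicit check makes that dependency visible.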
23 changes: 23 additions & 0 deletions defaults/manager_snapshots_preparer_config.yaml
@@ -0,0 +1,23 @@
# Description: This file contains the C-S command template and the default values of the parameters used in this command.
# Parameter values (all except cs_cmd_template) can be overridden; otherwise, the default values defined here are used.
# This configuration is used by the Manager backup snapshots prepare test.

cs_cmd_template: "cassandra-stress {operation} cl={cl} n={num_of_rows} -schema 'keyspace={ks_name} replication(strategy={replication},replication_factor={rf}) compaction(strategy={compaction})' -mode cql3 native -rate threads={threads_num} -col 'size=FIXED({col_size}) n=FIXED({col_n})' -pop seq={sequence_start}..{sequence_end}"

operation: "write"
cl: "QUORUM"

replication: "NetworkTopologyStrategy"
rf: 3
compaction: "IncrementalCompactionStrategy"

threads_num: 500

col_size: 1024
col_n: 1

# Defined at runtime based on backup size, number of loaders, Scylla version, etc.
ks_name: ''
num_of_rows: ''
sequence_start: ''
sequence_end: ''
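
The header comment of this file says every parameter except `cs_cmd_template` can be overridden, with the values here acting as defaults. A minimal sketch of that merge-and-render step might look like the following. The defaults are inlined as a dict so the example is self-contained; `build_cs_cmd` is a hypothetical helper, not the function the PR adds, and rejecting unknown keys and empty runtime values is an assumption about the desired behavior.

```python
# Template and defaults copied from manager_snapshots_preparer_config.yaml in this diff.
CS_CMD_TEMPLATE = (
    "cassandra-stress {operation} cl={cl} n={num_of_rows} -schema 'keyspace={ks_name} "
    "replication(strategy={replication},replication_factor={rf}) "
    "compaction(strategy={compaction})' -mode cql3 native -rate threads={threads_num} "
    "-col 'size=FIXED({col_size}) n=FIXED({col_n})' "
    "-pop seq={sequence_start}..{sequence_end}"
)

DEFAULTS = {
    "operation": "write", "cl": "QUORUM",
    "replication": "NetworkTopologyStrategy", "rf": 3,
    "compaction": "IncrementalCompactionStrategy",
    "threads_num": 500, "col_size": 1024, "col_n": 1,
    # Filled in at runtime.
    "ks_name": "", "num_of_rows": "", "sequence_start": "", "sequence_end": "",
}


def build_cs_cmd(overrides: dict) -> str:
    """Merge overrides into the defaults and render the C-S command."""
    if "cs_cmd_template" in overrides:
        raise ValueError("cs_cmd_template cannot be overridden")
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise KeyError(f"unknown parameters: {unknown}")

    params = {**DEFAULTS, **overrides}
    unset = sorted(key for key, value in params.items() if value == "")
    if unset:
        raise ValueError(f"runtime parameters not set: {unset}")
    return CS_CMD_TEMPLATE.format(**params)


if __name__ == "__main__":
    print(build_cs_cmd({
        "ks_name": "keyspace1", "num_of_rows": 1000,
        "sequence_start": 1, "sequence_end": 1000,
    }))
```

Failing fast on unset runtime parameters keeps an empty `ks_name` or `num_of_rows` from reaching the loader as a malformed command.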