Improve Manager snapshot preparer helper #9837
base: master
Conversation
The new approach makes the snapshot preparation process more flexible, allowing the whole scope of c-s write cmd parameters to be varied - the full list of them is provided in manager_snapshots_preparer_config.yaml. Default values of the config yaml parameters can be overridden by passing the SCT_MGMT_SNAPSHOTS_PREPARER_PARAMS parameter in the Jenkins job.
The snapshot generation process is improved so that the dataset params (defined in the c-s cmd) are no longer hardcoded, which makes it possible to widely vary the dataset's properties (replication, compaction, cl, rf, etc). The manager_restore_benchmark_snapshots.yaml file with the snapshots config for the current benchmark tests is extended to include such parameters. These params are later used at the C-S read command generation stage. Thus, the snapshot preparation test becomes widely customizable, with more flexibility to adjust the dataset.
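The override mechanism described above can be sketched as follows. This is an illustrative sketch, not code from the PR: the default values and the assumption that SCT_MGMT_SNAPSHOTS_PREPARER_PARAMS carries a JSON object are both hypothetical; only the env-var name and the idea of yaml defaults come from the description.

```python
import json
import os

# Default c-s write cmd parameters, as they might appear in
# manager_snapshots_preparer_config.yaml (values here are illustrative).
DEFAULT_PREPARER_PARAMS = {
    "cl": "QUORUM",
    "replication": "NetworkTopologyStrategy",
    "rf": 3,
    "compaction": "SizeTieredCompactionStrategy",
    "col_size": 1024,
    "col_n": 1,
}


def resolve_preparer_params(env=os.environ):
    """Merge yaml defaults with overrides passed via the Jenkins job.

    SCT_MGMT_SNAPSHOTS_PREPARER_PARAMS is assumed here to hold a JSON
    object; only the keys present in the override replace the defaults.
    """
    params = dict(DEFAULT_PREPARER_PARAMS)
    raw = env.get("SCT_MGMT_SNAPSHOTS_PREPARER_PARAMS")
    if raw:
        params.update(json.loads(raw))
    return params
```

Passing a partial override this way leaves every unmentioned parameter at its yaml default.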
Extends test_prepare_backup_snapshot so the test can run either against a standard cluster created via SCT or against clusters created via siren-tests (Cloud cluster). Changes: - backup location: for a cloud cluster it should be taken from the default backup task automatically scheduled by Manager. - copy the s3 bucket for a cloud cluster, since the original bucket is deleted together with the cluster during cluster removal.
To improve usability of the snapshots preparer tool, generated backup snapshot details will be sent to Argus Results, where they can be picked up. This eliminates the need to browse the logfile looking for details.
@scylladb/qa-maintainers
cl=cs_cmd_params.get("cl").lower(),
col_size=cs_cmd_params.get("col_size"),
col_n=cs_cmd_params.get("col_n"),
scylla_version=re.sub(r"[-.]", "_", self.params.get("scylla_version")),
This is the wrong way to get the Scylla version. When using an AMI, this config option will have a None value, or it may have the value latest. The existing get_version_based_on_conf method should be used instead:
https://github.com/scylladb/scylla-cluster-tests/blame/8f2162aa/sdcm/sct_config.py#L2708
I'm not sure I got it. get_version_based_on_conf is called as part of init_and_verify_sct_config, where it sets self.scylla_version, which is what I'm trying to retrieve in this code.
Or do you mean that get_version_based_on_conf should be called once again:
scylla_version = self.params.get_version_based_on_conf()[0]
because during the very first call in init_and_verify_sct_config it might be missing some information?
param.get('scylla_version') vs param.scylla_version: the first is read from what is set in the configuration; the second is the computed one you mentioned.
I agree it's a bit confusing. In a PR where I was trying to refactor sct_config.py to use pydantic I was changing the name (but that PR turned out more complicated than I assumed).
Calling param.get('scylla_version') reads the config option value as-is, not the calculated one.
scylla_version = self.params.get_version_based_on_conf()[0] is the calculated one; it considers multiple cases.
So, I recommend using this one ^.
@@ -1,110 +1,164 @@
bucket: "manager-backup-tests-permanent-snapshots-us-east-1"
cs_read_cmd_template: "cassandra-stress read cl=ONE n={num_of_rows} -schema 'keyspace={keyspace_name} replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -mode cql3 native -rate threads=500 -col 'size=FIXED(1024) n=FIXED(1)' -pop seq={sequence_start}..{sequence_end}"
cs_read_cmd_template: "cassandra-stress read cl={cl} n={num_of_rows} -schema 'keyspace={keyspace_name} replication(strategy={replication},replication_factor={rf}) compaction(strategy={compaction})' -mode cql3 native -rate threads=500 -col 'size=FIXED({col_size}) n=FIXED({col_n})' -pop seq={sequence_start}..{sequence_end}"
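Rendering the new parameterized template amounts to a str.format call over the dataset params from the yaml. A minimal sketch (the build_read_cmd helper name is hypothetical; the template string is the one from the diff above):

```python
# The parameterized read template from the yaml diff above.
CS_READ_CMD_TEMPLATE = (
    "cassandra-stress read cl={cl} n={num_of_rows} "
    "-schema 'keyspace={keyspace_name} "
    "replication(strategy={replication},replication_factor={rf}) "
    "compaction(strategy={compaction})' "
    "-mode cql3 native -rate threads=500 "
    "-col 'size=FIXED({col_size}) n=FIXED({col_n})' "
    "-pop seq={sequence_start}..{sequence_end}"
)


def build_read_cmd(**cs_cmd_params) -> str:
    """Render the C-S read command from dataset params stored in the
    snapshots yaml; kwarg names mirror the template placeholders."""
    return CS_READ_CMD_TEMPLATE.format(**cs_cmd_params)
```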
Why threads=500?
The number of threads should depend on the loader's CPU cores. Assuming a common loader has 8 CPU cores, I expect 500 threads to be inefficient due to context switches.
What approximations/rules can be used to determine the optimal number of threads depending on the loader's CPU cores?
In the common case (not specific to C-S) it should be like we do it for Scylla -> n_cpus - 1 threads (e.g. 7 for 8 CPU cores).
A 1-to-1 relationship between CPU cores and threads avoids context switches (which are heavy operations); a 1-to-many relationship guarantees CPU context switches.
It is quite possible that some app performs faster with a single thread than with 2 threads.
So, answering your question is a matter of implementation and measurement.
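The n_cpus - 1 rule discussed above can be sketched as a small helper; this is an illustrative heuristic only, not code from the PR, and the function name is hypothetical.

```python
import os
from typing import Optional


def default_stress_threads(n_cpus: Optional[int] = None) -> int:
    """Pick a stress thread count tied to the loader's CPU cores
    (n_cpus - 1, minimum 1) instead of a hardcoded threads=500,
    to limit context switching."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    return max(1, n_cpus - 1)
```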
LOCALRUNNER.run(f"aws s3 mb s3://{name} --region {region}")

@staticmethod
def create_s3_bucket_gce(name: str, region: str) -> None:
s3 is an Amazon-specific service name; the Google equivalent is called Google Storage. So, I think it is better to remove the terminology confusion here.
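One way to follow the naming suggestion is to keep "s3" only in the AWS helper and use storage-neutral naming on the GCE side. A hedged sketch: the helper names and the gsutil flags are assumptions, and only the command strings are built here (execution, e.g. via LOCALRUNNER.run, is out of scope).

```python
def make_gce_bucket_cmd(name: str, region: str) -> str:
    """Build a Google Cloud Storage bucket-creation command.

    Hypothetical rename per the review: no "s3" in the GCE helper name.
    The gsutil invocation is an assumption for illustration.
    """
    return f"gsutil mb -l {region} gs://{name}"


def make_aws_bucket_cmd(name: str, region: str) -> str:
    """AWS counterpart, matching the command in the diff above."""
    return f"aws s3 mb s3://{name} --region {region}"
```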
LOCALRUNNER.run(f"aws s3 sync s3://{source} s3://{destination} --acl {acl}")

@staticmethod
def sync_s3_buckets_gce(source: str, destination: str) -> None:
Same here.
@mikliapko new branch
refs:
scylladb/scylla-manager#4190
scylladb/scylla-manager#4208
The PR addresses two main topics:
The manager_restore_benchmark_snapshots.yaml file with the snapshots config for the current benchmark tests is extended to include such parameters. These params are later used at the C-S read command generation stage.
Thus, the snapshot preparation test becomes widely customizable, with more flexibility to adjust the dataset.
Testing
PR pre-checks (self review)
backport labels