
test(elasticity): Added a test of 90% utilization with a lot of small tables #9785

Open
wants to merge 2 commits into base: master

Conversation

@yarongilor (Contributor) commented Jan 12, 2025

This is a test case for 90% disk utilization with a lot of small tables.
The data is split equally among 500 tables.
The dataset size is aligned with 'i4i.xlarge'.
It uses a c-s user-profile template for all 500 tables.
It runs 4 batches of 125 tables each.
In each batch cycle, 125 tables are created, then a load is generated for all of these tables.
When all 125 stress writes/reads are done, it continues with the next batch, until stress against all 500 tables is complete (after 4 cycles).
Each of the 500 tables gets both a write and a read load.
Closes #9309
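
Below is a minimal sketch, in plain Python, of the batching flow described above; the table names and the run_stress placeholder are invented for illustration and are not SCT's real API:

# Illustrative sketch of the batching flow described in this PR (not SCT code).
TABLE_COUNT = 500
BATCH_SIZE = 125

def run_stress(table_name: str, mode: str) -> None:
    # Placeholder for launching one cassandra-stress command against a table.
    print(f"{mode} load on {table_name}")

def run_in_batches() -> None:
    tables = [f"table_{i:03d}" for i in range(TABLE_COUNT)]
    for cycle, start in enumerate(range(0, TABLE_COUNT, BATCH_SIZE), start=1):
        batch = tables[start:start + BATCH_SIZE]
        # Create the batch's 125 tables (schema creation elided), then start
        # one write and one read stress command per table.
        for table in batch:
            run_stress(table, "write")
            run_stress(table, "read")
        # Wait for all stress commands of this batch before the next cycle.
        print(f"cycle {cycle} done")

if __name__ == "__main__":
    run_in_batches()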

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add new configuration options and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@yarongilor (Contributor Author):

Keeping this PR as a draft since I'm not sure it's worth merging.

@pehala (Contributor) commented Jan 12, 2025

Could you elaborate more on why you need to make so many changes to decorators?

@yarongilor (Contributor Author):

Could you elaborate more on why you need to make so many changes to decorators?

It's an ad hoc variation of the latency decorator. It's not a must, and it can be removed.

@pehala (Contributor) commented Jan 13, 2025

It's an ad hoc variation of the latency decorator. It's not a must, and it can be removed.

If we need to make only small or no changes to the decorator, I think this is worth merging

@yarongilor (Contributor Author):

It's an ad hoc variation of the latency decorator. It's not a must, and it can be removed.

If we need to make only small or no changes to the decorator, I think this is worth merging

If we remove this new decorator, we still get all the needed latency results from Grafana.
Otherwise, we'd have to refactor the original latency decorator, which would mean a more extensive code change and more testing.

@pehala (Contributor) commented Jan 13, 2025

If we remove this new decorator, we still get all the needed latency results from Grafana.

I think for this test case it will be enough (even though I was pushing "let's use the decorator" before). It is a weird test case that we are not yet prepared for, and we cannot call it a performance test regardless, due to the methodology changes. I would add a configuration file, document the shortcomings of this test case in it, and then we can merge safely.

@yarongilor yarongilor force-pushed the elasticity_small_tables2 branch 2 times, most recently from ffb43a1 to 58d3702 Compare January 13, 2025 11:23
@pehala (Contributor) commented Jan 13, 2025

Any reason why it is still in draft?

@yarongilor yarongilor force-pushed the elasticity_small_tables2 branch from 58d3702 to 245e5d2 Compare January 13, 2025 11:36
@yarongilor yarongilor marked this pull request as ready for review January 13, 2025 11:36
@yarongilor (Contributor Author):

Any reason why it is still in draft?

Ready for review now.

@yarongilor yarongilor requested a review from pehala January 13, 2025 11:37
@yarongilor yarongilor added Ready for review Ready to be Merged area/elastic cloud Issues related to the elastic cloud project backport/none Backport is not required labels Jan 13, 2025
@yarongilor yarongilor force-pushed the elasticity_small_tables2 branch 2 times, most recently from 7fa68dc to 2f70af1 Compare January 13, 2025 12:19
@yarongilor yarongilor requested a review from pehala January 13, 2025 12:22
@yarongilor yarongilor force-pushed the elasticity_small_tables2 branch 2 times, most recently from e601f9a to 958a02c Compare January 13, 2025 15:12
@yarongilor (Contributor Author):

@pehala, all checks passed and the comments are addressed.

pehala previously approved these changes Jan 13, 2025
@pehala pehala requested a review from a team January 13, 2025 21:07
@soyacz (Contributor) left a comment:

The test in Argus is failing - the timeout is reached, and from the graphs I see it's at more than 90% disk utilization - is this a Scylla issue? If yes, please add a reference to it.
Can you clarify what is tested here - the possibility of creating 500 tables? Or that the tables have 1 tablet per table (since they are small)?

user_profile_table_count: 501
batch_size: 125

n_loaders: 5
Contributor:

Why do we need 5 big loaders against 3 smaller DB nodes?

Contributor:

I would guess it's because every cycle runs 125 stress commands at the same time; it would be nice to mention that here.

Also, I would recommend cross-checking whether round robin is needed here; otherwise we might be running 5 x 125 commands in each cycle...

Contributor:

Anyway, one loader could possibly handle all of this - 30k ops is not that much. If we need 5, then smaller ones would suffice.

Contributor Author:

  1. It runs 125 per cycle; round robin is not relevant here.
  2. There is still a problem that the cycles, for some reason, continue, so in the last test it got to more than 600 stress threads because a 5th cycle started.
  3. @soyacz, it's not a matter of ops but rather the number of parallel stress threads. When I tried, for example, 250 threads per cycle, the test crashed (probably loader issues like OOM or similar).

Contributor:

I still think that round robin is needed here.

You are creating 5 of each command, which is not what you really need.

Contributor Author:

@fruch, it is explicitly set to True here:

batch_params = dict(duration=duration, round_robin=True, stress_cmd=[])
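
For context, SCT's round_robin option is generally understood to assign each stress command to a single loader in turn, whereas without it every command is started on every loader. A simplified, standalone illustration of the difference (not SCT's actual implementation):

# Simplified illustration of how round_robin changes the number of launched
# cassandra-stress processes (not SCT's actual implementation).
def distribute(stress_cmds, loaders, round_robin):
    if round_robin:
        # Each command goes to exactly one loader, cycling through them.
        return [(cmd, loaders[i % len(loaders)]) for i, cmd in enumerate(stress_cmds)]
    # Without round robin, every command is started on every loader.
    return [(cmd, loader) for cmd in stress_cmds for loader in loaders]

cmds = [f"c-s cmd {i}" for i in range(125)]
loaders = [f"loader-{i}" for i in range(5)]
print(len(distribute(cmds, loaders, round_robin=True)))   # 125 processes
print(len(distribute(cmds, loaders, round_robin=False)))  # 625 processes (5 x 125)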

Contributor:

One C-S command, during my testing, was taking about 1.3 GB of RAM.
So you can calculate how many C-S commands fit on each loader node of a specific instance type.

And after that you can remove the redundant ones, if any.
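
A back-of-the-envelope sketch of that sizing, using the 1.3 GB figure from this comment; the 16 GB of RAM per loader matches the setup discussed below, and the 80% usable-memory headroom is an assumption:

# Rough loader sizing based on the per-process figure mentioned above.
CS_CMD_RAM_GB = 1.3        # observed RAM per cassandra-stress process (from this comment)
LOADER_RAM_GB = 16         # assumption: 16 GB of RAM per loader node, as in this setup
USABLE_FRACTION = 0.8      # assumption: leave headroom for the OS and other processes

cs_per_loader = int(LOADER_RAM_GB * USABLE_FRACTION // CS_CMD_RAM_GB)
loaders_for_batch = -(-125 // cs_per_loader)  # ceiling division
print(cs_per_loader, loaders_for_batch)
# ~9 commands per loader -> ~14 loaders for 125 concurrent commands by this estimate;
# the replies below note the test actually passed with 4-5 loaders, so the real
# per-process footprint for this workload appears to be lower.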

Contributor Author:

I'm not sure this calculation is right.
In this case there is 5 * 16 GB = 80 GB of total loader memory.
But according to the above memory requirement, 125 threads * 1.3 GB = 162.5 GB, so by that calculation the test should have failed with OOM.
@pehala, I think this consideration might be negligible since this test may only be run once a year or so, so it would spare a cost of roughly $5 a year.
(Grafana shows 3 loaders had 5% available memory.)
Nonetheless, I can try running with only 4 loaders and see how it goes.

Contributor Author:

Fixed, and it passed OK with 4 loaders.

data_dir/templated-elasticity-tables.yaml (outdated, resolved)
fields: samerow

# Run stress
# cassandra-stress user profile={} 'ops(insert=1)' cl=QUORUM n=2572262 -rate threads=1 -errors ignore
Contributor:

Can you document why -errors ignore is used?

Contributor Author:

removed.

@yarongilor yarongilor force-pushed the elasticity_small_tables2 branch from d49eac1 to 6fca49c Compare January 14, 2025 09:07
@yarongilor (Contributor Author):

The test in Argus is failing - the timeout is reached, and from the graphs I see it's at more than 90% disk utilization - is this a Scylla issue? If yes, please add a reference to it. Can you clarify what is tested here - the possibility of creating 500 tables? Or that the tables have 1 tablet per table (since they are small)?

This was a debug run; this branch wasn't originally meant to be merged. I started another test to verify that it runs as expected.
As for what's tested, please see the PR description.

@yarongilor (Contributor Author):

@pehala, the test basically accomplished the task of running a 500-table workload at 90% utilization.
Yet there are a few errors and issues, probably inherited from the original 5000-tables scale test.
Please advise whether it's good enough as is, or whether we would like to resolve all the issues and errors.

@pehala (Contributor) commented Jan 15, 2025

@pehala, the test basically accomplished the task of running a 500-table workload at 90% utilization.
Yet there are a few errors and issues, probably inherited from the original 5000-tables scale test.
Please advise whether it's good enough as is, or whether we would like to resolve all the issues and errors.

They for sure need to be resolved.

… tables

Test splitting the 90% utilization among a lot of small tables.
@yarongilor yarongilor force-pushed the elasticity_small_tables2 branch 2 times, most recently from 6025504 to a5c93b9 Compare January 21, 2025 16:31
@yarongilor (Contributor Author) commented Jan 21, 2025

@pehala, I added a new commit with some rewrites. It should allow supporting both scenarios - 5000 tables and 500 tables - with less confusion.
The test passed OK.

@yarongilor yarongilor force-pushed the elasticity_small_tables2 branch from a5c93b9 to 348ad37 Compare January 22, 2025 07:33
@yarongilor yarongilor requested a review from pehala January 22, 2025 17:03
@yarongilor yarongilor force-pushed the elasticity_small_tables2 branch from 348ad37 to 06e0b4e Compare January 22, 2025 17:08
@yarongilor yarongilor requested review from soyacz and fruch January 22, 2025 17:11
@@ -8,6 +8,4 @@ longevityPipeline(
region: 'eu-west-1',
test_name: 'longevity_test.LongevityTest.test_user_batch_custom_time',
test_config: '''["test-cases/scale/longevity-5000-tables.yaml", "configurations/raft/enable_raft_experimental.yaml"]''',

Contributor:

How is this change related to this PR?

Contributor:

Agree, looks unrelated.

Contributor Author:

Well, I thought it looked related to #9785 (comment), but never mind - removed.


@fruch fruch requested a review from vponomaryov January 22, 2025 17:45
@fruch (Contributor) commented Jan 22, 2025

@vponomaryov, you recently optimized the 5000-tables-in-one-keyspace case that uses the machinery touched here.

Can you please review it?

@vponomaryov (Contributor) left a comment:

@yarongilor

Your referenced test job has 8 batches with 62 stress commands.
But its config has batch_size: 125.

So, you have some bug in this scope.


@@ -0,0 +1,34 @@
# This is a test case for having 90% disk utilization with a lot of small tables.
# The data is split equally among 500 tables.
# The dataset size is aligned with 'i4i.xlarge'.
Contributor:

This statement is too ambiguous.
The number of nodes and the RF also matter for understanding the final data size.

@yarongilor (Contributor Author), Jan 23, 2025:

fixed:
# The dataset size is aligned with 'i4i.xlarge' and RF = 3 = number-of-DB-nodes.
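
As a rough, illustrative calculation of what that implies (the ~937 GB of local NVMe per i4i.xlarge node is an assumption based on the instance spec; compression, compaction overhead and system tables are ignored):

# Rough estimate of the dataset implied by 90% utilization on 3 x i4i.xlarge with RF=3.
NODES = 3
DISK_PER_NODE_GB = 937      # assumption: i4i.xlarge local NVMe capacity
TARGET_UTILIZATION = 0.9
RF = 3
TABLES = 500

total_on_disk_gb = NODES * DISK_PER_NODE_GB * TARGET_UTILIZATION
unique_data_gb = total_on_disk_gb / RF
print(f"~{unique_data_gb:.0f} GB unique data, ~{unique_data_gb / TABLES:.1f} GB per table")
# -> ~843 GB unique data, ~1.7 GB per table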


user_prefix: 'longevity-elasticity-many-small-tables'
root_disk_size_runner: 120


Contributor:

You may want to consider the following changes:

root_disk_size_monitor: 120
append_scylla_yaml:                                                                                    
  # NOTE: https://github.com/scylladb/scylla-monitoring/issues/2429                                    
  enable_node_aggregated_table_metrics: false 

Contributor Author:

OK, added.


user_profile_table_count = self.params.get('user_profile_table_count') # pylint: disable=invalid-name
# Creating extra table during batch only for a high number of tables (>500).
create_extra_tables = True if int(user_profile_table_count) > 500 else False
Contributor:

This approach looks strange.
Why exactly 500? What if later we have 400- and 800-table cases?

So, either make it configurable or describe the reasoning for it here.
Also, this particular condition can be simplified to the following:

create_extra_tables = int(user_profile_table_count) > 500

Contributor Author:

Well, I don't really know the idea behind adding an extra table, so I didn't want to impact the existing code flow.
@fruch, can you please advise why the extra table is needed? Or perhaps it's not needed anymore and can be removed in a follow-up fix? Would adding a parameter for it be reasonable, or overkill?

Contributor:

In the original 5000-tables case, we created all 5000 tables upfront.

We wanted a few to be created at runtime, via c-s, as well.

In each cycle we added 4 of them; I don't know how it became just 1.

Contributor:

And yes, if you want to take it out for some tests, it should be configurable.

No one is going to remember what this logic does here in two weeks.

@yarongilor (Contributor Author), Jan 23, 2025:

OK @fruch, fixed (the fix was shared as a screenshot).
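
Since the actual change is only visible in the screenshot, here is a hypothetical sketch of what a configurable variant could look like; the option name batch_create_extra_tables and the helper function are invented for illustration and may not match what the PR does:

# Hypothetical sketch only -- the real fix is in the screenshot above and may differ.
# 'batch_create_extra_tables' is an invented option name used for illustration.
def should_create_extra_tables(params: dict) -> bool:
    explicit = params.get('batch_create_extra_tables')
    if explicit is not None:
        # An explicit option wins over any table-count heuristic.
        return bool(explicit)
    # Fallback: keep the original heuristic used for the 5000-tables scenario.
    return int(params.get('user_profile_table_count')) > 500

print(should_create_extra_tables({'user_profile_table_count': 500}))   # False
print(should_create_extra_tables({'user_profile_table_count': 5000}))  # True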

except ValueError:
raise ValueError(f"Invalid cs_duration format: {cs_duration}")

for cs_profile in customer_profiles:
Contributor:

In your new case it is not a customer profile.
So it may be renamed to, for example, cs_profile.

Contributor Author:

changed to cs_user_profiles

@yarongilor (Contributor Author):

@yarongilor

Your referenced test job has 8 batches with 62 stress commands. But its config has batch_size: 125.

So, you have some bug in this scope.

@vponomaryov, I don't see any bug, according to the log:

< t:2025-01-22 08:00:45,556 f:longevity_test.py l:320  c:LongevityTest        p:DEBUG > Starting stress in batches of: 125 with 1000 stress commands
$ grep -i "(CassandraStressEvent Severity.NORMAL) period_type=begin"  -c sct-a22f9c23.log
1000

@yarongilor yarongilor force-pushed the elasticity_small_tables2 branch from 06e0b4e to 47da586 Compare January 23, 2025 10:12
@vponomaryov (Contributor):

@yarongilor
Your referenced test job has 8 batches with 62 stress commands. But its config has batch_size: 125.
So, you have some bug in this scope.

@vponomaryov, I don't see any bug, according to the log:

< t:2025-01-22 08:00:45,556 f:longevity_test.py l:320  c:LongevityTest        p:DEBUG > Starting stress in batches of: 125 with 1000 stress commands
$ grep -i "(CassandraStressEvent Severity.NORMAL) period_type=begin"  -c sct-a22f9c23.log
1000

Argus link from the PR description: https://argus.scylladb.com/tests/scylla-cluster-tests/a22f9c23-6289-497d-8202-db2b4ba85d2f
Jenkins for it: https://jenkins.scylladb.com/job/scylla-staging/job/yarongilor/job/elasticity-many-small-tables-test/25/consoleFull

Part of log:

2025-01-22 07:45:15,776 f:sct_config.py   l:2200 c:sdcm.sct_config      p:INFO  > batch_size: 125
...
2025-01-22 07:45:15,776 f:sct_config.py   l:2200 c:sdcm.sct_config      p:INFO  > user_profile_table_count: 500
...
2025-01-22 08:00:45,556 f:longevity_test.py l:347  c:LongevityTest        p:INFO  > Starting batch 1 out of 8
...
2025-01-22 09:00:26,768 f:longevity_test.py l:347  c:LongevityTest        p:INFO  > Starting batch 2 out of 8
...
2025-01-22 09:59:11,281 f:longevity_test.py l:347  c:LongevityTest        p:INFO  > Starting batch 3 out of 8
...
2025-01-22 11:00:19,811 f:longevity_test.py l:347  c:LongevityTest        p:INFO  > Starting batch 4 out of 8
...
2025-01-22 11:59:59,305 f:longevity_test.py l:347  c:LongevityTest        p:INFO  > Starting batch 5 out of 8
...
2025-01-22 12:59:51,577 f:longevity_test.py l:347  c:LongevityTest        p:INFO  > Starting batch 6 out of 8
...
2025-01-22 13:59:18,718 f:longevity_test.py l:347  c:LongevityTest        p:INFO  > Starting batch 7 out of 8
...
2025-01-22 14:59:29,130 f:longevity_test.py l:347  c:LongevityTest        p:INFO  > Starting batch 8 out of 8

The 500 / 125 must give us 4 sets, not 8.

@yarongilor yarongilor force-pushed the elasticity_small_tables2 branch from 47da586 to 7449ba7 Compare January 23, 2025 12:38
@yarongilor (Contributor Author):

@yarongilor
Your referenced test job has 8 batches with 62 stress commands. But its config has batch_size: 125.
So, you have some bug in this scope.

@vponomaryov, I don't see any bug, according to the log:

< t:2025-01-22 08:00:45,556 f:longevity_test.py l:320  c:LongevityTest        p:DEBUG > Starting stress in batches of: 125 with 1000 stress commands
$ grep -i "(CassandraStressEvent Severity.NORMAL) period_type=begin"  -c sct-a22f9c23.log
1000

Argus link from the PR description: https://argus.scylladb.com/tests/scylla-cluster-tests/a22f9c23-6289-497d-8202-db2b4ba85d2f Jenkins for it: https://jenkins.scylladb.com/job/scylla-staging/job/yarongilor/job/elasticity-many-small-tables-test/25/consoleFull

Part of log:

2025-01-22 07:45:15,776 f:sct_config.py   l:2200 c:sdcm.sct_config      p:INFO  > batch_size: 125
...
2025-01-22 07:45:15,776 f:sct_config.py   l:2200 c:sdcm.sct_config      p:INFO  > user_profile_table_count: 500
...
2025-01-22 08:00:45,556 f:longevity_test.py l:347  c:LongevityTest        p:INFO  > Starting batch 1 out of 8
...
2025-01-22 09:00:26,768 f:longevity_test.py l:347  c:LongevityTest        p:INFO  > Starting batch 2 out of 8
...
2025-01-22 09:59:11,281 f:longevity_test.py l:347  c:LongevityTest        p:INFO  > Starting batch 3 out of 8
...
2025-01-22 11:00:19,811 f:longevity_test.py l:347  c:LongevityTest        p:INFO  > Starting batch 4 out of 8
...
2025-01-22 11:59:59,305 f:longevity_test.py l:347  c:LongevityTest        p:INFO  > Starting batch 5 out of 8
...
2025-01-22 12:59:51,577 f:longevity_test.py l:347  c:LongevityTest        p:INFO  > Starting batch 6 out of 8
...
2025-01-22 13:59:18,718 f:longevity_test.py l:347  c:LongevityTest        p:INFO  > Starting batch 7 out of 8
...
2025-01-22 14:59:29,130 f:longevity_test.py l:347  c:LongevityTest        p:INFO  > Starting batch 8 out of 8

The 500 / 125 must give us 4 sets, not 8.

@vponomaryov, the loader memory doesn't care how many tables there are; it cares about the c-s threads' memory consumption. So there are 1000 stress commands, run in 8 cycles of 125 threads each - that's 2 c-s stress commands per table.
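
In other words, the batch size counts stress commands rather than tables, so with two commands per table the arithmetic gives 8 batches; a quick sanity check:

import math

tables = 500
commands_per_table = 2      # one write and one read stress command per table
batch_size = 125            # stress commands per batch, per the config

total_commands = tables * commands_per_table       # 1000, matching the grep count above
print(math.ceil(total_commands / batch_size))      # 8 batches, matching the log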

@yarongilor yarongilor force-pushed the elasticity_small_tables2 branch 2 times, most recently from f5a2b5e to d613b1a Compare January 23, 2025 15:22
@yarongilor yarongilor force-pushed the elasticity_small_tables2 branch from d613b1a to e377881 Compare January 23, 2025 17:35
…% utilization scenario

The test_user_batch_custom_time test is fixed and improved in order to run both the 5000-tables scenario
and a scenario like 500 tables for testing 90% utilization with many small tables.
@yarongilor yarongilor force-pushed the elasticity_small_tables2 branch from e377881 to 459b90b Compare January 26, 2025 08:03
@scylladbbot:

@yarongilor new branch branch-2025.1 was added, please add backport label if needed

Labels
area/elastic cloud Issues related to the elastic cloud project backport/none Backport is not required Ready for review Ready to be Merged
Development

Successfully merging this pull request may close these issues.

Create testcase for having 90% utilization with a lot of small tables
6 participants