
self-test disk test enhancements #20590

Merged: 3 commits into redpanda-data:dev on Jul 19, 2024

Conversation

@travisdowns (Member) commented on Jun 27, 2024:

rpk: add additional disk self tests

Add 16K block size disk tests, a common block size written by Redpanda,
at varying IO depths: 1, 8 and 32 times the shard count (the
multiplication by the shard count happens in Redpanda and is
inevitable).

This will help better assess the performance of block storage that is
a bit outside the usual, in particular how it responds to io depth
changes.

Additionally, add a 4K test which is the same as the existing one but
with dsync off. This is critical for assessing the impact of fdatasync
on the storage layer: locally, on my consumer SSD this makes a
257x difference (!!) in throughput, though the effect is much more muted,
perhaps close to zero, on other SSD types.

On the redpanda side, when we complete a self-test the API returns
info about the run, including an info field which currently says "write run"
(for a disk test). Enhance this to include information about whether
dsync was enabled and the total io depth (which is the client-specified
parallelism value times the number of shards).
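
For illustration only, here is a rough sketch of what one of the added 16K entries might look like in rpk. Only the Name and Type fields appear verbatim in the review diff below; BlockSize, Parallelism, DSync, and SkipRead are assumed field names inferred from the discussion, not the actual diff:

    adminapi.DiskcheckParameters{
        Name: "16KB sequential r/w, high io depth",
        // The fields below are assumptions for illustration, not the actual diff.
        BlockSize:   16 << 10, // 16 KiB, a common block size written by Redpanda
        Parallelism: 64,       // requested io depth
        DSync:       true,     // issue fdatasync after each write
        SkipRead:    false,    // most of the new write-focused entries skip the read pass
        Type:        adminapi.DiskcheckTagIdentifier,
    },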

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

Improvements

  • Add more cases to the rpk disk self-test to better probe write performance at various IO depths, and at 16K block sizes. Return more information about the specifics of the test in the output.

@StephanDollberg (Member) left a comment:

This will slow down the test quite a bit but I guess that's not really a problem

        Type: adminapi.DiskcheckTagIdentifier,
    },
    adminapi.DiskcheckParameters{
        Name: "16KB sequential r/w, high io depth",
A reviewer (Member) commented:

Did you intentionally not add something like 4k @ 256 iodepth?

@travisdowns (Member, Author) replied on Jun 28, 2024:

Are you asking more about "why not 4K" or "why not 256 iodepth"?

In any case it was intentional but open to ideas here. One thing to note is that the parallelism factor here is then multiplied by the shard count, so on modest 8-shard nodes we are already at a very high 512 io depth for parallelism=64, which IME is larger than what you need to get max throughput even on large local SSD configurations (though of course this may not be the case on some other storage configurations, especially high-throughput, longer-latency network-attached storage).

I don't actually like this multiplication because it (a) adds a confounding factor when comparing results against different clusters which may have different shard counts (but at least now we see the effective iodepth in the output) and (b) it means you can't run an iodepth=1 test except on a cluster with 1-shard nodes.

About 4K vs 16K, my goal was to add a 16K test to see the difference between 4K and 16K, i.e., how much performance varies in the range of block sizes Redpanda is already writing with default settings. Then I also wanted to add a "series" of varying-iodepth tests, which I sort of arbitrarily chose to be the 16K one. I didn't want to do both, to keep the number of tests down, and I think maybe I favored 16K over 4K in part because 4K already had parallelism=2, and I wanted 1 and didn't want to change the existing 4K test, to keep some continuity with old results.

That said, very open to changing it. What is your view on the ideal series of tests to run?

A reviewer (Member) replied:

> One thing to note is that the parallelism factor here is then multiplied by the shard count

Wait but right now this all happens on shard zero only. Are you saying we still multiply it by the shard count?

> That said, very open to changing it. What is your view on the ideal series of tests to run?

I don't feel strongly. I'm just really coming from the classic 4K test, and I guess it matches the minimum amount we write.

I guess the 512KiB test is actually the least relevant one for RP as we never write sizes bigger than 16KiB (only when fetching from TS).

@travisdowns (Member, Author) replied on Jun 29, 2024:

> Wait but right now this all happens on shard zero only. Are you saying we still multiply it by the shard count?

No, I was simply mistaken. I thought this ran on all shards, but as you say it seems to run on only one shard. I was thrown off especially by this comment and also this code and comment. Perhaps vestigial?

So I will adjust the numbers to hit higher io depths, and maybe add 1 more test.

> Just really coming from the classic 4k test and I guess it matches the min amount we write.

I'll change it to 4K.

> I guess the 512KiB test is actually the least relevant one for RP as we never write sizes bigger than 16KiB (only when fetching from TS).

It's definitely the least useful for evaluating RP performance at the default settings. As a test to understand more about the disk, especially disks with characteristics different from the most common ones we run on, I think it's fine because it is a "max throughput" test, and if it gets a much higher number than the other tests with small blocks then we've learned something.

r-vasquez previously approved these changes Jun 27, 2024
@travisdowns (Member, Author) commented on Jun 28, 2024:

> This will slow down the test quite a bit but I guess that's not really a problem

If we want more data points and we want to keep the same duration per test, I don't really see an alternative to that. However, we could always reduce the default per-test duration if the overall current duration (2 minutes, at the default per-test duration) is a "sweet spot" or something like that.

Note that most of these newly added tests have skipRead=true, so they take half the time of the existing tests, and the time expansion is actually half of what you'd guess by looking at it. The increase is 4 tests -> 8 tests, so 2 minutes to 4 minutes at the default duration.

travisdowns reopened this on Jun 28, 2024
@travisdowns (Member, Author):
Stupid "close with comment" button sitting there looking so pressable.

@travisdowns (Member, Author):
Updated in push: 0c1753b

  • Removed the io_depth() method and stopped assuming the parallelism was multiplied by the shard count.
  • Changed the "io depth" series from 16K to 4K blocks; except for the iodepth=1 test, only the write test is done. Kept one 16K r/w test at 64 io depth. The no-dsync test is at 4K, 64 io depth.
  • Removed ", dsync" from the description of the 512K r/w test since it doesn't make sense for the "read" part.
  • Fixed test names that said r/w when the tests were actually write-only.
  • Aligned --help output with these changes.

Example output after this change:

NODE ID: 0 | STATUS: IDLE
=========================
NAME        512KB sequential r/w
INFO        write run (iodepth: 4, dsync: true)
TYPE        disk
TEST ID     931e192d-2133-4304-b093-3586d18b0c56
TIMEOUTS    0
DURATION    1009ms
IOPS        425 req/sec
THROUGHPUT  212.5MiB/sec
LATENCY     P50     P90      P99      P999     MAX
            9215us  11775us  14847us  21503us  21503us

NAME        512KB sequential r/w
INFO        read run
TYPE        disk
TEST ID     931e192d-2133-4304-b093-3586d18b0c56
TIMEOUTS    0
DURATION    1000ms
IOPS        10147 req/sec
THROUGHPUT  4.955GiB/sec
LATENCY     P50    P90    P99    P999    MAX
            247us  639us  799us  1087us  1215us

NAME        4KB sequential r/w, low io depth
INFO        write run (iodepth: 1, dsync: true)
TYPE        disk
TEST ID     931e192d-2133-4304-b093-3586d18b0c56
TIMEOUTS    0
DURATION    1002ms
IOPS        414 req/sec
THROUGHPUT  1.617MiB/sec
LATENCY     P50     P90     P99     P999    MAX
            2431us  2559us  2687us  5887us  5887us

NAME        4KB sequential r/w, low io depth
INFO        read run
TYPE        disk
TEST ID     931e192d-2133-4304-b093-3586d18b0c56
TIMEOUTS    0
DURATION    1000ms
IOPS        621714 req/sec
THROUGHPUT  2.372GiB/sec
LATENCY     P50   P90   P99   P999  MAX
            1us   1us   2us   23us  543us

NAME        4KB sequential write, medium io depth
INFO        write run (iodepth: 8, dsync: true)
TYPE        disk
TEST ID     931e192d-2133-4304-b093-3586d18b0c56
TIMEOUTS    0
DURATION    1014ms
IOPS        523 req/sec
THROUGHPUT  2.043MiB/sec
LATENCY     P50      P90      P99      P999     MAX
            15871us  16383us  20479us  20479us  21503us

NAME        4KB sequential write, high io depth
INFO        write run (iodepth: 64, dsync: true)
TYPE        disk
TEST ID     931e192d-2133-4304-b093-3586d18b0c56
TIMEOUTS    0
DURATION    1115ms
IOPS        607 req/sec
THROUGHPUT  2.371MiB/sec
LATENCY     P50       P90       P99       P999      MAX
            118783us  126975us  139263us  139263us  180223us

NAME      4KB sequential write, very high io depth
TYPE      disk
TEST ID   931e192d-2133-4304-b093-3586d18b0c56
TIMEOUTS  0
DURATION  0ms
ERROR     IO Queue depth (parallelism) out of range, min is 1, max 256

NAME        4KB sequential write, no dsync
INFO        write run (iodepth: 64, dsync: false)
TYPE        disk
TEST ID     931e192d-2133-4304-b093-3586d18b0c56
TIMEOUTS    0
DURATION    1000ms
IOPS        366771 req/sec
THROUGHPUT  1.399GiB/sec
LATENCY     P50    P90    P99    P999   MAX
            167us  231us  303us  735us  1151us

NAME        16KB sequential r/w, high io depth
INFO        write run (iodepth: 64, dsync: false)
TYPE        disk
TEST ID     931e192d-2133-4304-b093-3586d18b0c56
TIMEOUTS    0
DURATION    1000ms
IOPS        195040 req/sec
THROUGHPUT  2.976GiB/sec
LATENCY     P50    P90    P99    P999   MAX
            319us  367us  431us  479us  543us

NAME        16KB sequential r/w, high io depth
INFO        read run
TYPE        disk
TEST ID     931e192d-2133-4304-b093-3586d18b0c56
TIMEOUTS    0
DURATION    1000ms
IOPS        197272 req/sec
THROUGHPUT  3.01GiB/sec
LATENCY     P50    P90    P99    P999   MAX
            335us  367us  463us  639us  1023us

The help output:

Starts one or more benchmark tests on one or more nodes
of the cluster. Available tests to run:

* Disk tests:
  * Throughput test: 512 KB messages, sequential read/write
    * Uses larger request message sizes and a deeper I/O queue depth to write/read more bytes in a shorter amount of time, at the cost of IOPS/latency.
  * Latency and io depth tests: 4 KB messages, sequential read/write, varying io depth
    * Uses small IO sizes and varying levels of parallelism to determine the relationship between io depth and IOPS
        * Includes one test without using dsync (fdatasync) on each write to establish the cost of dsync
  * 16 KB test
    * One high io depth test at 16 KB to reflect performance at Redpanda's default chunk size

@travisdowns (Member, Author):
/dt

1 similar comment
@travisdowns (Member, Author):
/dt

@travisdowns (Member, Author):
/ci-repeat 1

StephanDollberg previously approved these changes Jul 2, 2024
dotnwat previously approved these changes Jul 2, 2024
r-vasquez previously approved these changes Jul 2, 2024
kbatuigas previously approved these changes Jul 3, 2024
@kbatuigas left a comment:

Minor copy edits for consistency with public docs

3 resolved review comments on src/go/rpk/pkg/cli/cluster/selftest/start.go (outdated)
travisdowns dismissed stale reviews from kbatuigas and r-vasquez via 5944298 on July 4, 2024 at 04:02
@travisdowns (Member, Author):
/ci-repeat 1

@vbotbuildovich (Collaborator) commented on Jul 5, 2024:

skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51128#0190812c-f3ce-4e04-840f-426fdcd3fac9:
pandatriage cache was not found

skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51128#0190812c-f3cf-4a1d-97e8-aec9c71db760:
pandatriage cache was not found

skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51128#0190812c-f3d1-4683-8a5f-77831d2deecd:
pandatriage cache was not found

skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51128#0190812c-f3cc-455c-b0d2-212dcdab44f1:
pandatriage cache was not found

skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51128#0190812e-d95c-486b-9d4d-89b31bda8c5b:
pandatriage cache was not found

skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51128#0190812e-d95e-4eb2-b6b1-7dd3881feba2:
pandatriage cache was not found

skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51128#0190812e-d957-4a96-b015-473add4dc93b:
pandatriage cache was not found

skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51128#0190812e-d959-46e9-b67c-0e9571b17b33:
pandatriage cache was not found

skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51441#0190a8a4-3b07-4103-86be-2e71180e4479:
pandatriage cache was not found

skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51441#0190a8a4-3b05-4dc7-9d4b-4878bd5eb84b:
pandatriage cache was not found

skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51441#0190a8bc-745d-4058-9d93-e84bc9906f5d:
pandatriage cache was not found

@travisdowns (Member, Author):
OK these remaining errors seem legit, looking.

@travisdowns (Member, Author):
365233c is a pure rebase.

f01195a changes the max iodepth in the new tests from 512 to 256, as RP has a hardcoded limit of 256 in the self-test code. I also considered increasing this limit from 256 to 512 on the RP side, but then we'd have issues running self-test in cases where the RPK version was newer than the Redpanda version, which is a supported and, I think, fairly common scenario, so I decided to change RPK instead.

This should fix the test failures.
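
For reference, a minimal Go sketch of the constraint in play; the constant and helper names are assumptions for illustration only, since the actual fix simply lowered rpk's preset parallelism values from 512 to 256:

    package selftest // placement is illustrative only

    import "fmt"

    // Redpanda's self-test rejects io queue depths outside [1, 256]:
    //   "IO Queue depth (parallelism) out of range, min is 1, max 256"
    const maxDiskSelfTestParallelism = 256 // assumed constant name

    // validateParallelism sketches the check rpk's presets must satisfy to
    // work against current (and older) Redpanda versions.
    func validateParallelism(p uint) error {
        if p < 1 || p > maxDiskSelfTestParallelism {
            return fmt.Errorf("io queue depth (parallelism) %d out of range, min is 1, max %d",
                p, maxDiskSelfTestParallelism)
        }
        return nil
    }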

@vbotbuildovich (Collaborator) commented on Jul 11, 2024:

new failures in https://buildkite.com/redpanda/redpanda/builds/51367#0190a306-08a6-4d35-b300-695b00ac2af8:

"rptest.tests.self_test_test.SelfTestTest.test_self_test.remote_read=False.remote_write=False"

new failures in https://buildkite.com/redpanda/redpanda/builds/51367#0190a306-08a8-4562-a22d-e76610db099a:

"rptest.tests.self_test_test.SelfTestTest.test_self_test.remote_read=False.remote_write=True"
"rptest.tests.self_test_test.SelfTestTest.test_self_test_node_crash"

new failures in https://buildkite.com/redpanda/redpanda/builds/51367#0190a306-08aa-43eb-8b38-c266ab6f395f:

"rptest.tests.self_test_test.SelfTestTest.test_self_test.remote_read=True.remote_write=False"

new failures in https://buildkite.com/redpanda/redpanda/builds/51367#0190a306-08ac-4d5b-81cd-1ebe01054b59:

"rptest.tests.self_test_test.SelfTestTest.test_self_test.remote_read=True.remote_write=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/51367#0190a307-bacc-47d3-ab43-fa4842e939fd:

"rptest.tests.self_test_test.SelfTestTest.test_self_test.remote_read=False.remote_write=True"
"rptest.tests.self_test_test.SelfTestTest.test_self_test_node_crash"

new failures in https://buildkite.com/redpanda/redpanda/builds/51367#0190a307-baca-43b9-b035-d624e18befca:

"rptest.tests.self_test_test.SelfTestTest.test_self_test.remote_read=False.remote_write=False"

new failures in https://buildkite.com/redpanda/redpanda/builds/51367#0190a307-bac8-4f3f-b1d7-0a027f3a1d46:

"rptest.tests.self_test_test.SelfTestTest.test_self_test.remote_read=True.remote_write=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/51367#0190a307-bace-4ce7-865c-f95725cb07e4:

"rptest.tests.self_test_test.SelfTestTest.test_self_test.remote_read=True.remote_write=False"

@travisdowns (Member, Author):
Hopefully this last push fixes all the failures. All the tests were passing for me locally, but it turned out that was just because of https://github.com/redpanda-data/vtools/pull/2950 not rebuilding my RPK.

@travisdowns (Member, Author):
All spurious failures, retrying.

@travisdowns (Member, Author):
/ci-repeat 1

1 similar comment
@travisdowns (Member, Author):
/ci-repeat 1

@travisdowns (Member, Author):
Spurious GH download failure in last run.

@travisdowns (Member, Author):
/ci-repeat 1

@travisdowns (Member, Author):
Last failure was a merge conflict, fixed. Hopefully this CI run is the one.

Add 16K block size disk tests, a common block size written by Redpanda,
at varying IO depths: 1, 8 and 32 times the shard count (the
multiplication by the shard count happens in Redpanda and is
inevitable).

This will help better assess the performance of block storage that is
a bit outside the usual, in particular how it responds to io depth
changes.

Additionally, add a 4K test which is the same as the existing one but
with dsync off. This is critical for assessing the impact of fdatasync
on the storage layer: locally, this makes a 257x difference in
throughput, though the effect is much more muted, perhaps close to zero,
on other SSD types.

Slightly rename the tests to remove extraneous info.

Issue redpanda-data/core-internal#1266.
Set the name to unspecified, which is a more accurate reflection of the
situation when the caller doesn't set a name.

Fix a comment which said 1G but was 10G.
When we complete a self test the API returns info about the run
including an info field which says "write run" currently (for a disk
test). Enhance this to include information about whether dsync
was enabled and the total io depth.

Issue redpanda-data/core-internal#1266.
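
The resulting INFO string, as seen in the example output earlier in this thread, has the form "write run (iodepth: N, dsync: true|false)". A hedged Go sketch of that format follows; the real string is built on the Redpanda side, so the function name and placement here are purely illustrative:

    package main

    import "fmt"

    // formatDiskRunInfo is illustrative only: Redpanda, not rpk, builds this
    // string, but the format matches the INFO column shown above.
    func formatDiskRunInfo(ioDepth int, dsync bool) string {
        return fmt.Sprintf("write run (iodepth: %d, dsync: %t)", ioDepth, dsync)
    }

    func main() {
        fmt.Println(formatDiskRunInfo(64, false)) // write run (iodepth: 64, dsync: false)
    }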
@travisdowns (Member, Author):
bd8b94d is to fix yet another merge conflict (what's up with my luck on this change?).

travisdowns merged commit 080ac33 into redpanda-data:dev on Jul 19, 2024
27 checks passed
6 participants