Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

benchmark: Update startup benchmarks #29767

Merged
merged 3 commits into from
Oct 4, 2024

Conversation

jkosh44
Copy link
Contributor

@jkosh44 jkosh44 commented Sep 27, 2024

This commit updates the StartupLoaded feature benchmarks to create
a greater number of complex objects. We've noticed a significant amount
of time during startup is spent in optimization, so we want to make
sure that this benchmark captures that.

Works towards resolving #MaterializeInc/database-issues/8384

Motivation

Improves test

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@jkosh44 jkosh44 changed the title WIP improve startup benchmark benchmark: Update startup benchmarks Oct 3, 2024
@jkosh44 jkosh44 force-pushed the startup-benchmark branch 7 times, most recently from 6f99f25 to 9d5ad9d Compare October 4, 2024 15:36
This commit updates the `StartupLoaded` feature benchmarks to create
a greater number of complex objects. We've noticed a significant amount
of time during startup is spent in optimization, so we want to make
sure that this benchmark captures that.

Works towards resolving #MaterializeInc/database-issues/8384

create_tpch_tables = "\n".join(
f"""
> {newline.sub(" ", comment.sub("", ddl))}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a long time to come up with the right regex magic for TD to accept these queries. If there's a better way then please let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally testdrive is not great for programmatically generating, I might have just pasted the queries in directly (if they are not too huge).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's roughly 1000 lines of SQL

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, then makes sense to do it this way.

@jkosh44 jkosh44 requested a review from a team October 4, 2024 15:40
@jkosh44 jkosh44 marked this pull request as ready for review October 4, 2024 15:41
Copy link
Contributor

@def- def- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes the scenario much more interesting.

You could argue that we might want to keep the old version and add this as a new one, but I'm ok with not having too many scenarios, so fine by me.

@@ -1833,7 +1835,7 @@ def benchmark(self) -> BenchmarkingSequence:
class StartupLoaded(Scenario):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should bump the version of this scenario because it's no longer comparable to old results:

def version(self) -> ScenarioVersion:
        return ScenarioVersion.create(1, 1, 0)

@nrainer-materialize I think we should just make it a class variable instead of a method, but can be done independent of this PR, unless you had a good reason for this design.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up splittin out the TPC-H stuff into a new benchmark, so I think I don't need to update the version anymore, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkosh44: That's right.

Copy link
Contributor

@def- def- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine now, no version upgrade needed


create_tpch_tables = "\n".join(
f"""
> {newline.sub(" ", comment.sub("", ddl))}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, then makes sense to do it this way.

@jkosh44 jkosh44 merged commit 010bfbf into MaterializeInc:main Oct 4, 2024
15 of 18 checks passed
@jkosh44 jkosh44 deleted the startup-benchmark branch October 4, 2024 20:16
@jkosh44
Copy link
Contributor Author

jkosh44 commented Oct 7, 2024

@def- FYI, according to the Nightly run of this PR, there's a regression in the StartupTpch workload, https://buildkite.com/materialize/nightly/builds/9824#0192589f-e35b-4f92-b798-63fc890e702d. I don't think that makes much sense because this PR is what added the StartupTpch workload. There may be something weird going on there.

@def-
Copy link
Contributor

def- commented Oct 7, 2024

It does make sense to me. The way that we run feature-benchmark to check for regressions uses the new state of test code to compare the old Materialize product against the new Materialize product. But we never use the old state of test code.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Oct 7, 2024

So basically, when comparing 6bfdb4e against this branch, there actually is a regression in startup times (accoring to StartupTpch)?

@def-
Copy link
Contributor

def- commented Oct 7, 2024

Yes, assuming the scenario works reliably. Since there were no product changes, maybe the scenario is too flaky to be useful.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Oct 7, 2024

Since there were no product changes

How did you reach that conclusion? Was 6bfdb4e the commit directly before this one?

@def-
Copy link
Contributor

def- commented Oct 7, 2024

Yes, in PRs we always compare against the last state that was in main.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Oct 7, 2024

Ah ok, I'm curious to see how it looks after a couple of days and if it's always flaky then I'm OK removing it.

@nrainer-materialize
Copy link
Contributor

The build's error annotations show the comparison target.

For example:

New regression against v0.120.0-dev.0--main.g6bfdb4e0f9ab5476ca638fcf37827588cc40dbe9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants