Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

legacy-upgrade test: Always bump deploy generation #30580

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

def-
Copy link
Contributor

@def- def- commented Nov 20, 2024

Noticed in #30570

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@def- def- force-pushed the pr-bump-deploy branch 2 times, most recently from 6560090 to b141744 Compare November 20, 2024 15:20
@def-
Copy link
Contributor Author

def- commented Nov 20, 2024

@jkosh44 Is it supposed to work like this? Locally I noticed that the non-0dt upgrades had stopped working with the revert, but when I increase the deploy generation for them, I only get this:

legacy-upgrade-materialized-1     | environmentd: 2024-11-20T14:48:39.298006Z  INFO environmentd::run:environmentd::serve: mz_environmentd::deployment::preflight: Waiting for user to promote this envd to leader

But for the non-0dt upgrades we normally don't manually promote. Will we have to change that in our tests so that they all promote, even for non-0dt upgrades? Will customers also have to do the same thing if they upgrade in self-hosted in a non-0dt manner?

@jkosh44
Copy link
Contributor

jkosh44 commented Nov 20, 2024

But for the non-0dt upgrades we normally don't manually promote. Will we have to change that in our tests so that they all promote, even for non-0dt upgrades?

Yes, we always have to manually promote, even for non-0dt upgrades. I think that tests used the workaround of not incrementing the deploy generation so we don't have to manually promote as a simplification, even though it doesn't actually match prod behavior.

The more I think about this the more I think it's probably not a big deal. Especially since 0dt is enabled in all envs. So up to you if you think it's worth the effort to get this working or if the tests are fine the way they are.

Will customers also have to do the same thing if they upgrade in self-hosted in a non-0dt manner?

I am not up to speed with all the self hosted efforts, but if their k8s cluster behaves like ours (as in they increment the deploy generation for each upgrade), then yes they will have to do the same thing.

@def-
Copy link
Contributor Author

def- commented Nov 20, 2024

Even with the emulator, this is quite surprising to me. You also won't be able to run bin/environmentd on an old version first and then a new one without specifying the deploy_generation manually?

@jkosh44
Copy link
Contributor

jkosh44 commented Nov 20, 2024

Even with the emulator, this is quite surprising to me. You also won't be able to run bin/environmentd on an old version first and then a new one without specifying the deploy_generation manually?

From my reading of the code, if you don't manually specify the deploy generation, then it will default to 0.

If 0dt is enabled and a user tries to upgrade their env without manually specifying the deploy generation, then I'm skeptical that it would work correctly.

If 0dt is disabled and a user tries to upgrade their env without manually specifying the deploy generation, then it might work? That's what all of our tests do, but it's not what we did in prod as per my understanding. I have definitely been implementing features with the assumption that two processes with the same deploy generation will be running the same version of the binary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants