DNM: mariadbaccount system accounts #320

zzzeek · 2025-04-22T19:04:31Z

This PR is broken into individual PRs:

notes overall below:

reorganize StatefulSet/JobSpec use to get mysql root password from a cluster query script volume-mounted into containers, rather than embedding Secret / password in environment variables / CR variables which cause job hashes to change. This will allow in-place changes to the name of the secret used for mysql root without kicking off jobs, and ensures the current root password can always be fetched appropriately in order to perform work on the database.
introduce a new class of MariaDBAccount called a "system" MariaDBAccount, indicated by a new enumerated field AccountType on the CR. Such accounts link directly to a Galera instance and have no dependency on a MariaDBDatabase CR.
Ensure MariaDBAccount adds a finalizer to its own secret, and also cleans it out on delete
solidify in-place password change ability by supporting finalizer remove for previous secret, improved messaging and account.sh operation
change Galera CR to refer to a "root" MariaDBAccount object with its own secret, no longer using osp-secret. The secret here will be generated in the usual way using EnsureMariaDBAccount before the Galera cluster is set up.

openshift-ci · 2025-04-22T19:04:37Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci · 2025-04-22T19:04:38Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zzzeek
Once this PR has been reviewed and has the lgtm label, please assign frenzyfriday for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

zzzeek · 2025-04-22T21:50:08Z

/ok-to-test

zzzeek · 2025-04-23T13:57:13Z

/retest-required

zzzeek · 2025-04-23T18:11:48Z

/ok-to-test

In order to facilitate an in-place change to the name of the Secret that is referenced by a Galera instance for the mysql root password, rework the approach used by pods and shell scripts to no longer require the root secret name and/or password be passed by environment variable, instead using a pod-level cluster query to retrieve the current root password. The logic to retrieve this password is encapsulated into a single shell script that is present as a volume mount on running containers. This allows Job objects to be created with hashes that do not link to a specific Secret name, as well as to create StatefulSet objects that don't refer to this name. When the Secret name changes on a Galera instance for an in-place root password change, the hashes / CRs for these objects will remain unchanged. A subsequent change to the mariadb operator will add the ability to change the mysql root password of a Galera cluster using a dual-reference architecture where the "current" root secret will be part of <CR>/Status, while the secret referenced in <CR>/Spec will be the "new" root secret. When these two names differ, that will indicate an in-place password change should take place, as well as allowing the pre-existing root password to be available at the same time as the new one in order to do a root password change. The same architecture will be applied to a new class of "system" MariaDBAccount objects that are for use only by the Galera instance itself and do not have a link to any MariaDBDatabase CR. The Galera CR itself will no longer use osp-secret for the mysql root password nor will the secret be directly referenced from the Galera CR, instead referenced by a "system" MariaDBAccount CR which the Galera operator itself will create.

zzzeek · 2025-04-23T21:36:58Z

/retest

zzzeek · 2025-04-23T23:11:20Z

/retest-required

zzzeek · 2025-04-24T00:25:18Z

/retest

zzzeek · 2025-04-24T16:04:26Z

/retest

this change reorganizes mariadbaccount_controller to move the majority of MariaDBDatabase fetch/wait logic as well as Galera fetch/wait logic to separate methods getMariaDBDatabaseForCreate, getMariaDBDatabaseForDelete, getGaleraForCreateOrDelete. This is to allow more flexibility for upcoming changes. All current functionality should remain identical within this commit.

zzzeek · 2025-04-24T20:57:04Z

/retest

zzzeek · 2025-04-29T19:27:20Z

/retest

in mariadbdatabase_funcs, the EnsureMariaDBAccount function called by external controllers adds a finalizer for that calling controller to the Secret referenced by the MariaDBAccount. This seems a little off, since the Secret is most immediately needed by the MariaDBAccount CR itself, and the controller refers to that MariaDBAccount CR also. It seems more appropriate that MariaDBAccount itself should maintain its own finalizer on that Secret, so this logic is added there. The change here causes the API function EnsureMariaDBAccount to add a finalizer to the secret that is local to the mariadbaccount, rather than the helper passed for the calling controller. Existing "remove finalizer" calls which look for the calling controller's finalizer tag in the secret are maintained however to assist with backwards compatibility. This comes up now because we are seeking to add a new class of system-level MariaDBAccount that is used only by the Galera controller itself, but also that these accounts (really all accounts, but mainly the system ones) will support in-place password changes by updating the name of the Secret to be used, implying the old one is no longer needed once the change takes place; it therefore is most appropriate that MariaDBAccount maintain its own finalizers on these secrets.

introduce a new class of MariaDBAccount called a "system" MariaDBAccount, indicated by a new enumerated field AccountType on the CR. Such accounts link directly to a Galera instance and have no dependency on a MariaDBDatabase CR. The expected targets for "system" accounts will include the Galera/mysql root username and password, as well as a system account used by mariadbbackup for SST. Refactor mariadbaccount_controller to isolate logic used for acquiring MariaDBDatabase and Galera CRs into separate functions, and ensure all MariaDBDatabase logic takes place only for "user" accounts (which would be all current MariaDBAccount CRs). Also correct an oversight where MariaDBAccount would not unconditionally apply a finalizer to its Secret object. This logic now takes place in addition to an unconditional removal of the finalizer when the MariaDBAccount object is deleted. A subsequent change will allow system-level passwords to be changed in place by applying the secret name to two separate fields MariaDBAccount/Spec/Secret and MariaDBAccount/Status/Secret. When these two names differ it will indicate an in-place password change should take place.

The existing mariadb operator in fact already "supports" in-place change of secret, since if you change Spec.Secret to a new secret name, that would imply a new job hash, and the account.sh script running using only GRANT statements would update the password per mariadb. This already works for flipping the TLS flag on and off too. So in this patch, we clean this up and add a test to include: * a new field Status.CurrentSecret, which is used to indicate the previous secret from which the finalizer should be removed. This will also be used when we migrate the root password to use MariaDBAccount by providing the "current" root password when changing to a new root password * improved messaging in log messages, name of job. This changes the job hash for mariadbaccount which will incur a run on existing environments, however the job hashes are already changing on existing environments due to the change in how the mysql root password is sent, i.e. via volume mounted script rather than env var secret * update account.sh to use modern idiomatic patterns for user create /alter, while mariadb is fine with the legacy style of using only GRANT statements, MySQL 8 no longer allows this statement to proceed without a CREATE USER, so formalize the commands used here to use distinct CREATE USER, ALTER USER, GRANT statements and clarify the script is good for all create/update user operations.

zzzeek · 2025-05-28T19:16:10Z

/retest

zzzeek · 2025-05-29T00:55:11Z

not related to the immediate failures but im noticing that deep in the prow scripts and probably elsewhere, osp-secret seems to be referenced for mariadb root password outside of the galera operator (that 12345678 password is no longer used as long as this change is kept generating a new root PW each time):

 rm -Rf /alabama/install_yamls/out/operator/nova-operator /alabama/install_yamls/out/mariadb-kuttl-tests/nova/cr
oc rsh openstack-galera-0 mysql -u root --password=12345678 -ss -e "show databases like 'nova_%';" | xargs -I '{}' oc rsh openstack-galera-0 mysql -u root --password=12345678 -ss -e "flush tables; drop database if exists {};"
Error from server (NotFound): pods "openstack-galera-0" not found
oc kustomize /alabama/install_yamls/out/mariadb-kuttl-tests/infra-redis/cr | oc delete --ignore-not-found=true -f -
error: must build at directory: not a valid directory: evalsymlink failure on '/alabama/install_yamls/out/mariadb-kuttl-tests/infra-redis/cr' : lstat /alabama/install_yamls/out/mariadb-kuttl-tests/infra-redis: no such file or directory
No resources found

~~not totally sure how to fix this issue because this PR changes where the root password comes from.~~ see next comment for how IMO this should be fixed.

zzzeek · 2025-05-29T01:04:17Z

Well, what we can do is merge 9fff6b0 ahead of everything else, then change all scripts everywhere to use the root_auth.sh script

so

oc rsh openstack-galera-0 mysql -u root --password=12345678 -ss -e "show databases like 'nova_%';" | xargs -I '{}' oc rsh openstack-galera-0 mysql -u root --password=12345678 -ss -e "flush tables; drop database if exists {};"

becomes

oc rsh openstack-galera-0 <<EOF
#!/bin/bash
source /var/lib/operator-scripts/root_auth.sh

mysql -u root  -ss -e "show databases like 'nova_%';" | xargs -I '{}' mysql -u root  -ss -e "flush tables; drop database if exists {};"
EOF

that change can be made just with 9fff6b0 merged alone as the controller that creates new galera pods, then all the scripts everywhere else can be abstracted away from where the galera root password comes from

zzzeek · 2025-05-29T12:57:04Z

seeing as mariadb-operator-build-deploy-kuttl is now what's failing, I probably need to restore copying osp-secret as the initial root password just to get the code out the door as there are likely other parts of build / CI / deployment that are relying on this structure

zzzeek · 2025-05-30T20:51:41Z

controllers/galera_controller.go

+
+	// ******** TEMPORARY ************
+	// Pull the DbRootPassword from osp-secret to allow CI / external
+	// scripts to work, until they can adapt to root_auth.sh


some view into what scripts need modification can be seen using this query:

https://github.com/search?q=org%3Aopenstack-k8s-operators+DbRootPassword&type=code&p=1

This commit ties together the previous ones to create a new MariaDBAccount when a Galera instance is created, and then to use the password from that account/secret in the mariadb bootstrap/maintenance scripts. Galera gets bootstrapped with this secret, then the mariadbaccount controller, who is waiting for galera to be available to set up this new "root" account, wakes up when galera is running, and changes the root password to itself, establishing the initial job hash for the mariadbaccount. As we now have a mariadbaccount linked to the outermost lifecycle of a galera instance, some hardening of the deletion process has been added to clarify that mariadbaccount will run deletion jobs only if Galera is not marked for deletion. If galera is marked for deletion, then we have to assume the service/pods are gone and no more drops can take place, even if the Galera CR is still present (chainsaw conveniently adds its own finalizer to Galera when running, preventing it from being fully deleted, which exposed this issue).

dciabrin · 2025-07-01T08:24:48Z

I needed to push an update to the galera CRD not so long ago, so this needs to be rebased...

I wonder if we could take that opportunity to maybe split those commits into separate PR (as suggested by @zzzeek to help move this forward), so that we can do a simpler review before merging?

I had a chat with @zzzeek already to discuss the details behind the "system" MariaDBAccount approach, and I think it's solid and that's where we want to go. We now need to carefully review those changes before merging.

openshift-merge-robot · 2025-07-01T08:24:56Z

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

zzzeek · 2025-07-01T12:49:32Z

OK do you want all six ? if that makes it easier

dciabrin · 2025-07-01T12:56:47Z

OK do you want all six ? if that makes it easier

That'd be helpful for me at least :)

zzzeek · 2025-07-01T12:58:33Z

six it is

openshift-ci · 2025-07-18T10:06:17Z

@zzzeek: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/mariadb-operator-build-deploy-chainsaw	`83301b5`	link	true	`/test mariadb-operator-build-deploy-chainsaw`
ci/prow/precommit-check	`83301b5`	link	true	`/test precommit-check`
ci/prow/mariadb-operator-build-deploy-kuttl	`83301b5`	link	true	`/test mariadb-operator-build-deploy-kuttl`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci bot added the do-not-merge/work-in-progress label Apr 22, 2025

openshift-ci bot added the ok-to-test label Apr 22, 2025

zzzeek force-pushed the OSPRH-14916 branch from e25db52 to 13fcba6 Compare April 23, 2025 15:43

zzzeek removed the ok-to-test label Apr 23, 2025

zzzeek force-pushed the OSPRH-14916 branch 2 times, most recently from 517676e to 8744b52 Compare April 23, 2025 18:11

openshift-ci bot added the ok-to-test label Apr 23, 2025

zzzeek force-pushed the OSPRH-14916 branch from 8744b52 to 4b56102 Compare April 23, 2025 19:32

zzzeek force-pushed the OSPRH-14916 branch from 4b56102 to a893748 Compare April 23, 2025 20:30

zzzeek changed the title ~~DNM: mariadbaccount system accounts~~ mariadbaccount system accounts Apr 23, 2025

zzzeek force-pushed the OSPRH-14916 branch 2 times, most recently from a73c8d4 to 94f22a7 Compare April 24, 2025 00:24

zzzeek force-pushed the OSPRH-14916 branch from 94f22a7 to 3e19c88 Compare April 24, 2025 16:02

zzzeek force-pushed the OSPRH-14916 branch from 3e19c88 to a9c9612 Compare April 24, 2025 17:24

zzzeek requested review from abays, dprince and gibizer May 7, 2025 14:25

zzzeek force-pushed the OSPRH-14916 branch from 9b750db to 68dd969 Compare May 28, 2025 17:52

zzzeek added 3 commits May 28, 2025 13:55

zzzeek force-pushed the OSPRH-14916 branch 3 times, most recently from 9c6111e to e6c102c Compare May 28, 2025 18:42

zzzeek force-pushed the OSPRH-14916 branch from e6c102c to 23afa0a Compare May 29, 2025 00:55

zzzeek force-pushed the OSPRH-14916 branch 5 times, most recently from 7fcb95e to b7710bc Compare May 30, 2025 19:53

zzzeek commented May 30, 2025

View reviewed changes

zzzeek force-pushed the OSPRH-14916 branch from b7710bc to 83301b5 Compare May 31, 2025 03:36

openshift-merge-robot added the needs-rebase label Jul 1, 2025

zzzeek changed the title ~~mariadbaccount system accounts~~ DNM: mariadbaccount system accounts Jul 1, 2025

zzzeek marked this pull request as draft July 1, 2025 13:11

openshift-ci bot added the do-not-merge/work-in-progress label Jul 1, 2025

DNM: mariadbaccount system accounts #320

Are you sure you want to change the base?

DNM: mariadbaccount system accounts #320

Uh oh!

Conversation

zzzeek commented Apr 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openshift-ci bot commented Apr 22, 2025

Uh oh!

openshift-ci bot commented Apr 22, 2025

Uh oh!

zzzeek commented Apr 22, 2025

Uh oh!

zzzeek commented Apr 23, 2025

Uh oh!

zzzeek commented Apr 23, 2025

Uh oh!

zzzeek commented Apr 23, 2025

Uh oh!

zzzeek commented Apr 23, 2025

Uh oh!

zzzeek commented Apr 24, 2025

Uh oh!

zzzeek commented Apr 24, 2025

Uh oh!

zzzeek commented Apr 24, 2025

Uh oh!

zzzeek commented Apr 29, 2025

Uh oh!

zzzeek commented May 28, 2025

Uh oh!

zzzeek commented May 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zzzeek commented May 29, 2025

Uh oh!

zzzeek commented May 29, 2025

Uh oh!

zzzeek May 30, 2025

Choose a reason for hiding this comment

Uh oh!

dciabrin commented Jul 1, 2025

Uh oh!

openshift-merge-robot commented Jul 1, 2025

Uh oh!

zzzeek commented Jul 1, 2025

Uh oh!

dciabrin commented Jul 1, 2025

Uh oh!

zzzeek commented Jul 1, 2025

Uh oh!

openshift-ci bot commented Jul 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

zzzeek commented Apr 22, 2025 •

edited

Loading

zzzeek commented May 29, 2025 •

edited

Loading