Skip to content

Conversation

@zzzeek
Copy link
Contributor

@zzzeek zzzeek commented Apr 22, 2025

This PR is broken into individual PRs:

#342
#343
#344
#345
#346
#347

notes overall below:

this is for OSPRH-14916

  • reorganize StatefulSet/JobSpec use to get mysql root password from a cluster query script volume-mounted into containers, rather than embedding Secret / password in environment variables / CR variables which cause job hashes to change. This will allow in-place changes to the name of the secret used for mysql root without kicking off jobs, and ensures the current root password can always be fetched appropriately in order to perform work on the database.
  • introduce a new class of MariaDBAccount called a "system" MariaDBAccount, indicated by a new enumerated field AccountType on the CR. Such accounts link directly to a Galera instance and have no dependency on a MariaDBDatabase CR.
  • Ensure MariaDBAccount adds a finalizer to its own secret, and also cleans it out on delete
  • solidify in-place password change ability by supporting finalizer remove for previous secret, improved messaging and account.sh operation
  • change Galera CR to refer to a "root" MariaDBAccount object with its own secret, no longer using osp-secret. The secret here will be generated in the usual way using EnsureMariaDBAccount before the Galera cluster is set up.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 22, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 22, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zzzeek
Once this PR has been reviewed and has the lgtm label, please assign frenzyfriday for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zzzeek
Copy link
Contributor Author

zzzeek commented Apr 22, 2025

/ok-to-test

@zzzeek
Copy link
Contributor Author

zzzeek commented Apr 23, 2025

/retest-required

@zzzeek zzzeek removed the ok-to-test label Apr 23, 2025
@zzzeek zzzeek force-pushed the OSPRH-14916 branch 2 times, most recently from 517676e to 8744b52 Compare April 23, 2025 18:11
@zzzeek
Copy link
Contributor Author

zzzeek commented Apr 23, 2025

/ok-to-test

In order to facilitate an in-place change to the name of the
Secret that is referenced by a Galera instance for the
mysql root password, rework
the approach used by pods and shell scripts to no longer
require the root secret name and/or password be passed by
environment variable, instead using a pod-level cluster
query to retrieve the current root password.  The logic
to retrieve this password is encapsulated into a single
shell script that is present as a volume mount on running containers.

This allows Job objects to be created with hashes that
do not link to a specific Secret name, as well as to
create StatefulSet objects that don't refer to this name.
When the Secret name changes on a Galera instance for
an in-place root password change, the hashes / CRs for
these objects will remain unchanged.

A subsequent change to the mariadb operator will add the ability
to change the mysql root password of a Galera cluster using a
dual-reference architecture where
the "current" root secret will be part of <CR>/Status, while
the secret referenced in <CR>/Spec will be the "new" root
secret.  When these two names differ, that will indicate an
in-place password change should take place, as well
as allowing the pre-existing root password to be available
at the same time as the new one in order to do a root password
change.   The same
architecture will be applied to a new class of "system" MariaDBAccount
objects that are for use only by the Galera instance itself
and do not have a link to any MariaDBDatabase CR.  The
Galera CR itself will no longer use osp-secret
for the mysql root password nor will the secret be directly
referenced from the Galera CR, instead referenced by a
"system" MariaDBAccount CR which the Galera operator itself
will create.
@zzzeek zzzeek changed the title DNM: mariadbaccount system accounts mariadbaccount system accounts Apr 23, 2025
@zzzeek
Copy link
Contributor Author

zzzeek commented Apr 23, 2025

/retest

@zzzeek
Copy link
Contributor Author

zzzeek commented Apr 23, 2025

/retest-required

@zzzeek zzzeek force-pushed the OSPRH-14916 branch 2 times, most recently from a73c8d4 to 94f22a7 Compare April 24, 2025 00:24
@zzzeek
Copy link
Contributor Author

zzzeek commented Apr 24, 2025

/retest

@zzzeek
Copy link
Contributor Author

zzzeek commented Apr 24, 2025

/retest

this change reorganizes mariadbaccount_controller to move the majority
of MariaDBDatabase fetch/wait logic as well as Galera fetch/wait
logic to separate methods getMariaDBDatabaseForCreate,
getMariaDBDatabaseForDelete, getGaleraForCreateOrDelete.  This is to
allow more flexibility for upcoming changes.   All current functionality
should remain identical within this commit.
@zzzeek
Copy link
Contributor Author

zzzeek commented Apr 24, 2025

/retest

1 similar comment
@zzzeek
Copy link
Contributor Author

zzzeek commented Apr 29, 2025

/retest

@zzzeek zzzeek requested review from abays, dprince and gibizer May 7, 2025 14:25
zzzeek added 3 commits May 28, 2025 13:55
in mariadbdatabase_funcs, the EnsureMariaDBAccount function
called by external controllers adds a finalizer for that calling
controller to the Secret referenced by the MariaDBAccount.  This
seems a little off, since the Secret is most immediately needed
by the MariaDBAccount CR itself, and the controller refers to that
MariaDBAccount CR also.   It seems more appropriate that
MariaDBAccount itself should maintain its own finalizer on that
Secret, so this logic is added there.

The change here causes the API function EnsureMariaDBAccount
to add a finalizer to the secret that is local to the mariadbaccount,
rather than the helper passed for the calling controller.
Existing "remove finalizer" calls which look for the calling
controller's finalizer tag in the secret are maintained however
to assist with backwards compatibility.

This comes up now because we are seeking to add a new class of
system-level MariaDBAccount that
is used only by the Galera controller itself, but also that these
accounts (really all accounts, but mainly the system ones) will
support in-place password changes by updating the name of the Secret
to be used, implying the old one is no longer needed once the change
takes place; it therefore is most appropriate that MariaDBAccount
maintain its own finalizers on these secrets.
introduce a new class of MariaDBAccount called a "system"
MariaDBAccount, indicated by a new enumerated field AccountType
on the CR.  Such accounts link directly to a Galera instance
and have no dependency on a MariaDBDatabase CR.

The expected targets for "system" accounts will include the
Galera/mysql root username and password, as well as a system
account used by mariadbbackup for SST.

Refactor mariadbaccount_controller to isolate logic used for
acquiring MariaDBDatabase and Galera CRs into separate functions,
and ensure all MariaDBDatabase logic takes place only for "user"
accounts (which would be all current MariaDBAccount CRs).

Also correct an oversight where MariaDBAccount would not unconditionally
apply a finalizer to its Secret object.   This logic now takes place
in addition to an unconditional removal of the finalizer when the
MariaDBAccount object is deleted.

A subsequent change will allow system-level passwords to be changed
in place by applying the secret name to two separate fields
MariaDBAccount/Spec/Secret and MariaDBAccount/Status/Secret.  When
these two names differ it will indicate an in-place password change
should take place.
The existing mariadb operator in fact already "supports" in-place
change of secret, since if you change Spec.Secret to a new secret
name, that would imply a new job hash, and the account.sh script running
using only GRANT statements would update the password per mariadb.
This already works for flipping the TLS flag on and off too.

So in this patch, we clean this up and add a test to include:

* a new field Status.CurrentSecret, which is used to indicate the
  previous secret from which the finalizer should be removed.  This will
  also be used when we migrate the root password to use MariaDBAccount
  by providing the "current" root password when changing to a new root
  password
* improved messaging in log messages, name of job.   This changes the
  job hash for mariadbaccount which will incur a run on existing environments,
  however the
  job hashes are already changing on existing environments due to the
  change in how the mysql root password is sent, i.e. via volume mounted
  script rather than env var secret
* update account.sh to use modern idiomatic patterns for user create
  /alter, while mariadb is fine with the legacy style of using only
  GRANT statements, MySQL 8 no longer allows this statement to proceed
  without a CREATE USER, so formalize the commands used here to use
  distinct CREATE USER, ALTER USER, GRANT statements and clarify the
  script is good for all create/update user operations.
@zzzeek zzzeek force-pushed the OSPRH-14916 branch 3 times, most recently from 9c6111e to e6c102c Compare May 28, 2025 18:42
@zzzeek
Copy link
Contributor Author

zzzeek commented May 28, 2025

/retest

@zzzeek
Copy link
Contributor Author

zzzeek commented May 29, 2025

not related to the immediate failures but im noticing that deep in the prow scripts and probably elsewhere, osp-secret seems to be referenced for mariadb root password outside of the galera operator (that 12345678 password is no longer used as long as this change is kept generating a new root PW each time):

 rm -Rf /alabama/install_yamls/out/operator/nova-operator /alabama/install_yamls/out/mariadb-kuttl-tests/nova/cr
oc rsh openstack-galera-0 mysql -u root --password=12345678 -ss -e "show databases like 'nova_%';" | xargs -I '{}' oc rsh openstack-galera-0 mysql -u root --password=12345678 -ss -e "flush tables; drop database if exists {};"
Error from server (NotFound): pods "openstack-galera-0" not found
oc kustomize /alabama/install_yamls/out/mariadb-kuttl-tests/infra-redis/cr | oc delete --ignore-not-found=true -f -
error: must build at directory: not a valid directory: evalsymlink failure on '/alabama/install_yamls/out/mariadb-kuttl-tests/infra-redis/cr' : lstat /alabama/install_yamls/out/mariadb-kuttl-tests/infra-redis: no such file or directory
No resources found 

not totally sure how to fix this issue because this PR changes where the root password comes from. see next comment for how IMO this should be fixed.

@zzzeek
Copy link
Contributor Author

zzzeek commented May 29, 2025

Well, what we can do is merge 9fff6b0 ahead of everything else, then change all scripts everywhere to use the root_auth.sh script

so

oc rsh openstack-galera-0 mysql -u root --password=12345678 -ss -e "show databases like 'nova_%';" | xargs -I '{}' oc rsh openstack-galera-0 mysql -u root --password=12345678 -ss -e "flush tables; drop database if exists {};"

becomes

oc rsh openstack-galera-0 <<EOF
#!/bin/bash
source /var/lib/operator-scripts/root_auth.sh

mysql -u root  -ss -e "show databases like 'nova_%';" | xargs -I '{}' mysql -u root  -ss -e "flush tables; drop database if exists {};"
EOF

that change can be made just with 9fff6b0 merged alone as the controller that creates new galera pods, then all the scripts everywhere else can be abstracted away from where the galera root password comes from

@zzzeek
Copy link
Contributor Author

zzzeek commented May 29, 2025

seeing as mariadb-operator-build-deploy-kuttl is now what's failing, I probably need to restore copying osp-secret as the initial root password just to get the code out the door as there are likely other parts of build / CI / deployment that are relying on this structure

@zzzeek zzzeek force-pushed the OSPRH-14916 branch 5 times, most recently from 7fcb95e to b7710bc Compare May 30, 2025 19:53

// ******** TEMPORARY ************
// Pull the DbRootPassword from osp-secret to allow CI / external
// scripts to work, until they can adapt to root_auth.sh
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some view into what scripts need modification can be seen using this query:

https://github.com/search?q=org%3Aopenstack-k8s-operators+DbRootPassword&type=code&p=1

This commit ties together the previous ones to create a new
MariaDBAccount when a Galera instance is created, and then to
use the password from that account/secret in the mariadb
bootstrap/maintenance scripts.

Galera gets bootstrapped with this secret, then the mariadbaccount
controller, who is waiting for galera to be available to set up this
new "root" account, wakes up when galera is running, and changes
the root password to itself, establishing the initial job hash
for the mariadbaccount.

As we now have a mariadbaccount linked to the outermost lifecycle
of a galera instance, some hardening of the deletion process has been
added to clarify that mariadbaccount will run deletion jobs only if
Galera is not marked for deletion.  If galera is marked for deletion,
then we have to assume the service/pods are gone and no more drops can
take place, even if the Galera CR is still present (chainsaw
conveniently adds its own finalizer to Galera when running, preventing
it from being fully deleted, which exposed this issue).
@dciabrin
Copy link
Contributor

dciabrin commented Jul 1, 2025

I needed to push an update to the galera CRD not so long ago, so this needs to be rebased...

I wonder if we could take that opportunity to maybe split those commits into separate PR (as suggested by @zzzeek to help move this forward), so that we can do a simpler review before merging?

I had a chat with @zzzeek already to discuss the details behind the "system" MariaDBAccount approach, and I think it's solid and that's where we want to go. We now need to carefully review those changes before merging.

@openshift-merge-robot
Copy link
Collaborator

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@zzzeek
Copy link
Contributor Author

zzzeek commented Jul 1, 2025

OK do you want all six ? if that makes it easier

@dciabrin
Copy link
Contributor

dciabrin commented Jul 1, 2025

OK do you want all six ? if that makes it easier

That'd be helpful for me at least :)

@zzzeek
Copy link
Contributor Author

zzzeek commented Jul 1, 2025

six it is

@zzzeek zzzeek changed the title mariadbaccount system accounts DNM: mariadbaccount system accounts Jul 1, 2025
@zzzeek zzzeek marked this pull request as draft July 1, 2025 13:11
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 18, 2025

@zzzeek: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/mariadb-operator-build-deploy-chainsaw 83301b5 link true /test mariadb-operator-build-deploy-chainsaw
ci/prow/precommit-check 83301b5 link true /test precommit-check
ci/prow/mariadb-operator-build-deploy-kuttl 83301b5 link true /test mariadb-operator-build-deploy-kuttl

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants