Create cloud repo specific job #4205
base: main
Conversation
OK, let me make sure I've got this right:
- we don't want the repo host as a single point of failure for backup/archiving
- one way to remove that SPOF is to undo the always-on repo host strategy for cloud backups
- but to do that we need to always have a pgbackrest server on the instance pod
- and we need to add the server configs, but not the certs (that's still dedicated repo host only)
- and we add all the configs to a configmap for convenience
- and we alter the job pod command depending on whether the repo is a PVC or a cloud repo (see the sketch after this list)
Do I have that right and/or did I miss any steps?
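A minimal, hypothetical sketch of the branching described in the list above: a dedicated repo host is only needed when at least one repo defines a local volume, while cloud-only repos skip it. The repo type and helper names below are illustrative, not the operator's actual API.

type repoSpec struct {
	Name   string
	Volume bool // true when the repo is backed by a PVC (local volume)
}

// needsDedicatedRepoHost reports whether any repo still requires a repo host.
func needsDedicatedRepoHost(repos []repoSpec) bool {
	for _, r := range repos {
		if r.Volume {
			return true
		}
	}
	return false
}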
"k8s.io/apimachinery/pkg/util/rand" | ||
|
||
"github.com/crunchydata/postgres-operator/pkg/apis/postgres-operator.crunchydata.com/v1beta1" | ||
) | ||
|
||
var tmpDirSizeLimit = resource.MustParse("16Mi") |
Is this size also copied over from postgrescluster?
yes... don't know if we ever want the size to vary?
I'm not sure if we want the size to vary, or if it's worth commenting on "why" we chose this size.
I'm not sure of the answer, so I added a TODO comment to provide an explanation for the size.
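For context, here is a minimal sketch of how a size limit like this is typically applied to an emptyDir scratch volume; the volume name is hypothetical and this is not the operator's exact code.

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

var tmpDirSizeLimit = resource.MustParse("16Mi")

// tmpDirVolume caps the pod's scratch space at tmpDirSizeLimit.
func tmpDirVolume() corev1.Volume {
	return corev1.Volume{
		Name: "tmp", // hypothetical volume name, for illustration only
		VolumeSource: corev1.VolumeSource{
			EmptyDir: &corev1.EmptyDirVolumeSource{
				SizeLimit: &tmpDirSizeLimit,
			},
		},
	}
}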
@@ -111,17 +112,9 @@ func AddConfigToInstancePod(
	// volumes stay valid and Kubernetes propagates their contents to those pods.
	secret := corev1.VolumeProjection{Secret: &corev1.SecretProjection{}}
	secret.Secret.Name = naming.PGBackRestSecret(cluster).Name
	secret.Secret.Items = append(secret.Secret.Items, clientCertificates()...)
Why do we need this here?
We used to only create the pgbackrest server in the instance if a repo host was present. Now we always create the pgbackrest server in the instance (assuming backups are enabled, of course) because it is needed for both the repo host and the cloud backups to talk to it. The server not only needs a server cert and key (which are added in the AddServerToInstancePod function), but also requires the client CA cert, which is provided via the clientCertificates function... So it's a little confusing because the installation of the necessary pieces for the server occurs in both AddServerToInstancePod and AddConfigToInstancePod... It probably could be refactored so that the server stuff is fully handled in the former function...
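To make that concrete, here is a hedged sketch of the pattern: the secret projection pulls the client CA item (via something like clientCertificates) into the instance pod, alongside the server cert and key added elsewhere. The key and path values below are made up for illustration.

import corev1 "k8s.io/api/core/v1"

// exampleClientCertificates stands in for the real clientCertificates helper;
// the key and path here are hypothetical.
func exampleClientCertificates() []corev1.KeyToPath {
	return []corev1.KeyToPath{
		{Key: "pgbackrest.ca-roots", Path: "ca.crt"},
	}
}

// projectPGBackRestSecret mirrors the pattern in the diff above: project the
// client CA item from the pgBackRest secret into the instance pod.
func projectPGBackRestSecret(secretName string) corev1.VolumeProjection {
	secret := corev1.VolumeProjection{Secret: &corev1.SecretProjection{}}
	secret.Secret.Name = secretName
	secret.Secret.Items = append(secret.Secret.Items, exampleClientCertificates()...)
	return secret
}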
@@ -1510,7 +1508,7 @@ func (r *Reconciler) reconcileInstanceCertificates(
			root.Certificate, leafCert.Certificate,
			leafCert.PrivateKey, instanceCerts)
	}
-	if err == nil {
+	if err == nil && backupsSpecFound {
Just double-checking: why don't we need this if using a local volume? Because these certs have to do with talking between repo/instance?
Not entirely sure what you are referring to, but we always need the instance certificates now. Both the cloud backup from the backup job and the local volume backup from the repo host need to be able to talk to the pgbackrest server in the instance pod.
No, we always need both the server config and the certs regardless of what kind of repos are defined. The rest of your statements/steps sound correct.
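Restating the gating from the diff above as a hedged sketch, with illustrative names rather than the operator's actual code:

// shouldWriteInstanceCertificates captures the rule: write the instance
// certificates whenever a backups spec is found, regardless of repo type,
// because both the cloud backup job and a repo host must reach the pgbackrest
// TLS server in the instance pod.
func shouldWriteInstanceCertificates(err error, backupsSpecFound bool) bool {
	return err == nil && backupsSpecFound
}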
	// TODO(cbandy): pass a FQDN in already.
	pgHostFQDN := pgHost + "-0." +
		serviceName + "." + serviceNamespace + ".svc." +
		naming.KubernetesClusterDomain(context.Background())
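For illustration, with assumed values (none of these names come from this PR), the concatenation above would produce something like:

// pgHost = "hippo-instance1-abcd", serviceName = "hippo-pods",
// serviceNamespace = "postgres", default cluster domain "cluster.local":
//
//	hippo-instance1-abcd-0.hippo-pods.postgres.svc.cluster.local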
👍🏻 Agreed!
Thanks for the walkthrough, and now also double-checking: the postgres instance always has the pgbackrest sidecar, so if OTel is enabled, we always want to rotate those logs, yeah?
Yes, if backups are enabled at all and OTel Logging is enabled, we want to add log rotation configuration for the pgbackrest logs in the postgres instance pod, and that has always been true. However, the code before was not checking that backups were explicitly enabled; it was checking for repo host volumes. Until my recent change, starting in 5.7 we always had a repo host in place if backups were enabled, so checking for a repo host was essentially equivalent to checking that backups were enabled. Now that a repo host is only created if a local volume repo is requested in the spec, backups can be enabled without any repo host volumes, so we must explicitly check for backups being enabled when adding the log rotation config. This is the change I made in the last commit.
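A minimal sketch of that check, with illustrative names rather than the operator's actual code:

// Key off "backups enabled" rather than "repo host volumes present",
// since cloud-only repos have no repo host.
func shouldAddPGBackRestLogRotation(backupsSpecFound, otelLoggingEnabled bool) bool {
	return backupsSpecFound && otelLoggingEnabled
}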
Add/adjust tests for cloud repo backup job changes. (force-pushed from 6a7aa82 to c80957c)
Checklist:
Type of Changes:
What is the current behavior (link to any open issues here)?
Currently, backups to cloud repos are done by exec'ing into the pg instance pod's pgbackrest sidecar and running the pgbackrest backup command there.
What is the new behavior (if this is a feature change)?
Backups to cloud repos will now be done in the backup job itself. If no local volume repo exists, a dedicated repo host will not be created.
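As a hedged sketch of that change (the container name, stanza, and repo index are illustrative, not the operator's generated values), the cloud backup Job's own container now runs pgbackrest directly rather than exec'ing into the instance pod's sidecar:

import corev1 "k8s.io/api/core/v1"

// cloudBackupContainer runs the backup from the job itself for a cloud repo.
func cloudBackupContainer(repoIndex string) corev1.Container {
	return corev1.Container{
		Name:    "pgbackrest",
		Command: []string{"pgbackrest", "backup", "--stanza=db", "--repo=" + repoIndex},
	}
}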
Other Information: