diff --git a/.github/workflows/README.md b/.github/workflows/README.md index c1c5001a35..f09bc45b9e 100644 --- a/.github/workflows/README.md +++ b/.github/workflows/README.md @@ -13,33 +13,18 @@ While we're using GCP Composer, "deployment" of Airflow consists of two parts: This workflow builds a static website from the Svelte app and deploys it to Netlify. +## deploy-kubernetes.yml + +This workflow deploys changes to the production Kubernetes cluster when they get merged into the `main` branch. + ## build-\*.yml workflows Workflows prefixed with `build-` generally lint, test, and (usually) publish either a Python package or a Docker image. -## service-\*.yml workflows +## preview-\*.yml workflows -Workflows prefixed with `service-` deal with Kubernetes deployments. +Workflows prefixed with `preview-` deal with generating previews for pull request changes -- `service-release-candidate.yml` creates candidate branches, using [hologit](https://github.com/JarvusInnovations/hologit) to bring in external Helm charts and remove irrelevant (i.e. non-infra) code -- `service-release-diff.yml` renders kubectl diffs on PRs targeting release branches -- `service-release-channel.yml` deploys to a given channel (i.e. environment) on updates to a release branch +- `preview-kubernetes.yml` renders kubectl diffs on PRs changing cluster content Some of these workflows use hologit or invoke. See the READMEs in [.holo](../../.holo) and [ci](../../ci) for documentation regarding hologit and invoke, respectively. - -## GitOps - -The workflows described above also define their triggers. In general, developer workflows should follow these steps. - -1. Check out a feature branch -2. Put up a PR for that feature branch, targeting `main` - - `service-release-candidate` will run and create a remote branch named `candidate/` to `releases/` (such as `releases/test`) will only show changes/content -relevant to infra in addition to `releases/*` branches only ever containing infra code. 
For example: +In this repository, we declare one holobranch named [kubernetes-workspace](../branches/kubernetes-workspace). +By projecting this holobranch in GitHub Actions, a tree containing only the code relevant to infra/Kubernetes +as well as Kubernetes code from the upstream [cluster-template](https://github.com/JarvusInnovations/cluster-template) +repository is generated. -1. Create a [PR making an infra-related change](https://github.com/cal-itp/data-infra/pull/2828) -2. Create and merge a [PR to deploy a candidate branch to test](https://github.com/cal-itp/data-infra/pull/2829) -3. Merge the PR from #1 -4. After merge, [PR to deploy the main candidate branch to prod](https://github.com/cal-itp/data-infra/pull/2832) +See [`ci/README.md`](../ci/README.md) for details on the pull request workflow for previewing and deploying Kubernetes changes. diff --git a/.holo/branches/release-candidate/_data-infra.toml b/.holo/branches/kubernetes-workspace/_data-infra.toml similarity index 69% rename from .holo/branches/release-candidate/_data-infra.toml rename to .holo/branches/kubernetes-workspace/_data-infra.toml index a888f90082..7a29e7ad4e 100644 --- a/.holo/branches/release-candidate/_data-infra.toml +++ b/.holo/branches/kubernetes-workspace/_data-infra.toml @@ -1,3 +1,3 @@ [holomapping] -files = [ "ci/**", "kubernetes/apps/**", "kubernetes/system/**", ".github/workflows/service-*" ] +files = [ "ci/**", "kubernetes/apps/**", "kubernetes/system/**", ".github/workflows/*-kubernetes.yml" ] before = "*" diff --git a/.holo/branches/release-candidate/kubernetes/apps/charts/grafana.toml b/.holo/branches/kubernetes-workspace/kubernetes/apps/charts/grafana.toml similarity index 100% rename from .holo/branches/release-candidate/kubernetes/apps/charts/grafana.toml rename to .holo/branches/kubernetes-workspace/kubernetes/apps/charts/grafana.toml diff --git a/.holo/branches/release-candidate/kubernetes/apps/charts/loki.toml 
b/.holo/branches/kubernetes-workspace/kubernetes/apps/charts/loki.toml similarity index 100% rename from .holo/branches/release-candidate/kubernetes/apps/charts/loki.toml rename to .holo/branches/kubernetes-workspace/kubernetes/apps/charts/loki.toml diff --git a/.holo/branches/release-candidate/kubernetes/apps/charts/prometheus.toml b/.holo/branches/kubernetes-workspace/kubernetes/apps/charts/prometheus.toml similarity index 100% rename from .holo/branches/release-candidate/kubernetes/apps/charts/prometheus.toml rename to .holo/branches/kubernetes-workspace/kubernetes/apps/charts/prometheus.toml diff --git a/.holo/branches/release-candidate/kubernetes/apps/charts/promtail.toml b/.holo/branches/kubernetes-workspace/kubernetes/apps/charts/promtail.toml similarity index 100% rename from .holo/branches/release-candidate/kubernetes/apps/charts/promtail.toml rename to .holo/branches/kubernetes-workspace/kubernetes/apps/charts/promtail.toml diff --git a/ci/channels/prod.yaml b/ci/channels/prod.yaml index 30c0f70522..faf422769f 100644 --- a/ci/channels/prod.yaml +++ b/ci/channels/prod.yaml @@ -26,6 +26,8 @@ calitp: namespace: jupyterhub helm_name: jupyterhub helm_chart: kubernetes/apps/charts/jupyterhub + secret_helm_values: + - jupyterhub_jupyterhub-sensitive-helm-values secrets: - jupyterhub_jupyterhub-gcloud-service-key - jupyterhub_jupyterhub-github-config diff --git a/ci/tasks.py b/ci/tasks.py index 05dd1bb357..4a4badb343 100644 --- a/ci/tasks.py +++ b/ci/tasks.py @@ -37,6 +37,7 @@ class Release(BaseModel): helm_name: Optional[str] helm_chart: Optional[Path] helm_values: List[Path] = [] + secret_helm_values: List[str] = [] timeout: Optional[str] # for kustomize @@ -168,6 +169,8 @@ def diff( full_diff = "" result: Result + secrets_client = secretmanager.SecretManagerServiceClient() + for release in get_releases(c, driver=actual_driver, app=app): if release.driver == ReleaseDriver.kustomize: assert release.kustomize_dir is not None @@ -178,28 +181,49 @@ def diff( 
chart_path = c.calitp_config.git_root / Path(release.helm_chart) c.run(f"helm dependency update {chart_path}") - values_str = " ".join( - [ - f"--values {c.calitp_config.git_root / Path(values_file)}" - for values_file in release.helm_values - ] - ) - assert release.helm_name is not None - result = c.run( - " ".join( + with tempfile.TemporaryDirectory() as tmpdir: + secret_helm_value_paths = [] + for secret_helm_values in release.secret_helm_values: + secret_path = Path(tmpdir) / Path(f"{secret_helm_values}.yaml") + name = f"projects/1005246706141/secrets/{secret_helm_values}/versions/latest" + secret_contents = secrets_client.access_secret_version( + request={"name": name} + ).payload.data.decode("UTF-8") + + with open(secret_path, "w") as f: + f.write(secret_contents) + + secret_helm_value_paths.append(secret_path) + print(f"Downloaded secret helm values: {secret_path}", flush=True) + + values_str = " ".join( [ - "helm", - "diff", - "upgrade", - release.helm_name, - str(chart_path), - f"--namespace={release.namespace}", - values_str, - "-C 5", # only include 5 lines of context + f"--values={c.calitp_config.git_root / Path(values_file)}" + for values_file in release.helm_values ] - ), - warn=True, - ) + ).join( + [ + f"--values={secret_path}" + for secret_path in secret_helm_value_paths + ] + ) + assert release.helm_name is not None + result = c.run( + " ".join( + [ + "helm", + "diff", + "upgrade", + release.helm_name, + str(chart_path), + f"--namespace={release.namespace}", + values_str, + "-C 5", # only include 5 lines of context + "--no-hooks", # exclude hooks that get recreated every upgrade from diff + ] + ), + warn=True, + ) else: print(f"Encountered unknown driver: {release.driver}", flush=True) raise RuntimeError @@ -207,11 +231,18 @@ def diff( if result.stdout: full_diff += result.stdout - msg = ( - f"```{full_diff}```" - if full_diff - else f"No {driver if driver else 'manifest'} changes found for {c.calitp_config.channel}.\n" + msg = "\n\n".join( + [ + 
"The following changes will be applied to the production Kubernetes cluster upon merge.", + "**BE AWARE** this may not reveal changes that have been manually applied to the cluster getting undone—applying manual changes to the cluster should be avoided.", + ( + f"```diff\n{full_diff}```\n" + if full_diff + else f"No {driver if driver else 'manifest'} changes found for {c.calitp_config.channel}.\n" + ), + ] ) + if outfile: print(f"writing {len(msg)=} to {outfile}", flush=True) with open(outfile, "w") as f: diff --git a/images/dask/README.md b/images/dask/README.md index bb2e8cb01b..dc06699e0b 100644 --- a/images/dask/README.md +++ b/images/dask/README.md @@ -22,4 +22,4 @@ docker build -t ghcr.io/cal-itp/data-infra/dask:[NEW VERSION TAG] . docker push ghcr.io/cal-itp/data-infra/dask:[NEW VERSION TAG] ``` -After deploying, you will likely need to change references to the version of the image in use by Kubernetes-managed services, such as [here](../../kubernetes/apps/charts/dask/values.yaml). See [our GitHub workflows documentation](../../.github/workflows#service-yml-workflows) for how to manage deployment of updated Kubernetes services and their associated workloads. +After deploying, you will likely need to change references to the version of the image in use by Kubernetes-managed services, such as [here](../../kubernetes/apps/charts/dask/values.yaml). See [our GitHub workflows documentation](../../kubernetes/README.md#gitops) for how to manage deployment of updated Kubernetes services and their associated workloads. 
diff --git a/images/jupyter-singleuser/README.md b/images/jupyter-singleuser/README.md index ac0f0dfe8d..bca477a588 100644 --- a/images/jupyter-singleuser/README.md +++ b/images/jupyter-singleuser/README.md @@ -20,4 +20,4 @@ docker build -t ghcr.io/cal-itp/data-infra/jupyter-singleuser:[NEW VERSION TAG] docker push ghcr.io/cal-itp/data-infra/jupyter-singleuser:[NEW VERSION TAG] ``` -After deploying, you will likely need to change references to the version of the image in use by Kubernetes-managed services, such as [here](../../kubernetes/apps/charts/jupyterhub/values.yaml). See [our GitHub workflows documentation](../../.github/workflows#service-yml-workflows) for how to manage deployment of updated Kubernetes services and their associated workloads. +After deploying, you will likely need to change references to the version of the image in use by Kubernetes-managed services, such as [here](../../kubernetes/apps/charts/jupyterhub/values.yaml). See [our GitHub workflows documentation](../../kubernetes/README.md#gitops) for how to manage deployment of updated Kubernetes services and their associated workloads. diff --git a/kubernetes/README.md b/kubernetes/README.md index 0e8bf923b1..e4c856be89 100644 --- a/kubernetes/README.md +++ b/kubernetes/README.md @@ -8,6 +8,19 @@ We deploy our applications and services to a Google Kubernetes Engine cluster. I A [glossary](#Glossary) exists at the end of this document. +## GitOps + +The workflows described above also define their triggers. In general, developer workflows should follow these steps. + +1. Check out a feature branch +2. Put up a PR for that feature branch, targeting `main` + - `preview-kubernetes` will run and add a comment showing the diff of changes that will affect the production Kubernetes cluster + + **BE AWARE**: This diff may *NOT* reveal any changes that have been manually applied to the cluster being undone. 
The `helm diff` plugin used under the hood compares the new manifests against the saved snapshot of the last ones Helm deployed rather than the current state of the cluster. It has to work that way because that most accurately reflects how helm will apply the changes. This is why it is important to avoid making manual changes to the cluster. + +3. Merge the PR + - `deploy-kubernetes` will run and deploy to `prod` this time + ## Cluster Administration We do not currently use Terraform to manage our cluster, nodepools, etc. and major changes to the cluster are unlikely to be necessary, but we do have some bash scripts that can help with tasks such as creating new node pools or creating a test cluster. @@ -128,7 +141,7 @@ At the time of this writing, a JupyterHub deployment is available at [https://no - `ingress.hosts` - `ingress.tls.hosts` -# Backups +## Backups For most of our backups we utilize [Restic](https://restic.readthedocs.io/en/latest/010_introduction.html); this section uses the Metabase database backup as an example. diff --git a/kubernetes/apps/charts/sentry/values.yaml b/kubernetes/apps/charts/sentry/values.yaml index f7016ea343..c9fd05bbf8 100644 --- a/kubernetes/apps/charts/sentry/values.yaml +++ b/kubernetes/apps/charts/sentry/values.yaml @@ -81,6 +81,7 @@ sentry: days: 30 postProcessForwardTransactions: replicas: 2 + existingSecret: sentry-sentry-secret snuba: subscriptionConsumerTransactions: @@ -108,3 +109,6 @@ sentry: persistentVolumeClaim: dataPersistentVolume: storage: "50Gi" + + hooks: + activeDeadlineSeconds: 1200
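
A note on the `values_str` expression in the `ci/tasks.py` hunk above: it chains two calls, `" ".join([...]).join([...])`. In Python, `str.join` uses the string it is called on as the separator, so the flags built from `release.helm_values` end up as a separator between the secret flags rather than being prepended to them. With zero or one secret values file, the plain `--values` flags do not appear in the result at all. The sketch below reproduces that behavior and shows one possible alternative that concatenates the lists before joining. The helper name `build_values_flags` and the file names are hypothetical and are not part of the repository.

```python
from pathlib import Path
from typing import List


def build_values_flags(helm_values: List[Path], secret_value_paths: List[Path]) -> str:
    """Hypothetical helper: emit one --values flag per file, plain files first."""
    # Concatenate the lists first, then join once with a single space.
    return " ".join(f"--values={p}" for p in [*helm_values, *secret_value_paths])


# Illustrative file names only (assumptions, not taken from the repository).
plain = [Path("values.yaml"), Path("prod.yaml")]
secret = [Path("secret.yaml")]

# Chained form: the first join's result becomes the separator for the second
# join, so with a single secret file the plain flags are dropped entirely.
chained = " ".join(f"--values={p}" for p in plain).join(
    f"--values={p}" for p in secret
)

# Single-join form: every file gets its own flag, in order.
combined = build_values_flags(plain, secret)

# Chained form with no secret files: joining an empty list yields "".
chained_no_secrets = " ".join(f"--values={p}" for p in plain).join([])

print(repr(chained))
print(repr(combined))
print(repr(chained_no_secrets))
```

Placing the secret files last means their values override the plain values files where keys overlap, since Helm gives precedence to the right-most `--values` file. Whether that ordering is the intended one here is an assumption.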