This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

release checklist #1164

Open

knrt10 wants to merge 3 commits into master from knrt10/release-checklist

Member

knrt10 commented Nov 5, 2020

This PR creates a new folder, release-process, which contains all the docs required for releasing. It also adds a manual checklist that we need to follow before making a release.

closes: #999

knrt10 requested review from iaguis, surajssd, ipochi and invidian

November 5, 2020 08:52

Member Author

knrt10 commented Nov 5, 2020

I have tagged everyone who has done a release. Please add anything I have missed that you tested during the release process.

knrt10 force-pushed the knrt10/release-checklist branch 2 times, most recently from 5519dca to 6c3d592 Compare

November 5, 2020 08:57

invidian suggested changes

Member

invidian left a comment

Some suggestions

docs/installer/lokoctl.md Outdated

docs/release-process/RELEASING.md Outdated

docs/release-process/CHECKLIST.md Outdated

Comment on lines 8 to 13

+              - Checkout to old release tag
+                - e.g. `git checkout v0.1.0`
+              - Build `lokoctl` binary from the last release
+                - e.g. `make build`
+                - Copy `lokoctl` binary to your assets directory.

Member

invidian Nov 5, 2020

Maybe use release binary instead?

docs/release-process/CHECKLIST.md Outdated

Comment on lines 38 to 84

+              This sections checks if components work as desired.
+              - Check all certificates are valid
+                - e.g. `kubectl get certificates -A`
+                - Certificates for all your components are valid.
+              - Check external IP is assigned to contour service. This will verify that MetalLB is assigning IP to service of type `LoadBalancer`.
+                - `kubectl get svc -n projectcontour`
+              - Check routes are added to AWS for your components. If you have used route53 DNS provider, you can check them [here](https://console.aws.amazon.com/route53/v2/home#Dashboard). Make sure to check the correct hosted zone.
+              - Check Gangway Ingress Host URL that you have configured works fine.
+              - Check httpbin Ingress Host URL that you have configured works fine.
+              - Do some **blackbox testing** by sending HTTP requests through MetalLB + Contour + cert-manager.
+              - Check metrics for your cluster by going to Prometheus Ingress Host URL.
+              - Check velero component works fine, by testing it for a namespace.
+                - Run the following commands:
+                ```sh
+                # Create test namespace.
+                kubectl create ns test
+                # Create a serviceaccount.
+                kubectl create sa test
+                # Create velero backup.
+                velero backup create serviceaccount-backup --include-namespaces test
+                # Delete namespace test.
+                kubectl delete ns test
+                # Restore namespace using velero.
+                velero restore create --from-backup serviceaccount-backup
+                # Check serviceaccount test exist.
+                kubectl get sa test
+                ```
+              - Check web-ui Ingress Host URL that you have configured works fine.
+              **IMPORTANT**: Follow the whole process again with multi-cluster (controller node).
+              If everything works fine, continue with the release process.

Member

invidian Nov 5, 2020

Hm, perhaps we could run our e2e test suite to cover all that.

Member Author

knrt10 Nov 5, 2020 (edited)

This can be added as an extra step too, to make sure everything works fine. What do you think?

Member

invidian Nov 6, 2020

Okay, but I think we should prefer automated tests rather than doing this by hand.

surajssd suggested changes

docs/release-process/CHECKLIST.md Outdated

docs/release-process/CHECKLIST.md Outdated

+                - Copy `lokoctl` binary to your assets directory.
+              - Deploy `lokomotive` with old release
+                - e.g. `./lokoctl cluster apply`

Member

surajssd Nov 6, 2020

The previous step says to build using make build, and the current one says ./lokoctl cluster apply. But the lokoctl binary and the lokocfg files are not in the same place.

We can simply say make install in the previous step. Then in this step we just run lokoctl cluster apply from this directory.

Member

invidian Nov 17, 2020

I think previous step should be changed to use a release binary, so I think it should be OK to build + copy here.

docs/release-process/CHECKLIST.md Outdated

+.2.0 (`v0.2.0`).
+              - Checkout to old release tag
+                - e.g. `git checkout v0.1.0`

Member

surajssd Nov 6, 2020

All the commands follow the same pattern: an "e.g." followed by a command. For commands like the one above, I understand the user needs to make changes. But for commands where we know for sure what the command is, we should put it in a code block rather than inline backticks.

docs/release-process/CHECKLIST.md Outdated

		@@ -0,0 +1,84 @@
		## Check list

Member

surajssd Nov 6, 2020

This PR should also create a sample lokocfg file with all the cluster features and component features, all parameterised using HCL variables, plus a sample lokocfg.vars file with empty vars. That way the user (a fellow developer) just has to edit those values to get going.

Member

surajssd Nov 6, 2020

Since we now have different platforms, we can have lokocfg files for two of them, namely Packet and AWS. For that we might have to put the files in two different subdirectories.

docs/release-process/CHECKLIST.md Outdated

+              - Check all certificates are valid
+                - e.g. `kubectl get certificates -A`
+                - Certificates for all your components are valid.

Member

surajssd Nov 6, 2020

What does valid mean in this case?

Member

invidian Nov 17, 2020

Perhaps he meant Ready, the column that shows up when you list the certificates.
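
One way to make "valid" concrete is to fail the check whenever a certificate's READY column is not True. A minimal sketch, assuming the default table layout printed by `kubectl get certificates -A` (NAMESPACE, NAME, READY, SECRET, AGE); the function name `not_ready_certs` is hypothetical:

```sh
# Print "namespace/name" for every certificate whose READY column is not "True".
# Reads the table printed by `kubectl get certificates -A` on stdin.
# Assumes the default column order: NAMESPACE NAME READY SECRET AGE.
not_ready_certs() {
  # NR > 1 skips the header row; $3 is the READY column.
  awk 'NR > 1 && $3 != "True" { print $1 "/" $2 }'
}
```

Usage: `kubectl get certificates -A | not_ready_certs` should print nothing when every certificate is ready.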

docs/release-process/CHECKLIST.md Outdated

+                velero restore create --from-backup serviceaccount-backup
+                # Check serviceaccount test exist.
+                kubectl get sa test

Member

surajssd Nov 6, 2020

For Velero testing we should ideally point the user to the Velero usage doc: https://github.com/kinvolk/lokomotive/blob/master/docs/how-to-guides/backup-rook-ceph-volumes.md

Member

invidian Nov 17, 2020

User? This is developer documentation. But I agree on pointing to the docs until we make this testing automated.
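
Until this is automated, the manual steps could be wrapped in a small function. This is only a sketch: the function name `velero_smoke_test` and the backup name are hypothetical, and it passes `-n` explicitly so that the service account is created in, and looked up from, the namespace that is actually backed up.

```sh
# Back up a throwaway namespace with Velero, delete it, restore it, and
# confirm that a service account created beforehand comes back.
# $1: namespace to use (defaults to "test"); the backup name is illustrative.
velero_smoke_test() {
  ns="${1:-test}"
  backup="${ns}-backup"

  kubectl create ns "$ns" || return 1
  # Create the service account inside the namespace being backed up.
  kubectl create sa test -n "$ns" || return 1
  # --wait blocks until the backup has completed.
  velero backup create "$backup" --include-namespaces "$ns" --wait || return 1
  kubectl delete ns "$ns" || return 1
  # --wait blocks until the restore has completed.
  velero restore create --from-backup "$backup" --wait || return 1
  # Exits non-zero if the service account was not restored.
  kubectl get sa test -n "$ns"
}
```

Usage: run `velero_smoke_test` against a cluster with the velero component installed and check the exit status.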

docs/release-process/CHECKLIST.md Outdated


		- Do some blackbox testing by sending HTTP requests through MetalLB + Contour + cert-manager.

		- Check metrics for your cluster by going to Prometheus Ingress Host URL.

Member

surajssd Nov 6, 2020

We can point the user to the Prometheus doc and ask them to verify that all the scenarios described there work:
- Verify that the Grafana dashboards have data.
- Verify that the Prometheus targets are loaded correctly.
- Verify that the alerts are loaded correctly.

Member

surajssd Nov 6, 2020

https://github.com/kinvolk/lokomotive/blob/master/docs/how-to-guides/monitoring-with-prometheus-operator.md

Member

invidian Nov 17, 2020

Most of the points here would already be covered by the e2e test suite, right? Prometheus data and targets.

Perhaps we should add tests for alerts and Grafana data sources then.

docs/release-process/CHECKLIST.md Outdated


		- Check web-ui Ingress Host URL that you have configured works fine.

		IMPORTANT: Follow the whole process again with multi-cluster (controller node).

Member

surajssd Nov 6, 2020

I think the general question to consider is: which tests are specific to single-node or multi-node setups? We should test the multi-controller setup by default, then figure out what is single-node specific and run only those tests there. There is no need to test everything all over again.

Member

invidian Nov 17, 2020

I agree. For controller nodes we should only test the upgrade path. I like the Components test section title; perhaps we can add something like Controlplane testing for the first points in the document, then move this sentence there.

docs/release-process/CHECKLIST.md Outdated


		- Check httpbin Ingress Host URL that you have configured works fine.

		- Do some blackbox testing by sending HTTP requests through MetalLB + Contour + cert-manager.

Member

surajssd Nov 6, 2020

We could use a standard tool for blackbox testing. Maybe curl, plus visiting the URL once in a browser?

Member

invidian Nov 17, 2020

An e2e test should easily cover that. We can have one that is run manually rather than by the CI, and that expects MetalLB on Packet to have a correctly configured EIP.
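
Until such a test exists, the manual blackbox check could be as small as a curl status probe. A minimal sketch, with hypothetical function names `http_status` and `expect_status`; because curl verifies TLS by default, a passing probe against an Ingress host exercises MetalLB, Contour and the cert-manager certificate together:

```sh
# Print the HTTP status code returned for a URL.
# -s hides the progress meter, -S still reports errors,
# -o /dev/null discards the body, -w '%{http_code}' prints only the status.
http_status() {
  curl -sS -o /dev/null -w '%{http_code}' "$1"
}

# Succeed only if the URL answers with the expected status (default 200).
# $1: URL to probe, $2: expected status code.
expect_status() {
  status="$(http_status "$1")"
  [ "$status" = "${2:-200}" ]
}
```

Usage: `expect_status https://httpbin.example.com/get`, where the hostname is a placeholder for the Ingress host you configured.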

docs/release-process/CHECKLIST.md Outdated


		- Check routes are added to AWS for your components. If you have used route53 DNS provider, you can check them [here](https://console.aws.amazon.com/route53/v2/home#Dashboard). Make sure to check the correct hosted zone.

		- Check Gangway Ingress Host URL that you have configured works fine.

Member

surajssd Nov 6, 2020

Gangway testing should actually verify that the authentication workflow works. This involves going to the website, authenticating with GitHub/Google, and then using the token to talk to the API server.

To extend it a bit further, we could assign a role such as pod reading to the user's email, then verify that reading pods works and that other access does not.
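
The permission part of that check could be scripted with `kubectl auth can-i`, which exits non-zero when the answer is "no". A minimal sketch, assuming the kubeconfig obtained through Gangway has been saved to a file; the function name `check_pod_reader` and the choice of "delete nodes" as the denied action are hypothetical:

```sh
# Check that the identity in a kubeconfig may list pods but may not delete nodes.
# $1: path to the kubeconfig obtained through Gangway.
check_pod_reader() {
  # The granted permission must be allowed.
  kubectl --kubeconfig "$1" auth can-i list pods >/dev/null || return 1
  # Anything outside the assigned role must be denied.
  if kubectl --kubeconfig "$1" auth can-i delete nodes >/dev/null; then
    return 1
  fi
}
```

Usage: `check_pod_reader ./gangway-kubeconfig` after binding a pod-reading role to the authenticated user's email.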

invidian mentioned this pull request

Update Calico CRDs as part of the release upgrade process #1176

Open

invidian self-assigned this

Member

invidian commented Nov 16, 2020

I'll take it over for now, as @knrt10 is on holidays.

invidian and others added 3 commits

November 17, 2020 10:16


          Makefile: remove deprecated install-packr2 target

fafb44f

It is no longer needed.

Signed-off-by: Mateusz Gozdek <[email protected]>


          docs: move KEYS.md and RELEASING.md to releasing directory

94f270f

So that we can have all information about the release in a single place.

Signed-off-by: knrt10 <[email protected]>


          release-process: Add CHECKLIST.md

closes: #999
Signed-off-by: knrt10 <[email protected]>

invidian force-pushed the knrt10/release-checklist branch from 6c3d592 to 9176931 Compare

November 17, 2020 09:16

invidian added the priority/P3 label

Member

invidian commented Nov 18, 2020

Actually, I'd rather take care of #1031, so if someone wants to pick this up, go ahead.

invidian removed their assignment
