Updated Version for Pre-Upgrade Check. #15
Since we were waiting, I saw that there's a known issue that can be prevented by adding a label to a secret, so I went ahead and added a test for that label with the suggested fix from the docs. Looks like the upstream issue is this:
…a node requirement together.
Adding another check for this known issue: harvester/harvester#3863
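For context, the label check on the secret discussed above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `secret_has_label` and the `KUBECTL` override variable are hypothetical, and the label key is left to the caller because this thread does not spell it out. The sketch relies on `kubectl get -l <key>`, which selects objects that carry the key regardless of its value.

```shell
#!/usr/bin/env bash
# Sketch only. KUBECTL can be overridden (for example with a stub in tests).
KUBECTL="${KUBECTL:-kubectl}"

# secret_has_label NAMESPACE SECRET LABEL_KEY
# Returns 0 if the named secret carries the label key, non-zero otherwise.
secret_has_label() {
    local namespace="$1" secret="$2" label_key="$3"
    # "-l KEY" is an existence selector; "-o name" prints lines such as
    # "secret/local-kubeconfig". grep -x requires a whole-line match.
    "$KUBECTL" get secret -n "$namespace" -l "$label_key" -o name 2>/dev/null \
        | grep -qxF "secret/$secret"
}
```

It would be called as `secret_has_label fleet-local local-kubeconfig <label-key>`, with the key taken from the Harvester documentation for the known issue.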
Hi @ParadoxGuitarist, Thank you for the awesome work! It looks pretty neat. I have only some suggestions. Please kindly take a look.
It seems the fix does not take effect. This is a CP node and the cluster does not have the rancher-monitoring add-on enabled.
node-0:~ # check.sh
==============================
Starting Host check...
Host Test: Pass
==============================
Starting Certificates check...
Certificates Test: Pass
==============================
Starting Node Free Space check...
Error from server (NotFound): services "rancher-monitoring-prometheus" not found
Script wasn't able to get valid response from the API.
You may need to log into each of the nodes and run 'df -h /usr/local' to ensure there's more than 30 GB of free space available.
Node-Free-Space Test: Failed
==============================
Starting Helm Bundle status check...
Helm-Bundles Test: Pass
==============================
Starting Harvester Bundle status check...
Harvester-Bundles Test: Pass
==============================
Starting Node Status check...
Node-Status Test: Pass
==============================
Starting CAPI Cluster State check...
CAPI-Cluster-State Test: Pass
==============================
Starting CAPI Machine Count check...
CAPI-Machine-Count Test: Pass
==============================
Starting CAPI Machine State check...
CAPI-Machine-State Test: Pass
==============================
Starting Longhorn Volume Health Status check...
Longhorn-Volume-Health-Status Test: Pass
==============================
Starting Stale Longhorn Volumes check...
Stale-Longhorn-Volumes Test: Pass
==============================
Starting Pod Status check...
Pod-Status Test: Pass
==============================
Starting Kubeconfig Secret check...
Kubeconfig Secret Test: Pass
==============================
WARN: There are 1 failing checks: Node-Free-Space
It failed (which is what I think we want), but it didn't give the correct message. Seems like that string isn't empty? Can you pull the verbose logs and see what it's trying to look up with the IP? It should be in there from this line.
Aha, there's still an output:
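One way to make an "is the string empty?" test dependable when the rancher-monitoring-prometheus service is missing is to capture only stdout from the lookup and treat a failed lookup as an empty value. The sketch below is an assumption about how such a guard might look, not the script's actual code: the function names, the `KUBECTL` override, and the message text are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. KUBECTL can be overridden (for example with a stub in tests).
KUBECTL="${KUBECTL:-kubectl}"

# lookup_service_ip NAMESPACE SERVICE
# Prints the service's ClusterIP, or nothing if the lookup fails.
lookup_service_ip() {
    local namespace="$1" service="$2"
    # stderr is discarded so "Error from server (NotFound)" can never end up
    # in the captured value; "|| true" keeps a failed lookup from aborting.
    "$KUBECTL" get service "$service" -n "$namespace" \
        -o jsonpath='{.spec.clusterIP}' 2>/dev/null || true
}

# check_monitoring_available NAMESPACE SERVICE
# Prints the IP and returns 0, or prints a hint and returns 1 when empty.
check_monitoring_available() {
    local ip
    ip="$(lookup_service_ip "$1" "$2")"
    if [ -z "$ip" ]; then
        # Hypothetical wording for the message.
        echo "rancher-monitoring does not appear to be enabled; skipping the API query."
        return 1
    fi
    echo "$ip"
}
```

The namespace is passed in by the caller because this thread does not name the one the service lives in.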
LGTM, really appreciate the contribution @ParadoxGuitarist! Normally we wait for two approvals before merging the PR. cc @bk201
There was some versioning for the update check.sh that seemed to be lagging behind, but after talking with support it seems like it's good for all the other versions. I added a new folder with a 1.x since it seems to be expected that it'll continue to work for 1.4 as well. A few other things this PR adds:
- A check for whether the local-kubeconfig secret in the fleet-local namespace is missing a label that causes a known issue in the 1.2.2 > 1.3.1 upgrade.
- A log flag (-l /path/to/file.log) that will generate a log file.
- … (-v flag) messages to help highlight/emphasize more important information (errors and failures). (Note: logs, if enabled, always contain verbose messages.)

My feelings won't get hurt at all if you change/alter any parts of the PR. I'm also happy to make the changes myself if you identify any.
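The interaction between the -l and -v flags described above (verbose messages appear on screen only with -v, but are always written to the log file when one is given) can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the names `parse_flags`, `log_info`, `log_verbose`, `VERBOSE`, and `LOG_FILE`, as well as the usage text, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: flag handling for -v (verbose) and -l FILE (log file).
VERBOSE=0
LOG_FILE=""

parse_flags() {
    local opt
    OPTIND=1  # reset so the function can be called more than once
    while getopts "vl:" opt; do
        case "$opt" in
            v) VERBOSE=1 ;;
            l) LOG_FILE="$OPTARG" ;;
            *) echo "usage: check.sh [-v] [-l /path/to/file.log]" >&2; return 2 ;;
        esac
    done
}

# log_info MESSAGE: always printed; also appended to the log when enabled.
log_info() {
    if [ -n "$LOG_FILE" ]; then echo "$1" >> "$LOG_FILE"; fi
    echo "$1"
}

# log_verbose MESSAGE: printed only with -v; always appended to the log
# when one is enabled.
log_verbose() {
    if [ -n "$LOG_FILE" ]; then echo "$1" >> "$LOG_FILE"; fi
    if [ "$VERBOSE" -eq 1 ]; then echo "$1"; fi
}
```

A script using this would call `parse_flags "$@"` once at startup and then route failures through `log_info` and detail through `log_verbose`.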