Yet another case of "x509: certificate signed by unknown authority, requeuing" #278

Open
johanneskastl opened this issue Nov 28, 2023 · 9 comments

Comments

@johanneskastl

Version
v0.13.1

Platform/Architecture
openSUSE MicroOS 20231126 (immutable based on openSUSE Tumbleweed)

Describe the bug

time="2023-11-28T06:18:40Z" level=error msg="error syncing 'system-upgrade/k3s-server': handler system-upgrade-controller: Get \"https://update.k3s.io/v1-release/channels/stable\": x509: certificate signed by unknown authority, requeuing"

To Reproduce
Use the following in your plan instead of a version:

channel: https://update.k3s.io/v1-release/channels/stable

Expected behavior
The TLS certificate should be accepted.

Actual behavior
The HTTPS request to the channel URL fails with the x509 error shown above.

I checked the mounts in the deployment, and all of them exist on the host:

        volumeMounts:
        - mountPath: /etc/ssl
          name: etc-ssl
          readOnly: true
        - mountPath: /etc/pki
          name: etc-pki
          readOnly: true
        - mountPath: /etc/ca-certificates
          name: etc-ca-certificates
          readOnly: true
        - mountPath: /tmp
          name: tmp

$ ls -ld /etc/pki/ /etc/ssl/ /etc/ca-certificates/
drwxr-xr-x. 1 root root  16 14. Jun 20:05 /etc/ca-certificates//
drwxr-xr-x. 1 root root  10 22. Nov 18:05 /etc/pki//
drwxr-xr-x. 1 root root 198 17. Nov 20:29 /etc/ssl//

Additional context
The bug was reported multiple times in different constellations:

What I failed to find is a clear description of which files the image looks for.

Or a reason why it does not ship its own ca-certificates and mount the host's certificates only in addition, in case someone is using an internal CA.

@johanneskastl
Author

My guess is that the links inside the directories are messing things up:

$ ll /etc/ssl/
total 56K
lrwxrwxrwx. 1 root root  43 14. Jun 20:05 ca-bundle.pem -> ../../var/lib/ca-certificates/ca-bundle.pem
lrwxrwxrwx. 1 root root  33 14. Jun 20:05 certs -> ../../var/lib/ca-certificates/pem/
-rw-r--r--. 1 root root 412 17. Nov 20:28 ct_log_list.cnf
drwxr-xr-x. 1 root root   0 17. Nov 20:39 engdef.d/
drwxr-xr-x. 1 root root   0 17. Nov 20:39 engines.d/
-rw-r--r--. 1 root root 12K 17. Nov 20:39 openssl-1_1.cnf
-rw-r--r--. 1 root root 13K 17. Nov 20:28 openssl.cnf
-rw-r--r--. 1 root root 13K 17. Nov 20:29 openssl-orig.cnf
drwx------. 1 root root   0 17. Nov 20:28 private/
$ ll /etc/pki/
total 0
drwxr-xr-x. 1 root root 50 22. Nov 18:05 trust/
$ ll /etc/ca-certificates/
total 0
drwxr-xr-x. 1 root root 0 14. Jun 20:05 update.d/
$

I just changed the /etc/ssl/ mount so that the host's /var/lib/ca-certificates/pem/ directory is mounted at /etc/ssl/certs/ inside the controller, and the upgrade started and finished successfully.

@brandond
Member

Yeah, sounds like the symlinks outside the mounted path are breaking things.

The idea is that the host CA bundle is more likely to be up-to-date than the image, or (as you said) the update channel may not be trusted by public CA bundles.

As you noted, use of distros with non-standard filesystem layouts will require adjustments to the deployment manifest.

@johanneskastl
Author

johanneskastl commented Nov 28, 2023

As you noted, use of distros with non-standard filesystem layouts will require adjustments to the deployment manifest.

Sorry, but SLES15 has that, and this is ancient. So I am not sure how "non-standard" that is. ;-)

RHEL8 has a link from /etc/ssl/certs/ to /etc/pki/tls/certs/, but as that gets mounted separately it might work.

@brandond
Member

brandond commented Nov 28, 2023

Yeah, the problem here is that content under /etc/ssl/ is linked to /var/lib/ca-certificates/. /var/lib/ca-certificates is not a standard path and therefore isn't mounted by the default deployment manifest. The easiest fix would probably be to add /var/lib/ca-certificates as a mount.

You can see the paths expected by golang at https://go.dev/src/crypto/x509/root_linux.go
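For anyone who wants to check those locations without reading the Go source, here is a rough sketch. The path lists are copied from root_linux.go as I understand it at the time of writing, so treat them as an assumption and compare against the linked file for your Go version. The `report_path` helper is a made-up name for illustration. Go also honors the `SSL_CERT_FILE` and `SSL_CERT_DIR` environment variables, which this sketch does not look at.

```shell
#!/bin/sh
# Candidate CA bundle files and directories (assumption: copied from Go's
# crypto/x509 root_linux.go; verify against the source for your Go version).
CERT_FILES="/etc/ssl/certs/ca-certificates.crt
/etc/pki/tls/certs/ca-bundle.crt
/etc/ssl/ca-bundle.pem
/etc/pki/tls/cacert.pem
/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
/etc/ssl/cert.pem"
CERT_DIRS="/etc/ssl/certs
/etc/pki/tls/certs"

# report_path PATH (hypothetical helper): say whether PATH exists and,
# if it does, where it finally resolves after following symlinks.
report_path() {
    if [ -e "$1" ]; then
        printf 'found    %s -> %s\n' "$1" "$(readlink -f "$1")"
    elif [ -L "$1" ]; then
        # A symlink whose target does not exist from this point of view.
        printf 'dangling %s\n' "$1"
    else
        printf 'missing  %s\n' "$1"
    fi
}

# Unquoted on purpose: the lists are split on whitespace, one path each.
for p in $CERT_FILES $CERT_DIRS; do
    report_path "$p"
done
```

Run on the host, a `found` line whose resolved target sits outside the directories mounted into the controller is a likely candidate for the failure described in this issue.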

@johanneskastl
Author

Yes, that is exactly what I did. I added a kustomization (as there is unfortunately no helm chart for system-upgrade-controller) to patch the deployment and mount /var/lib/ca-certificates/pem/ to /etc/ssl/certs/
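For readers without a kustomize setup, a comparable change can be sketched with `kubectl patch`. This is not the kustomization used above: it follows the suggestion of mounting /var/lib/ca-certificates at the same path, so that the relative symlinks under /etc/ssl resolve inside the container. The namespace, deployment name, container index 0, and the volume name `var-lib-ca-certificates` are all assumptions; check them against your own manifest before applying anything.

```shell
#!/bin/sh
# Assumptions (not confirmed in this thread): adjust to your deployment.
NAMESPACE="system-upgrade"
DEPLOYMENT="system-upgrade-controller"
HOST_CA_DIR="/var/lib/ca-certificates"

# build_patch DIR (hypothetical helper): print a JSON patch that appends a
# hostPath volume for DIR and mounts it read-only at the same path in the
# first container of the pod template.
build_patch() {
    cat <<EOF
[
  {"op": "add", "path": "/spec/template/spec/volumes/-",
   "value": {"name": "var-lib-ca-certificates",
             "hostPath": {"path": "$1", "type": "Directory"}}},
  {"op": "add", "path": "/spec/template/spec/containers/0/volumeMounts/-",
   "value": {"name": "var-lib-ca-certificates",
             "mountPath": "$1", "readOnly": true}}
]
EOF
}

# Print the patch by default; only touch the cluster when APPLY=1 is set.
if [ "${APPLY:-0}" = "1" ]; then
    kubectl -n "$NAMESPACE" patch deployment "$DEPLOYMENT" \
        --type=json -p "$(build_patch "$HOST_CA_DIR")"
else
    build_patch "$HOST_CA_DIR"
fi
```

Running the script as-is only prints the patch for review; `APPLY=1 sh patch.sh` (file name is an example) sends it to the cluster.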

@dweomer
Contributor

dweomer commented Feb 13, 2024

The default manifest is quite simple but largely inclusive. As such, it makes for a pretty decent, broadly applicable example. I used to hate/fear Helm unreasonably at the time I started this project (now I hate Helm for good reasons, I assure you) and so I never developed a chart!

@nate-duke

Any guidance on workarounds for this? I've tried making the files in the system path match what I expect is inside the container based on the error messages, but without a shell in the container it's down to guesswork.

@dweomer
Contributor

dweomer commented Apr 24, 2024

Any guidance on workarounds for this? I've tried making the files in the system path match what i expect is inside the container based on the error messages but without a shell in the container it's down to guesswork.

SUC leverages the default TLS implementation that comes with the golang runtime, therefore it searches for the trust store as indicated by:

If the host path mounts aren't working this is typically caused by:

  • symlinks under /etc/{pki,ssl,tls} that aren't satisfied in the container
  • an out of date or misconfigured trust store on the host

So, if curl works on the host but not in the container, you've probably got a symlink problem.

There are a number of ways to fix this because the SUC manifests as provided are demonstrative and not authoritative: you have control over its runtime.
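Since the container has no shell, the symlink case can be checked from the host instead. The following is a minimal sketch, assuming the three host paths from the volumeMounts quoted at the top of this issue; `is_covered` and `escaping_links` are hypothetical helper names. It lists every symlink under the mounted directories whose resolved target lies outside all of them, which is exactly what the container cannot follow.

```shell
#!/bin/sh
# Host directories mounted into the controller (assumption: taken from the
# volumeMounts quoted in this issue; edit to match your manifest).
MOUNTED="/etc/ssl /etc/pki /etc/ca-certificates"

# is_covered TARGET DIR...: succeed if TARGET is one of the DIRs or below one.
is_covered() {
    target=$1
    shift
    for dir in "$@"; do
        case "$target" in
            "$dir"|"$dir"/*) return 0 ;;
        esac
    done
    return 1
}

# escaping_links DIR...: print symlinks under the DIRs whose fully resolved
# target is not inside any of the DIRs.
escaping_links() {
    for root in "$@"; do
        [ -d "$root" ] || continue
        find "$root" -type l 2>/dev/null | while IFS= read -r link; do
            # readlink -f fails when the target path cannot be resolved.
            target=$(readlink -f "$link") || target="(unresolvable)"
            is_covered "$target" "$@" || printf '%s -> %s\n' "$link" "$target"
        done
    done
}

# Unquoted on purpose so the list splits into separate directory arguments.
escaping_links $MOUNTED
```

Each line of output names a target directory that would need its own mount. The comparison is a plain path-prefix match, so it assumes the mounted directories are not themselves reached through symlinks.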
