[Feature] - Add 'boot_command' over VNC like Qemu builder #197

Open
danielfdickinson opened this issue Jun 12, 2022 · 19 comments
Labels
enhancement New feature or request

Comments

@danielfdickinson

danielfdickinson commented Jun 12, 2022

Is your feature request related to a problem? Please describe.

The Alpine Linux ISO does not include cloud-init, which means initial setup has to be done over VNC. The Packer Qemu builder has a boot_command option that allows interacting with the instance over VNC in order to do 'just enough' to SSH in.

I have utilized this ability to create a QCOW2 image from the Alpine ISO that includes cloud init in a public repo.

The documentation for the boot_command capability can be found in the Packer Documentation (see the 'Boot Configuration' section).

The source code for the qemu builder is at: https://github.com/hashicorp/packer-plugin-qemu/tree/main

Describe the solution you'd like

A similar boot_command capability for the Vultr plugin that allows controlling the instance via VNC in order to enable SSH access (after which regular provisioning can be used).

Describe alternatives you've considered

Since the goal is automation, doing this manually as described in the Vultr docs for Alpine Linux doesn't solve the problem, and it would have to be repeated for every new release of Alpine Linux.

Another option would be to be able to upload a QCOW2 or RAW boot image rather than only an ISO (such as can be done with OpenStack). AIUI the snapshots option is not just a disk image but a whole VM image which means uploading a QCOW2 or RAW generated using the public repo I created, above, is not currently an option with Vultr.

EDIT: I was able to upload (but have not yet tested) a RAW image generated using the repo I mentioned from a web hosting instance I have (it would be helpful to be able to upload directly from my local machine, but that's a separate issue), so it looks like there may be a workaround for now.

EDIT #2: While it was possible to upload the image, it fails to boot an instance. So currently there is no automation-friendly workaround.

EDIT #3: Mea culpa, the workaround from the first edit works; I had an error in my Packer scripts that didn't use the snapshot properly. So there is a workaround for now. Example repository at https://gitlab.com/danielfdickinson/alpine-two-stage-packer-for-vultr

From what I have read of the Alpine docs, wiki, and mailing list, "cloud-init" is considered too heavy and efforts are focused on tiny cloud init instead. I am also new to Alpine (and have not yet posted to the mailing list), so requesting a 'cloud-init' image in the Alpine releases seems like a non-starter.

In addition this would enable more distros to be prepared for use on Vultr from their main distribution ISO.

@danielfdickinson danielfdickinson added the enhancement New feature or request label Jun 12, 2022
@Oogy

Oogy commented Jun 13, 2022

Hello @danielfdickinson,

Another option, as Vultr supports iPXE booting, is to netboot Alpine with an iPXE script. The ssh_key and other options may be provided to the iPXE script to help with initial provisioning/access.

This should be nicer to work with than your proposed workaround, which while functional as you've mentioned does entail more process in managing the image lifecycle with Alpine releases.

If you would prefer to continue with this feature request, please note we will need to review our current roadmap/timelines before we can consider this. Pull requests are always welcome!

@danielfdickinson
Author

Hello @Oogy,

Thank you for your response. I looked at the iPXE and netboot links you provided, and I have done netboot on a local network before. I think netboot needs more 'moving parts' than a Packer boot_command. Even without boot_command, given that I already have a core image (i.e. I just need to upload the snapshot and can provision via Packer using SSH), I think netboot would involve a lot more work.

I will add adding the boot_command capability to the Vultr plugin to my 'to do' list, although at this point I make no more promises than you 😁

Hopefully I am able to get to it sooner than later and will have a PR for you at some point.

Thank you for the suggestion. If I were starting from scratch it would more likely be a worthwhile route for me, so others may benefit from the info.

@danielfdickinson
Author

I decided to take another look at the iPXE option and found a Libvirt iPXE boot guide that let me know that there were fewer required moving parts than I thought.
My question for the Vultr Packer plugin (this repo) is whether

script_id (string) - If you've not selected a 'custom' (OS 159) operating system, this can be the id of a startup script to execute on boot. See Startup Script.

means I would not be able to use script_id (pointed to an iPXE script) with Alpine Linux because one needs to specify custom os and an iso_id for Alpine (since Alpine is not in the main list of Vultr OSes).

If I could specify an iPXE script to boot Alpine Linux, then I wouldn't need boot_command. In fact, using iPXE would be preferred because there wouldn't be the requirement for waits and slow simulated keyboard input, so creating the image would be much speedier.

If it's already possible to use script_id with Packer with Alpine Linux, then I'll close this issue. If not, I will create a new issue requesting that and close this issue (if it is something that is reasonably likely to happen).

@Oogy

Oogy commented Jun 17, 2022

Hello @danielfdickinson,

I'm glad to see you've revisited the idea, iPXE is quite nice to work with IMO.

It appears the description for script_id may need to be amended; as written, I believe it is only valid for a startup script of type boot.

There are 2 types of scripts supported by Vultr, boot and pxe scripts. Both are specified via the script_id.

So in the case of PXE booting, setting os_id to custom (id 159) and passing your pxe script ID to script_id should work. I have done so in the past myself. If you have any issues please let us know.

Additionally, as you are PXE booting you do not need to specify the iso_id or even have the ISO on your account since there is no ISO involved.
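
For readers who generate their Packer templates programmatically, the combination described above can be sketched in Python. This is only an illustration: it emits the two settings named in this thread (os_id and script_id) and deliberately omits iso_id. The function name and the placeholder script ID are hypothetical, and a real template would need the plugin's other required settings as well.

```python
import json

# Vultr's "custom" operating system ID, as quoted in this thread.
CUSTOM_OS_ID = 159


def pxe_source_settings(script_id: str) -> dict:
    """Return only the Vultr builder settings discussed in this thread.

    Hypothetical helper: other required builder settings are not covered here.
    """
    if not script_id:
        raise ValueError("script_id must be the ID of a PXE-type startup script")
    # iso_id is intentionally absent: PXE booting involves no ISO.
    return {"os_id": CUSTOM_OS_ID, "script_id": script_id}


if __name__ == "__main__":
    # "example-pxe-script-id" is a placeholder, not a real script ID.
    print(json.dumps(pxe_source_settings("example-pxe-script-id"), indent=2))
```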

@danielfdickinson
Author

@Oogy iPXE is now working for me 🎉. The description for script_id does need to be updated, as using a 'custom os' (159) and a PXE script worked (with one caveat).

It seems though that the kernel and initrd URLs cannot be HTTPS even though iPXE reports as having HTTPS support (HTTPS works for me with iPXE under libvirt though, so it's probably a version or build issue, possibly because my instance uses Let's Encrypt for SSL certificates).

@Oogy

Oogy commented Jun 18, 2022

@danielfdickinson glad to hear it. On Monday I'll open up an issue for updating the docs as well as look into the HTTPS problem. Last I'd experimented with this I was netbooting Flatcar Linux using HTTPS URLs so I'm pretty sure that should work.

If you could share any errors or console screenshots that'd be a great help.

@danielfdickinson
Author

danielfdickinson commented Jun 19, 2022

Would you like the netboot screenshots here, or is there a better place (like a Vultr ticket)?

I'll also include the applicable iPXE scripts in the info.

@Oogy

Oogy commented Jun 19, 2022

@danielfdickinson here is fine 👍

@danielfdickinson
Author

Here is a screenshot when kernel is https:

Screenshot from 2022-06-19 22-43-48

and here is the iPXE script:

#!ipxe

set base-url https://ipxe-boot.wildtechgarden.ca

kernel ${base-url}/boot-3.16/vmlinuz-virt console=tty0 modules=loop,squashfs quiet nomodeset alpine_repo=https://mirror.csclub.uwaterloo.ca/alpine/v3.16/main modloop=https://ipxe-boot.wildtechgarden.ca/boot-3.16/modloop-virt ssh_key="ssh-ed25519 AAAtheykey... comment@host"
initrd ${base-url}/boot-3.16/initramfs-virt
boot

And as mentioned, it works if I change

set base-url https://ipxe-boot.wildtechgarden.ca

to

set base-url http://ipxe-boot.wildtechgarden.ca
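
Since the only difference between the working and failing scripts is the scheme of base-url, it can help to render the script from parameters so that both variants are easy to produce for testing. The following is a minimal Python sketch that assumes the same file layout as the script above; the function name and its parameters are hypothetical.

```python
def render_ipxe_script(base_url: str, modloop_url: str, repo_url: str,
                       alpine_version: str, ssh_key: str) -> str:
    """Render an iPXE script shaped like the one posted in this thread.

    base_url is fetched by iPXE itself (kernel and initrd); modloop_url and
    repo_url are passed through as kernel arguments.
    """
    boot_dir = f"boot-{alpine_version}"
    kernel_args = " ".join([
        "console=tty0",
        "modules=loop,squashfs",
        "quiet",
        "nomodeset",
        f"alpine_repo={repo_url}",
        f"modloop={modloop_url}/{boot_dir}/modloop-virt",
        f'ssh_key="{ssh_key}"',
    ])
    lines = [
        "#!ipxe",
        "",
        f"set base-url {base_url}",
        "",
        # ${base-url} is left for iPXE to expand at boot time.
        f"kernel ${{base-url}}/{boot_dir}/vmlinuz-virt {kernel_args}",
        f"initrd ${{base-url}}/{boot_dir}/initramfs-virt",
        "boot",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_ipxe_script(
        base_url="http://ipxe-boot.wildtechgarden.ca",
        modloop_url="https://ipxe-boot.wildtechgarden.ca",
        repo_url="https://mirror.csclub.uwaterloo.ca/alpine/v3.16/main",
        alpine_version="3.16",
        ssh_key="ssh-ed25519 AAAtheykey... comment@host",
    ))
```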

@Oogy

Oogy commented Jun 20, 2022

Hello @danielfdickinson,

I've taken some time to look at this and I think the issue may be that the Common Name in your LE cert is different from the domain in the base-url. I have no proof of this as the iPXE errors are not terribly helpful and we cannot enable debug mode (as this requires a separate build of the iPXE binary), but it is the only notable difference I can see between your cert and my test which used https://boot.netboot.xyz.

The CA certs for your LE cert are cross-signed by the iPXE CA cert and so that should not be the issue. Could you perhaps try using a new LE certificate with the Common Name of wildtechgarden.ca and SANs wildtechgarden.ca and *.wildtechgarden.ca?
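
To illustrate the kind of name check being discussed, here is a simplified Python sketch of matching a hostname against a certificate's names, with a wildcard allowed only as the whole left-most label. It is an assumption-laden illustration of the general rule, not a description of how iPXE itself validates names, and the function name is hypothetical.

```python
def hostname_matches(hostname: str, cert_names: list[str]) -> bool:
    """Return True if hostname matches any certificate name.

    Simplified rule: names are compared label by label, case-insensitively,
    and "*" may stand in for exactly one left-most label.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    for name in cert_names:
        name_labels = name.lower().rstrip(".").split(".")
        if len(name_labels) != len(host_labels):
            continue  # a wildcard never spans more than one label
        first_ok = name_labels[0] == "*" or name_labels[0] == host_labels[0]
        if first_ok and name_labels[1:] == host_labels[1:]:
            return True
    return False


if __name__ == "__main__":
    names = ["wildtechgarden.ca", "*.wildtechgarden.ca"]
    print(hostname_matches("ipxe-boot.wildtechgarden.ca", names))
```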

@danielfdickinson
Author

danielfdickinson commented Jun 20, 2022

Hey @Oogy,

Thank you for looking at this. Changing the CN didn't solve the issue, but it did get me looking at things like server logs and DNS records and I realized that I had ipxe-boot.wildtechgarden.ca as a CNAME and the CNAME was not the commonName on the cert. I've switched ipxe-boot to A and AAAA records (since I did switch the CN to ipxe-boot...) and once the TTLs clear out, I'll give it another go, but I think you gave me the right idea where to look (names of CN vs DNS name). Will let you know.

@danielfdickinson
Author

danielfdickinson commented Jun 21, 2022

I have confirmation that it is iPXE rejecting the connection and not on the server side -- with lighttpd I got (mod_openssl.c.3213) SSL: -1 5 0: No error information and when I increase the minimum cipher level the lighttpd logs change to (mod_openssl.c.3249) SSL: 1 error:1417A0C1:SSL routines:tls_post_process_client_hello: no shared cipher (ip-address) and the iPXE error message changes to "Error not permitted".

I wonder if the version of iPXE is too old and it doesn't like the cross-signed certificate (i.e. whether the version of iPXE was before LE dropped the (DigiCert?) cross-sign).

I can't test the wildtechgarden.ca CN with only SANs wildtechgarden.ca and *.wildtechgarden.ca without breaking other sites on this server, but have tried with CN === reverse DNS name === DNS on A record and AAAA record. (Specifically radicale-lighttpd-01.wildtechgarden.ca) which is also in the list of SAN.
It's a long SAN list though and maybe that is a problem. I will have to try again with a system dedicated to testing this, so I don't have to worry about the other sites on the server.

I found ipxe/ipxe#116 which adds support for fragmented handshakes (e.g. due to large certificate chains). Based on that I think it is highly probable that the number of SANs is the problem, in that it causes fragmentation. I unfortunately don't have a DNS provider compatible with a DNS challenge to do a wildcard certifiate, so unless the workaround described in the PR works I might be out of luck until I dedicate a host to serving the iPXE stuff (or at least keeping the SAN list small).

@danielfdickinson
Author

danielfdickinson commented Jun 21, 2022

🎆 🥳 Got it!

The PR 116 for iPXE mentioned above showed me the way. I needed to use --preferred-chain "ISRG Root X1" for my LE certificate, as described in one of the comments: ipxe/ipxe#116 (comment)

I also needed to use slightly less secure lighttpd settings than ideal (but which are the current defaults for compatibility reasons).

    ssl.openssl.ssl-conf-cmd = (
	"MinProtocol" => "TLSv1.2",
	"Options" => "ServerPreference",
	"CipherString" => "HIGH"
	#"Options" => "-ServerPreference"
	#"CipherString" => "EECDH+AESGCM:AES256+EECDH:CHACHA20"
    )

Although since I've removed the higher security cipher settings I could just omit ssl.openssl.ssl-conf-cmd altogether.

Shall I close this?

@gstrauss

@danielfdickinson I'd like some more details, please, as lighttpd has announced plans to change TLS defaults to be stricter in a release some time in Jan 2023. What were the client limitations? Most frequently in my experience, "MinProtocol" => "TLSv1.2" is compatible with the vast majority of clients. Are you sure that the client could not support "CipherString" => "EECDH+AESGCM:AES256+EECDH:CHACHA20" when "MinProtocol" => "TLSv1.2" ?

@gstrauss

I unfortunately don't have a DNS provider compatible with a DNS challenge to do a wildcard certifiate (sic)

lighttpd supports Let's Encrypt bootstrap using TLS-ALPN-01 verification challenge
https://wiki.lighttpd.net/HowToSimpleSSL

@danielfdickinson
Author

@gstrauss

Are you sure that the client could not support "CipherString" => "EECDH+AESGCM:AES256+EECDH:CHACHA20" when "MinProtocol" => "TLSv1.2" ?

Yes. I get (mod_openssl.c.3249) SSL: 1 error:1417A0C1:SSL routines:tls_post_process_client_hello: no shared cipher (ip-address) if I use

    ssl.openssl.ssl-conf-cmd = (
	"MinProtocol" => "TLSv1.2",
	"Options" => "-ServerPreference",
	"CipherString" => "EECDH+AESGCM:AES256+EECDH:CHACHA20"
    )

In addition at the iPXE crypto docs I see the following table:

Protocol versions: TLSv1.0, TLSv1.1, TLSv1.2
Public key algorithm: RSA
Block cipher algorithms: AES-128-CBC, AES-256-CBC
Hash algorithms: MD5, SHA-1, SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, SHA-512/256

The exact list of supported cipher suites is
RSA_WITH_AES_256_CBC_SHA256, RSA_WITH_AES_128_CBC_SHA256,
RSA_WITH_AES_256_CBC_SHA, and RSA_WITH_AES_128_CBC_SHA.

The iPXE github repo hasn't had a release in two years (sometime in 2020), and I don't see any crypto-related changes in the repo in that period of time.

@danielfdickinson
Author

@gstrauss

lighttpd supports Let's Encrypt bootstrap using TLS-ALPN-01 verification challenge
https://wiki.lighttpd.net/HowToSimpleSSL

Nice. But not quite solving the wildcard (*.example.com) thing, which is what I was commenting on. AFAICT the TLS-ALPN-01 requires a matching DNS entry (so specific name, not a wildcard, unless I'm misreading the docs).

@gstrauss

@danielfdickinson thank you for the details. New releases of lighttpd on or after Jan 2023 will have stricter TLS defaults and "CipherString" will need to be manually configured in lighttpd.conf to include one or more of those ciphers to work with iPXE: AES256-SHA256:AES128-SHA256:AES256-SHA:AES128-SHA. If you must enable the older ciphers, I'd recommend adding only AES256-SHA256 and seeing if that works, e.g. "CipherString" => "EECDH+AESGCM:AES256+EECDH:CHACHA20:AES256-SHA256" and also "Options" => "+ServerPreference" to avoid downgrade attacks to the weakest cipher option.
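
One way to check a candidate "CipherString" before deploying it is to ask the local OpenSSL which ciphers it expands to and intersect that with the four OpenSSL cipher names listed above. The following Python sketch uses the standard ssl module. The function name is hypothetical, and the result reflects the OpenSSL build that Python is linked against, which may differ from the one lighttpd uses.

```python
import ssl

# OpenSSL names of the cipher suites this thread reports iPXE as supporting.
IPXE_OPENSSL_NAMES = {"AES256-SHA256", "AES128-SHA256", "AES256-SHA", "AES128-SHA"}


def ciphers_shared_with_ipxe(cipher_string: str) -> set[str]:
    """Expand an OpenSSL cipher string and return the iPXE-compatible subset."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Raises ssl.SSLError if the string selects no ciphers at all.
    ctx.set_ciphers(cipher_string)
    enabled = {cipher["name"] for cipher in ctx.get_ciphers()}
    return enabled & IPXE_OPENSSL_NAMES


if __name__ == "__main__":
    strict = "EECDH+AESGCM:AES256+EECDH:CHACHA20"
    for candidate in (strict, strict + ":AES256-SHA256"):
        shared = ciphers_shared_with_ipxe(candidate)
        print(candidate, "->", sorted(shared) or "no shared cipher")
```

An empty result for a given string corresponds to the "no shared cipher" error seen in the lighttpd logs earlier in this thread.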

Also, you are correct that TLS-ALPN-01 verification challenge is not available for validating wildcard certs.
(https://letsencrypt.org/docs/challenge-types/)

@stappersg

For your information:
