Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: add encrypted scratch space for sandbox #2212

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mkulke
Copy link
Collaborator

@mkulke mkulke commented Dec 16, 2024

fixes #1921

This is a PoC to address the problem of secure scratch space in which we can download and unpack images to the sandbox, write files at runtime etc.

Currently this is either backed by storage that is not protected by the TEE (packer) or as tmpfs mounts that will consume ram during runtime (mkosi). For larger images like pytorch w/ cuda this implies in theory that we have to set aside 8GB of ram just to be able to download and unpack the image. tmpfs mounts are by default limited to a ratio of the ram, so we might end up having to overprovision the machine to be able to run a workload.

The proposes change introduces a CAA-wide parameter that allows to specify the disk size. A cloud provider can use this as an indicator of the size of the to-be-provisioned podvm disk. A zero value means that we stick to the default size of the image.

if a size has been given (the mkosi image is ~0.5gb) it will extend the image to that size. At launch the podvm will claim that space and create an ad-hoc encrypted volume on it. Depending on the presence of a marker file in user-data the agent unit will mount that to /run/kata-containers prior to launching kata-agent.

I'm not super happy about the API (I would prefer to specify it per pod) and the implementation (the marker file) but I wanted to push early to get feedback. (also, it's only implemented for azure at the moment)

FWIW it does what it's supposed to do, see some numbers on #1958

@mkulke mkulke force-pushed the mkulke/add-scratch-space-for-sandbox branch 3 times, most recently from a6c6fe4 to c19898a Compare December 16, 2024 17:58
@@ -64,6 +64,8 @@ type ServerConfig struct {
SecureCommsPpOutbounds string
SecureCommsKbsAddress string
PeerPodsLimitPerNode int
UseEncryptedDisk bool
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover

@mkulke mkulke added test_e2e_libvirt Run Libvirt e2e tests and removed test_e2e_libvirt Run Libvirt e2e tests labels Dec 17, 2024
@mkulke mkulke force-pushed the mkulke/add-scratch-space-for-sandbox branch 11 times, most recently from 5fb77d7 to ab7fccf Compare December 17, 2024 16:41
@@ -30,6 +30,7 @@ optionals+=""
[[ "${SECURE_COMMS_PP_OUTBOUNDS}" ]] && optionals+="-secure-comms-pp-outbounds ${SECURE_COMMS_PP_OUTBOUNDS} "
[[ "${SECURE_COMMS_KBS_ADDR}" ]] && optionals+="-secure-comms-kbs ${SECURE_COMMS_KBS_ADDR} "
[[ "${PEERPODS_LIMIT_PER_NODE}" ]] && optionals+="-peerpods-limit-per-node ${PEERPODS_LIMIT_PER_NODE} "
[[ "${DISK_SIZE}" ]] && optionals+="-disk-size ${DISK_SIZE} "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For AWS provider we use a param "ROOT_VOLUME_SIZE" as a way to expand the VM root volume size if requested explicitly. May be the same can be reused? Or we can settle on another name for all the providers?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ROOT_VOLUME_SIZE would be good IMO

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be made a generic param or copied to azure?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik it will not be straight forward to implement disk resize for all the providers. So I think copying to azure should be ok for now.

@mkulke mkulke force-pushed the mkulke/add-scratch-space-for-sandbox branch from ab7fccf to 0e961cf Compare December 17, 2024 16:44
@bpradipt
Copy link
Member

if a size has been given (the mkosi image is ~0.5gb) it will extend the image to that size. At launch the podvm will claim that space and create an ad-hoc encrypted volume on it. Depending on the presence of a marker file in user-data the agent unit will mount that to /run/kata-containers prior to launching kata-agent.

Would it also make sense to extend this approach to use add-on disk like available with some instance types ? Not needed for initial iteration, but is this something we should consider and lay the foundation now?

I'm not super happy about the API (I would prefer to specify it per pod) and the implementation (the marker file) but I wanted to push early to get feedback. (also, it's only implemented for azure at the moment)

I think it would be better to also add an annotation (eg io.katacontainers.config.hypervisor.root_disk_size) in kata for remote hypervisor to use for per-pod scratch space. Similar to what we use for other such annotations.

@mkulke
Copy link
Collaborator Author

mkulke commented Dec 17, 2024

Would it also make sense to extend this approach to use add-on disk like available with some instance types ? Not needed for initial iteration, but is this something we should consider and lay the foundation now?

for future extensibility, I think we can freely iterate. We don't currently have a version contract between PodVM and CAA. We would have to look at the user-facing api maybe to make this possible.

I think it would be better to also add an annotation (eg io.katacontainers.config.hypervisor.root_disk_size) in kata for remote hypervisor to use for per-pod scratch space. Similar to what we use for other such annotations.

I agree, I didn't spot any existing fitting annotation in kata at first glance, so we might have to add it.

@mkulke mkulke force-pushed the mkulke/add-scratch-space-for-sandbox branch from 0e961cf to 6655aa3 Compare December 17, 2024 18:04
@mkulke mkulke added the test_e2e_libvirt Run Libvirt e2e tests label Dec 17, 2024
@mkulke mkulke force-pushed the mkulke/add-scratch-space-for-sandbox branch from 6655aa3 to 34a553c Compare December 17, 2024 18:13
@bpradipt
Copy link
Member

I think it would be better to also add an annotation (eg io.katacontainers.config.hypervisor.root_disk_size) in kata for remote hypervisor to use for per-pod scratch space. Similar to what we use for other such annotations.

I agree, I didn't spot any existing fitting annotation in kata at first glance, so we might have to add it.

Yes, a new annotation will need to be added.

@mkulke mkulke force-pushed the mkulke/add-scratch-space-for-sandbox branch 4 times, most recently from ea95283 to a097ec7 Compare December 18, 2024 13:59
@mkulke mkulke force-pushed the mkulke/add-scratch-space-for-sandbox branch from a097ec7 to ba8cc4c Compare December 18, 2024 14:04
@mkulke mkulke added test_e2e_libvirt Run Libvirt e2e tests and removed test_e2e_libvirt Run Libvirt e2e tests labels Dec 18, 2024
This introduces a new CAA parameter that allows to specify the disk
size. If we have a disk size that is larger than 0 a cloud provider can
attempt to consider this size to create space for an encrypted
scratch partition that can be mounted to /run/container.

Signed-off-by: Magnus Kulke <[email protected]>
This adds the configuration for an encrypted scratch space on an mkosi
image. At bootup a /dev/sda4 partition will be created and encrypted
with LUKS using an ephemeral key.

The partition will use the space available on the image volume. By
default the qcow2 image has 100mb allocated for this space. This amount
of space will only work for very small images, hence we do not mount the
scratch space to `/run/kata-container` by default.

If the kata-agent service units encounters a `/run/peerpod/mount-scratch` file
it will mount the encrypted partition `/dev/sda4` to `/run/kata-containers`.

This file is provisioned by `process-user-data`, configured by the CAA
daemonset.

Signed-off-by: Magnus Kulke <[email protected]>
The CI wasn't building debug images, due string/bool mismatch.

Signed-off-by: Magnus Kulke <[email protected]>
@mkulke mkulke force-pushed the mkulke/add-scratch-space-for-sandbox branch from 4b05766 to c0c9f4c Compare December 20, 2024 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test_e2e_libvirt Run Libvirt e2e tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

podvm-mkosi: need allocate more space for "/run" to store large image data
2 participants