SecureComms: Add Support for activating using InitData #2072

davidhadas · 2024-09-30T10:10:41Z

See:

Add apj.json to InitData
apf.josn includes an sc key used to activate secure comms (if not already activated using an agent-protocol-forwarder.service flag)
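
A minimal sketch of the intended precedence, assuming the InitData file is JSON with a boolean key. The `apfConfig` type, the `secure-comms` key name, and the `secureCommsEnabled` helper are hypothetical and only illustrate the "flag wins, otherwise InitData decides" behaviour described above; they are not taken from the PR's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apfConfig is a hypothetical shape for the forwarder config delivered
// via InitData. Only the idea of a boolean activation key comes from
// the PR description; the key name here is an assumption.
type apfConfig struct {
	SecureComms bool `json:"secure-comms"`
}

// secureCommsEnabled reports whether Secure Comms should be active.
// If the service flag already activated it, InitData is not consulted.
// Otherwise the value found in the InitData JSON (if any) is used.
func secureCommsEnabled(flagSet bool, apfJSON []byte) (bool, error) {
	if flagSet {
		return true, nil
	}
	if len(apfJSON) == 0 {
		// No InitData config was provided, so nothing activates it.
		return false, nil
	}
	var cfg apfConfig
	if err := json.Unmarshal(apfJSON, &cfg); err != nil {
		return false, fmt.Errorf("failed to decode forwarder config: %w", err)
	}
	return cfg.SecureComms, nil
}

func main() {
	on, err := secureCommsEnabled(false, []byte(`{"secure-comms": true}`))
	fmt.Println(on, err)
}
```

With this shape, a podvm image built without the flag can still have Secure Comms switched on per pod through InitData.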

davidhadas · 2024-10-08T04:25:30Z

cc: @bpradipt

Install KBS and test SecureComms with KBS
Based on confidential-containers#2072 which should be merged first
Signed-off-by: David Hadas <[email protected]>

stevenhorsman

Some initial comments. It would be good to fix the typos in the commit message too.

stevenhorsman · 2024-10-08T13:18:03Z

src/cloud-api-adaptor/cmd/agent-protocol-forwarder/main.go

@@ -52,6 +54,8 @@ func load(path string, obj interface{}) error {
 		return fmt.Errorf("failed to decode a Agent Protocol Forwarder config file file: %s: %w", path, err)
 	}

+	logger.Printf("succesful loading config from %s\n", path)


Is this supposed to say "successfully loaded config..."?

stevenhorsman · 2024-10-08T13:20:49Z

src/cloud-api-adaptor/cmd/cloud-api-adaptor/main.go

@@ -114,7 +114,7 @@ func (cfg *daemonConfig) Setup() (cmd.Starter, error) {
 		flags.BoolVar(&secureComms, "secure-comms", false, "Use SSH to secure communication between cluster and peer pods")
 		flags.StringVar(&secureCommsInbounds, "secure-comms-inbounds", "", "Inbound tags for secure communication tunnels")
 		flags.StringVar(&secureCommsOutbounds, "secure-comms-outbounds", "", "Outbound tags for secure communication tunnels")
-		flags.StringVar(&secureCommsKbsAddr, "secure-comms-kbs", "kbs-service.kbs-operator-system:8080", "Address of a KBS Service for Secure-Comms")
+		flags.StringVar(&secureCommsKbsAddr, "secure-comms-kbs", "kbs-service.trustee-operator-system:8080", "Address of a KBS Service for Secure-Comms")


The trustee operator namespace changes should be in a separate commit with an explanation that it's triggered by the change in the trustee-operator project

I've just noticed a bunch of these changes are in #2073. Is this PR supposed to depend on that one?

Yes - #2073 should be merged first

stevenhorsman · 2024-10-08T13:27:36Z

src/cloud-api-adaptor/docs/SecureComms.md

+```sh
+kubectl get secrets -n trustee-operator-system
+NAME                  TYPE     DATA   AGE
+kbs-auth-public-key   Opaque   1      28h
+kbs-client            Opaque   1      28h
+```


What's the reason for this command being added?

I've just seen this code is under #2065 now. What's going on with these PRs and their duplication of code?

This is a slight improvement to the SecureComms doc: it shows the expected result after following the trustee operator instructions and the recommendation to "Make sure to uncomment the secret generation as recommended for both public and private key (kbs-auth-public-key and kbs-client secrets)."

We can add it to #2073 if it will make things clearer

I prefer to leave this extra documentation detail as is, although it also appears in #2065 - unless someone finds this very disturbing.

stevenhorsman · 2024-10-08T13:32:55Z

src/cloud-api-adaptor/docs/SecureComms.md

+kubectl -n confidential-containers-system  get cm peer-pods-cm  -o yaml | sed "s/SECURE_COMMS: \"false\"/SECURE_COMMS: \"true\"/"|kubectl apply -f -
+```
+
+Set InitData to point KBC services to IP address 127.0.0.1 


Should this have a heading, like the "Build a podvm that enforces Secure-Comms (Optional)" section does, since it's an alternative to it?

Some expansion of the explanation of what this is doing and why would be nice.

I will adapt the documentation based on this comment. Please take another look at the next version.


mkulke

I understand that kata is proceeding with an init-data approach that is passed from the runtime to the agent via a SetInitData() RPC call for the upcoming CoCo release.

This looks like a CAA-specific extension to InitData. If the project adopts kata's init-data approach, would this still work?

davidhadas · 2024-10-21T10:44:06Z

@huoqifeng, can you please review this PR as it extends your work in #1895

davidhadas · 2024-10-21T11:07:16Z

@mkulke, thanks for pointing this out.

> I understand that kata is proceeding with an init-data approach that is passed from the runtime to the agent via a SetInitData() RPC call for the upcoming CoCo release.
>
> This looks like a CAA-specific extension to InitData. If the project adopts kata's init-data approach, would this still work?

This PR extends #1895 which introduced a CAA-specific extension to InitData. Here we added configuration of the APF to the already merged configuration of AA and CDH.

Regardless, it seems that the SetInitData() RPC call cannot be used under peer-pods (even without this PR) - see kata-containers/kata-containers#10163 (comment)

mkulke · 2024-10-21T11:50:29Z

> @mkulke, thanks for pointing this out.
>
> > I understand that kata is proceeding with an init-data approach that is passed from the runtime to the agent via a SetInitData() RPC call for the upcoming CoCo release.
> > This looks like a CAA-specific extension to InitData. If the project adopts kata's init-data approach, would this still work?
>
> This PR extends #1895 which introduced a CAA-specific extension to InitData. Here we added configuration of the APF to the already merged configuration of AA and CDH.
>
> Regardless, it seems that the SetInitData() RPC call cannot be used under peer-pods (even without this PR) - see kata-containers/kata-containers#10163 (comment)

Can you elaborate on why a SetInitData() RPC wouldn't work for CAA? I think I POC'd that approach when the init-data design was being discussed to make sure it would work and I didn't run into issues.

My understanding is that agent-protocol-forwarder, daemon.json and user-data are implementation details of CAA that are responsible for setting the stage to allow runtime <=> agent communication and kata should ideally not be concerned about those.

stevenhorsman · 2024-10-21T12:08:05Z

> Can you elaborate on why a SetInitData() RPC wouldn't work for CAA? I think I POC'd that approach when the init-data design was being discussed to make sure it would work and I didn't run into issues.

In your prototype, did you keep the guest-components managed by systemd or by the kata-agent? It seems to me that if the kata-agent isn't managing them and they are started before the kata-agent, then relying on config from a kata-agent endpoint doesn't really make sense, as we then have this weird undefined state until the kata-agent is up and running and receiving the SetInitData request.

In fairness I'm still not sure why bare-metal doesn't adopt the approach of systemd managing the processes like the peer pod does anyway to avoid this either, so I'm obviously missing something.

davidhadas · 2024-10-21T12:45:03Z

@mkulke,

> Can you elaborate on why a SetInitData() RPC wouldn't work for CAA?

To add to what @stevenhorsman indicated, we use attestation when connecting CAA to APF in SecureComms.
To do attestation, we need the measurements of the configuration.
To get the measurements, we need InitData.
We connect the runtime to kata-agent only after the attestation is done and the keys are delivered - keys we are using to complete the establishment of the CAA to APF secure communication channel.

I would assume that allowing someone to change the InitData (using the kata agent RPC) after attestation was already done and after the keys were delivered to the podvm is not what we want to do. Am I missing something?
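
To make the ordering concern concrete, here is a minimal sketch, assuming the measurement is simply a SHA-256 digest over the InitData bytes. The real measurement scheme is not described in this thread, and `measure` and `matchesAttested` are hypothetical helpers. It only illustrates that InitData changed after attestation no longer matches what was attested.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// measure returns a SHA-256 digest of the InitData bytes.
// This is a stand-in for whatever measurement the TEE actually records.
func measure(initData []byte) [32]byte {
	return sha256.Sum256(initData)
}

// matchesAttested reports whether candidate InitData still produces the
// digest that was measured before attestation.
func matchesAttested(attested [32]byte, candidate []byte) bool {
	return attested == measure(candidate)
}

func main() {
	// InitData as it was when the podvm was measured and attested.
	original := []byte(`{"secure-comms": true}`)
	attested := measure(original)

	// InitData as it might look if replaced later via an RPC.
	changed := []byte(`{"secure-comms": false}`)

	fmt.Println("original matches:", matchesAttested(attested, original))
	fmt.Println("changed matches:", matchesAttested(attested, changed))
}
```

Any later update path would therefore need to either be rejected or trigger a fresh attestation before the keys can be trusted.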

mkulke · 2024-10-21T13:01:28Z

> In your prototype, did you keep the guest-components managed by systemd or by the kata-agent? It seems to me that if the kata-agent isn't managing them and they are started before the kata-agent, then relying on config from a kata-agent endpoint doesn't really make sense, as we then have this weird undefined state until the kata-agent is up and running and receiving the SetInitData request.
>
> In fairness I'm still not sure why bare-metal doesn't adopt the approach of systemd managing the processes like the peer pod does anyway to avoid this either, so I'm obviously missing something.

I would have to dig a bit to find the details, but I think in the PoC code kata-agent was talking via ttRPC to attestation-agent to update the configuration, so it was indeed half-initialized. However, as I understand it, that endpoint is about to be removed. Somehow AA will now have to perform a binding to the TEE itself (e.g. compare host-data, extend an init-data PCR).

mkulke · 2024-10-21T13:42:49Z

> We connect the runtime to kata-agent only after the attestation is done and the keys are delivered - keys we are using to complete the establishment of the CAA to APF secure communication channel.

I understand this applies if you use the secure communications feature, but in a default CAA installation there is no attestation ceremony before runtime <=> agent communication is established, afaik?

Add apj.json to InitData apf.josn includes a secure-comms key used to activate secure comms (if not already activated using an agent-protocol-forwarder.service flag) Signed-off-by: David Hadas <[email protected]>

davidhadas · 2024-11-12T13:39:41Z

@mkulke, @stevenhorsman, @bpradipt

This PR follows in the footsteps of the initData support that already exists in peer pods to date.
While we are discussing whether to change this mechanism some time in the future to follow what will be done by Kata RPC etc., is there any impediment to continuing to support the existing mechanism that we have had till now?
Is there any impediment to adjusting it to also deliver the secureComms flag?

I would like to suggest that whatever implementation we will have for initData, we will still need to add this flag to enable/disable secureComms using measured initData (or whatever we choose to call it), so adding it now seems like progress in the right direction.

The two benefits of adding it now rather than waiting are that (1) it will allow e2e testing of secureComms with a complete attestation and key retrieval phase, and (2) it will allow users/downstream to use and test secureComms without the dedicated podvm image that is needed today.

I would prefer to move forward and add the e2e tests by following the same mechanism that works today, regardless of how the future measured initData will be implemented in peer-pods. Does this make sense?

mkulke · 2024-11-12T16:54:23Z

> @mkulke, @stevenhorsman, @bpradipt
>
> This PR follows in the footsteps of the initData support that already exists in peer pods to date. While we are discussing whether to change this mechanism some time in the future to follow what will be done by Kata RPC etc., is there any impediment to continuing to support the existing mechanism that we have had till now? Is there any impediment to adjusting it to also deliver the secureComms flag?

AFAIU the init-data feature will be in the next release of CoCo and, unless there is a change of plans, the agent will receive init-data via RPC then. That means whatever we are doing in terms of initdata on the daemonset and podvm becomes redundant/conflicting (since a CAA release is synced with kata/guest-components/operator).

To a CoCo user this change should (mostly) be transparent, but internally we would have to remove the related code, and AA + CDH would only run after the agent has received init-data.

In this context I would not currently recommend building anything on top of the initdata implementation in peer pods.

huoqifeng · 2024-12-04T03:36:38Z

> @huoqifeng, can you please review this PR as it extends your work in #1895

Sorry for the late reply, I'll have a look...

huoqifeng · 2024-12-04T05:14:11Z

src/cloud-api-adaptor/docs/SecureComms.md


+[token_configs.kbs]


As the KBS endpoint is different when enabling SecureComms, shall we also add a link from ./initdata.md to this file?

davidhadas requested a review from a team as a code owner September 30, 2024 10:10

davidhadas force-pushed the secComms_initData branch 2 times, most recently from 8630009 to 206ce0f Compare October 7, 2024 12:37

davidhadas mentioned this pull request Oct 8, 2024

SecureComms: E2e test SecureComms with KBS #2093

Open

stevenhorsman reviewed Oct 8, 2024

davidhadas force-pushed the secComms_initData branch from 206ce0f to 1a5d0a9 Compare October 9, 2024 08:50

mkulke reviewed Oct 11, 2024

davidhadas mentioned this pull request Oct 16, 2024

Feat | Implement initdata for bare-metal/qemu hypervisor kata-containers/kata-containers#10163

Closed

davidhadas force-pushed the secComms_initData branch from 1a5d0a9 to 914490c Compare October 21, 2024 05:31

davidhadas force-pushed the secComms_initData branch from 914490c to f4608e5 Compare November 8, 2024 12:42

SecureComms: Activate PP SC using InitData

05bcb2c

Add apj.json to InitData apf.josn includes a secure-comms key used to activate secure comms (if not already activated using an agent-protocol-forwarder.service flag) Signed-off-by: David Hadas <[email protected]>

davidhadas force-pushed the secComms_initData branch from f4608e5 to 05bcb2c Compare November 8, 2024 12:45

davidhadas requested review from mkulke, stevenhorsman and bpradipt November 8, 2024 15:40

davidhadas self-assigned this Nov 9, 2024

huoqifeng reviewed Dec 4, 2024
