feat: stage AMD SEV-SNP attestation support #703

Open
clawdbot-glitch003 wants to merge 24 commits into Dstack-TEE:master from clawdbot-glitch003:feat/amd-sev-snp-conversion

@clawdbot-glitch003 commented Jun 1, 2026

Summary

  • Adds staged AMD SEV-SNP attestation support and KMS dry-run authorization plumbing while preserving existing TDX/Nitro/GCP behavior.
  • Verifies SNP reports/cert chains, recomputes SNP launch measurement from trusted launch inputs, and builds SNP-aware KMS BootInfo from verified evidence.
  • Keeps SNP key/cert release fail-closed: app keys, KMS keys, signing certs, and temp CA material are explicitly blocked for SNP until production release policy is finalized.
  • Replaces the staged SNP TCB placeholder with verifier-derived TCB status from the signed AMD report and propagates advisory IDs through auth policy.

Security posture

  • SNP authorization is dry-run/staged only.
  • tcbStatus defaults remain strict (UpToDate only); advisory IDs are denied unless explicitly allowlisted.
  • Non-SNP paths continue through the existing boot-info and auth flow.

Validation

  • cargo fmt --all
  • cargo test -p dstack-kms --all-features
  • cargo test -p dstack-attest --all-features
  • cargo check --workspace --all-features
  • git diff --check
  • cd kms/auth-simple && npx oxlint . && npx vitest run
  • Independent security-focused review of the SNP TCB/advisory diff: no blockers.

Notes

  • AMD report/VCEK evidence does not carry a direct advisory-list field, so verifier-emitted advisory_ids is currently explicit and empty; the field is propagated fail-closed for future revocation/advisory collateral integration.
  • One ignored live golden-vector test remains available for SNP-capable hosts: cargo test -p dstack-kms --all-features recomputation_matches_sev_snp_measure_live_golden_vector -- --ignored --nocapture.

Review-readiness note

  • See docs/amd-sev-snp-review-readiness.md for the fail-closed review boundary, live sev-snp-measure golden-vector proof, guest attestation proof summary, and validation checklist.

@clawdbot-glitch003
Author

SEV-SNP TCB/advisory policy slice is pushed.

What changed:

  • VerifiedAmdSnpReport now carries verifier-derived AMD SNP TCB info from the signed report (current_tcb, reported_tcb, committed_tcb, launch_tcb).
  • KMS SNP BootInfo.tcb_status now comes from that verified report data instead of the old snp-verified-basic-policy placeholder.
    • maps to UpToDate only when current/reported/committed/launch TCB all match;
    • maps to OutOfDate otherwise, which stays denied by default.
  • VerifiedAmdSnpReport.advisory_ids is now explicit and propagated into KMS BootInfo; it is currently empty because the AMD report/VCEK evidence does not carry a direct advisory-list field.
  • The direct fake/default UpToDate SNP boot-info helper is now test-only; production goes through verified attestation.
  • auth-simple docs/tests now describe verifier-derived statuses instead of the placeholder and keep defaults strict: allowedTcbStatuses = ["UpToDate"], allowedAdvisoryIds = [].
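
For reviewers, the TCB mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `TcbVersion` component fields, the `SnpTcbInfo` struct, and `derive_tcb_status` are hypothetical names, and only the four TCB field names (current_tcb, reported_tcb, committed_tcb, launch_tcb) and the two statuses come from the description.

```rust
/// One SNP TCB version split into component patch levels.
/// The component names here are an assumption for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TcbVersion {
    pub bootloader: u8,
    pub tee: u8,
    pub snp: u8,
    pub microcode: u8,
}

/// The four TCB values the verified report is described as carrying.
#[derive(Debug, Clone, Copy)]
pub struct SnpTcbInfo {
    pub current_tcb: TcbVersion,
    pub reported_tcb: TcbVersion,
    pub committed_tcb: TcbVersion,
    pub launch_tcb: TcbVersion,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TcbStatus {
    UpToDate,
    OutOfDate,
}

/// UpToDate only when all four TCB values are identical; OutOfDate otherwise.
pub fn derive_tcb_status(info: &SnpTcbInfo) -> TcbStatus {
    let reference = info.current_tcb;
    let all_match = [info.reported_tcb, info.committed_tcb, info.launch_tcb]
        .iter()
        .all(|tcb| *tcb == reference);
    if all_match {
        TcbStatus::UpToDate
    } else {
        TcbStatus::OutOfDate
    }
}
```

With this shape, a report whose launch_tcb lags current_tcb in any single component maps to OutOfDate, which the default policy denies.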

Still fail-closed:

  • SNP key/cert release remains blocked for app keys, KMS keys, signing certs, and temp CA material.
  • Any non-UpToDate status or any advisory ID remains denied unless explicitly allowlisted.
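
The deny-by-default rule above could look something like the sketch below. It is written in Rust for consistency with the rest of the workspace, although auth-simple itself is a separate service; `SnpPolicy` and `check_policy` are hypothetical names, and the advisory error text is illustrative. Only the defaults (UpToDate allowed, no advisory IDs allowed) are taken from the description.

```rust
use std::collections::HashSet;

/// Allowlists for SNP authorization (hypothetical struct for illustration).
pub struct SnpPolicy {
    pub allowed_tcb_statuses: HashSet<String>,
    pub allowed_advisory_ids: HashSet<String>,
}

impl Default for SnpPolicy {
    /// Strict defaults: only "UpToDate" is accepted and no advisory ID is allowlisted.
    fn default() -> Self {
        Self {
            allowed_tcb_statuses: HashSet::from(["UpToDate".to_string()]),
            allowed_advisory_ids: HashSet::new(),
        }
    }
}

/// Deny unless the status is allowlisted and every advisory ID is allowlisted.
pub fn check_policy(
    policy: &SnpPolicy,
    tcb_status: &str,
    advisory_ids: &[String],
) -> Result<(), String> {
    if !policy.allowed_tcb_statuses.contains(tcb_status) {
        return Err("tcb_status is not allowed".to_string());
    }
    if let Some(id) = advisory_ids
        .iter()
        .find(|id| !policy.allowed_advisory_ids.contains(*id))
    {
        return Err(format!("advisory id {id} is not allowed"));
    }
    Ok(())
}
```

An empty advisory list with status UpToDate passes under the defaults; anything else needs an explicit allowlist entry.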

Validation:

  • cargo fmt --all
  • cargo test -p dstack-kms --all-features
  • cargo test -p dstack-attest --all-features
  • cargo check --workspace --all-features
  • git diff --check
  • cd kms/auth-simple && npx oxlint . && npx vitest run
  • independent review: no blockers

@clawdbot-glitch003
Author

Continued with the next quality-gate slice and pushed a small clippy cleanup commit.

Commit:

  • a0ff6efa chore: satisfy sev-snp workspace clippy

What changed:

  • removed a needless return in dstack attestation-mode detection without changing TDX/SNP selection semantics;
  • simplified KMS onboarding response error propagation (Ok(...?) -> direct Result return), preserving behavior;
  • derived Default for TeePlatform with Auto as the default variant, preserving the conservative default.
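
For context, the derived-default pattern clippy asks for looks like this. The variants other than Auto are assumptions for illustration; the real TeePlatform enum may list different ones.

```rust
/// Platform selection with a derived default.
/// Only `Auto` is named in the comment above; `Tdx` and `SevSnp` are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum TeePlatform {
    /// Marked as the variant `Default::default()` returns.
    #[default]
    Auto,
    Tdx,
    SevSnp,
}
```

This replaces a hand-written `impl Default` without changing which variant is returned.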

Validation now passing:

  • cargo fmt --all
  • cargo test -p dstack-kms --all-features
  • cargo test -p dstack-attest --all-features
  • cargo test -p dstack-vmm --all-features
  • cargo check --workspace --all-features
  • cargo clippy --workspace --all-features -- -D warnings --allow unused_variables
  • git diff --check
  • prior auth-simple validation remains: cd kms/auth-simple && npx oxlint . && npx vitest run

Independent review of the cleanup diff found no behavior/security regressions.

@clawdbot-glitch003
Author

Milestone 1 is done: PR #703 is now review-ready as staged AMD SEV-SNP support, still without production key release.

New commit:

  • 93354eb6 docs: add sev-snp review readiness note

What changed:

  • Added docs/amd-sev-snp-review-readiness.md documenting:
    • exact review boundary;
    • fail-closed SNP key/cert release posture;
    • strict TCB/advisory defaults;
    • live sev-snp-measure golden-vector proof;
    • prior SNP guest attestation proof summary;
    • local validation commands.
  • Refreshed live golden-vector proof on dedicated-m24-fork at 2026-06-02T19:49:14Z:
    • ignored live test passed: cargo test -p dstack-kms --all-features recomputation_matches_sev_snp_measure_live_golden_vector -- --ignored --nocapture
    • measurement remains 6497fb9f90dc4a322228a8a5eb14742e09067bc44c184c2068d583ef628b5bae8c6cf15d91fe1bc0b7a8cbcc575be370

Validation passed after doc/proof refresh:

  • cargo fmt --all
  • cargo test -p dstack-kms --all-features
  • cargo test -p dstack-attest --all-features
  • cargo test -p dstack-vmm --all-features
  • cargo check --workspace --all-features
  • cargo clippy --workspace --all-features -- -D warnings --allow unused_variables
  • git diff --check
  • cd kms/auth-simple && npx oxlint . && npx vitest run
  • independent review of the review-ready doc/code posture: no blockers

I am marking the PR ready for review now. Milestone 2 remains separate: production SNP key release policy + revocation/advisory collateral + guarded release enablement.

@clawdbot-glitch003 marked this pull request as ready for review June 2, 2026 19:57
@clawdbot-glitch003
Author

Milestone 2 is now implemented and pushed.

Commit: 6cb351f9 feat: enable guarded sev-snp key release

What changed:

  • Added local KMS [core.sev_snp_key_release] gate for AMD SEV-SNP key/cert material.
  • Default remains fail-closed: enabled = false, allowed_tcb_statuses = ["UpToDate"], allowed_advisory_ids = [].
  • Release requires both:
    1. verified SNP attestation + recomputed launch measurement + external auth API allow, and
    2. explicit local KMS release opt-in with acceptable TCB/advisory state.
  • Guarded all sensitive SNP release surfaces:
    • GetAppKey
    • GetKmsKey
    • SignCert
    • self-authorized GetTempCaCert
  • Added startup safety: KMS rejects sev_snp_key_release.enabled = true unless enforce_self_authorization = true, so temp-CA self-release cannot bypass SNP release checks in production config.
  • Updated kms/kms.toml and docs/amd-sev-snp-review-readiness.md with the opt-in release policy.
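
To make the two-condition gate and the startup check easier to review, here is a rough sketch of the logic. It is not the code in the commit: `SevSnpKeyRelease`, `VerifiedSnpBoot`, `validate_startup`, and `may_release` are hypothetical names and most error strings are illustrative. The config keys and their defaults are taken from the description above.

```rust
/// Local release gate, mirroring the [core.sev_snp_key_release] keys.
#[derive(Debug, Clone)]
pub struct SevSnpKeyRelease {
    pub enabled: bool,
    pub allowed_tcb_statuses: Vec<String>,
    pub allowed_advisory_ids: Vec<String>,
}

impl Default for SevSnpKeyRelease {
    /// Fail-closed defaults.
    fn default() -> Self {
        Self {
            enabled: false,
            allowed_tcb_statuses: vec!["UpToDate".to_string()],
            allowed_advisory_ids: Vec::new(),
        }
    }
}

/// Startup check: enabling release requires self-authorization enforcement.
pub fn validate_startup(
    release: &SevSnpKeyRelease,
    enforce_self_authorization: bool,
) -> Result<(), String> {
    if release.enabled && !enforce_self_authorization {
        return Err(
            "sev_snp_key_release.enabled = true requires enforce_self_authorization = true"
                .to_string(),
        );
    }
    Ok(())
}

/// Outcome of condition 1 plus the verifier-derived state (hypothetical struct).
pub struct VerifiedSnpBoot {
    pub measurement_matches: bool,
    pub auth_api_allowed: bool,
    pub tcb_status: String,
    pub advisory_ids: Vec<String>,
}

/// Release only when both conditions hold.
pub fn may_release(release: &SevSnpKeyRelease, boot: &VerifiedSnpBoot) -> Result<(), String> {
    // Condition 1: verified attestation, recomputed measurement, external auth allow.
    if !(boot.measurement_matches && boot.auth_api_allowed) {
        return Err("attestation or external auth did not pass".to_string());
    }
    // Condition 2: explicit local opt-in with acceptable TCB/advisory state.
    if !release.enabled {
        return Err("sev-snp key release is disabled".to_string());
    }
    if !release.allowed_tcb_statuses.iter().any(|s| s == &boot.tcb_status) {
        return Err("tcb_status is not allowed".to_string());
    }
    if boot.advisory_ids.iter().any(|id| !release.allowed_advisory_ids.contains(id)) {
        return Err("advisory id is not allowed".to_string());
    }
    Ok(())
}
```

In this sketch each guarded surface (GetAppKey, GetKmsKey, SignCert, self-authorized GetTempCaCert) would call the same `may_release` check before returning material, so no single endpoint can skip the gate.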

Validation passed:

cargo fmt --all
cargo test -p dstack-kms --all-features
cargo test -p dstack-attest --all-features
cargo test -p dstack-vmm --all-features
cargo check --workspace --all-features
cargo clippy --workspace --all-features -- -D warnings --allow unused_variables
git diff --check
cd kms/auth-simple && npx oxlint . && npx vitest run

Independent security review: no release-gate blockers found after the self-authorization startup-safety fix.

@clawdbot-glitch003
Author

SNP E2E smoke follow-up

I continued the manual SNP smoke test on chris@173.234.27.162 and pushed the fixes and docs in fe08b86f fix: bind sev-snp vm launch inputs.

What the smoke found/fixed:

  • VMM .sys-config.json now includes sev_snp_measurement so KMS SNP BootInfo recomputation has the same launch inputs QEMU used.
  • VMM now accepts released image metadata where rootfs_hash is only present as dstack.rootfs_hash=... in the kernel cmdline.
  • SNP QEMU launch now uses EPYC-v4 and confidential virtio PCI options (disable-legacy=on,iommu_platform=true) for SNP-launched virtio devices.
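
The rootfs_hash fallback can be sketched as below. The function names are hypothetical, and the precedence (an explicit metadata field wins over the cmdline value) is an assumption; only the dstack.rootfs_hash=... cmdline form comes from the fix description.

```rust
/// Extract the value of `dstack.rootfs_hash=` from a kernel cmdline, if present.
pub fn rootfs_hash_from_cmdline(cmdline: &str) -> Option<&str> {
    cmdline
        .split_whitespace()
        .find_map(|arg| arg.strip_prefix("dstack.rootfs_hash="))
        .filter(|value| !value.is_empty())
}

/// Prefer an explicit metadata field (assumed precedence), else fall back to the cmdline.
pub fn resolve_rootfs_hash<'a>(
    metadata_field: Option<&'a str>,
    cmdline: &'a str,
) -> Option<&'a str> {
    metadata_field
        .filter(|value| !value.is_empty())
        .or_else(|| rootfs_hash_from_cmdline(cmdline))
}
```

Returning None when neither source has a value lets the caller reject the image instead of launching with an unbound rootfs.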

Smoke status:

  • Tested dstack-0.5.11 and dstack-dev-0.5.11 with PR-built dstack-vmm/supervisor/dstack-kms, QEMU 10.0.2, and SNP OVMF.
  • Both SNP runs reached OVMF loading the measured kernel/cmdline/initrd path and emitted:
    • EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
  • Neither run completed the Linux/userspace boot before the timeout, so the full hardware E2E (dstack-managed guest -> KMS GetAppKey) is still blocked before any KMS userspace or app-key path is exercised.
  • Control check: the same dstack-dev-0.5.11 kernel/initrd/rootfs boots without SNP and reaches dstack Guest Preparation Service, narrowing the blocker to SNP+OVMF direct-kernel boot compatibility rather than KMS release policy.
  • No key/secret material was returned.

Validation passed after the fixes:

cargo fmt --all
cargo test -p dstack-vmm --all-features
cargo test -p dstack-kms --all-features
cargo test -p dstack-attest --all-features
cargo check --workspace --all-features
cargo clippy --workspace --all-features -- -D warnings --allow unused_variables
git diff --check
cd kms/auth-simple && npx oxlint . && npx vitest run

@clawdbot-glitch003
Author

AMD SEV-SNP manual E2E smoke update

I pushed a follow-up commit that completes the dstack-managed SNP smoke path:

  • Commit: 0a08253a fix: complete sev-snp key release smoke path
  • Smoke host: chris@173.234.27.162
  • QEMU: 10.0.2
  • OVMF: /opt/AMDSEV/usr/local/share/qemu/OVMF.fd (67e7a7027437823e9c166a60d00666d5d5391e13050488cad5cc2acd913fab4a)
  • Image: dstack-dev-0.5.11-snp-dnsfix

What the smoke proved

  • KMS SNP guest booted Linux/userspace and started dstack-kms.
  • App SNP guest booted Linux/userspace and requested app keys from KMS.
  • KMS self auth and app auth both succeeded through auth-simple:
    • /bootAuth/kms -> 200
    • /bootAuth/app -> 200
  • App guest reached GetTempCaCert and GetAppKey against the SNP-backed KMS.
  • KMS metrics after app request:
    • dstack_kms_attestation_requests_total 1
    • dstack_kms_attestation_failures_total 0

Failure gate also exercised

The lab host reports verifier-derived tcbStatus = "OutOfDate". With the default strict release policy (allowed_tcb_statuses = ["UpToDate"]), the app guest was denied as expected:

error: "tcb_status is not allowed"

Then, with an explicit lab-only allowlist (["UpToDate", "OutOfDate"]), the same flow succeeded. Production defaults remain fail-closed.

Fixes included

  • Preserve the released image's original kernel cmdline in SNP measurement recomputation, then append measured docker_compose_hash, rootfs_hash, and app_id exactly like the VMM launch path.
  • Include base_cmdline in VMM-provided sev_snp_measurement input.
  • Add AMD KDS fallback for SNP reports that do not carry cert collateral: fetch ARK/ASK/VCEK from KDS using report chip_id + reported TCB and verify fail-closed.
  • Add configfs TSM -> extended-report ioctl fallback for cert-chain collection.
  • Let SNP guests skip TDX-only app-info / mr_config_id checks while preserving non-SNP behavior.
  • Make dstack-prepare.sh robust for SNP smoke boots (sev-guest detection, early chronyc tolerance, DNS fallback).
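
As background for the KDS fallback, the lookup URLs can be built from the report's chip_id and reported TCB roughly as follows. This is a sketch under assumptions, not the code in this commit: the helper names are hypothetical, the product name (for example "Milan") must come from the caller, the zero-padded two-digit SPL formatting is my assumption, and the layout covers only the four-component TCB. The URL shape follows AMD's public KDS interface as I understand it; fetching and chain verification are left out.

```rust
/// Reported TCB components used as KDS query parameters.
/// Field names are illustrative.
pub struct ReportedTcb {
    pub bootloader: u8,
    pub tee: u8,
    pub snp: u8,
    pub microcode: u8,
}

const KDS_BASE: &str = "https://kdsintf.amd.com/vcek/v1";

/// URL for the VCEK certificate matching this chip and TCB.
pub fn vcek_url(product: &str, chip_id: &[u8; 64], tcb: &ReportedTcb) -> String {
    // The hardware ID is the chip_id rendered as lowercase hex.
    let hwid: String = chip_id.iter().map(|b| format!("{:02x}", b)).collect();
    format!(
        "{}/{}/{}?blSPL={:02}&teeSPL={:02}&snpSPL={:02}&ucodeSPL={:02}",
        KDS_BASE, product, hwid, tcb.bootloader, tcb.tee, tcb.snp, tcb.microcode
    )
}

/// URL for the ASK/ARK certificate chain of a product line.
pub fn cert_chain_url(product: &str) -> String {
    format!("{}/{}/cert_chain", KDS_BASE, product)
}
```

Whatever is fetched this way still has to chain ARK -> ASK -> VCEK and verify the report signature before any of it is trusted, which is what keeps the fallback fail-closed.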

Validation run

All passed locally:

cargo fmt --all
cargo test -p dstack-attest --all-features
cargo test -p dstack-util --all-features
cargo test -p dstack-kms --all-features
cargo test -p dstack-vmm --all-features
cargo check --workspace --all-features
cargo clippy --workspace --all-features -- -D warnings --allow unused_variables
git diff --check
cd kms/auth-simple && npx oxlint . && npx vitest run

No secret/key material was included in logs or this comment.
