Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
103 changes: 103 additions & 0 deletions docs/INSTALL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
# `pup install` — design

This doc covers the design and rollout plan for `pup install <component> <platform>` — pup's unified entry point for installing Datadog components on a target. The first concrete leaf is `pup install ssi linux`, the rest follow in subsequent PRs.

## Motivation

Today, multiple agent-skills install Datadog components by embedding `bash -c "$(curl -L install.datadoghq.com/scripts/install_script_agent7.sh)"` in their SKILL.md. Security scanners (Snyk, etc.) correctly flag this as RCE-shaped — and the script itself lives outside any repo we control, with no pinned version or audit trail.

Workarounds we evaluated:

- **Pointing skills at docs** — breaks the "user does nothing" promise that SSI sells.
- **Vendoring the install script in the agent-skills repo** — Datadog ships updates per release; we'd inherit a sync treadmill.
- **Manual apt/yum install in the skill** — only works for the agent; the SSI components (apm-inject, language libs) aren't deb/rpm packages, they're OCI artifacts delivered by the `datadog-installer` binary.
- **Snyk policy exception** — pragmatic but doesn't fix the underlying audit-trail issue.

The clean answer is to move install logic out of markdown and into pup. The skill becomes a single line (`pup install ssi linux --host <X>`), and the install steps execute in compiled, reviewed Rust — using HTTP fetch + SHA-256 verification + signed package install rather than fetch-and-execute shell. This is the same shape every modern install CLI (helm, brew, AWS CLI) already uses.

## Command surface

```
pup install <component> <platform> [args]
```

Two dimensions:

- **Component** — what to install (`ssi`, `agent`, `dbm`, `llm-obs`, `csm`, …)
- **Platform** — where to install (`linux`, `k8s`, `docker`, `windows`, …)

Today, only `ssi linux` is wired up (and as scaffolding — see "Status" below). The matrix grows as future PRs add components/platforms.

### Examples

Today (after this scaffolding PR):

```
pup install ssi linux --host bastion.example.com --user ec2-user --key ~/.ssh/id_ed25519
pup install ssi linux --host bastion.example.com --dry-run
```

Future (out of scope for this PR):

```
pup install agent linux --host <X> # just the agent, no SSI
pup install ssi k8s --context <CTX> # SSI via helm + DatadogAgent CR
pup install dbm linux --host <X> --db pg # Database Monitoring
pup install llm-obs local --runtime python # LLM Obs SDK
```

## Linux SSI install flow

This is what `pup install ssi linux --host <X>` will do once implemented:

1. Open an SSH session to the target via the `russh` client (already an optional dep of pup; used today by the auth tunnel server).
2. Detect the host's package manager by reading `/etc/os-release` (`$ID` + `$ID_LIKE`).
3. Configure the Datadog package repo on the target:
- **Debian / Ubuntu**: write `/etc/apt/sources.list.d/datadog.list` (with `signed-by=`), download `keys.datadoghq.com/DATADOG_APT_KEY_*.public` keys, import via `gpg --no-default-keyring`.
- **RHEL / CentOS / Amazon / Rocky / Alma**: write `/etc/yum.repos.d/datadog.repo` referencing the same key URLs.
4. Install signed packages via the host's package manager: `datadog-agent` (+ `datadog-signing-keys` on apt).
5. Download the `datadog-installer` binary from the OCI registry (`gcr.io/datadoghq/installer-package/blobs/sha256:<pin>`), verify against a pinned SHA-256, then upload it to the remote host.
6. Execute `datadog-installer setup` on the remote host to install `apm-inject` + the language libraries and to write `/etc/ld.so.preload`. (The exact subcommand and flags — `--flavor "APM SSI"`, `--apm-instrumentation-enabled=host`, etc. — need verification with whoever owns `datadog-installer`. See "Open questions" below.)
7. Patch `/etc/datadog-agent/datadog.yaml` with `DD_API_KEY` and `DD_SITE` (sourced from pup's existing auth config).
8. Restart the agent and confirm `datadog-agent status` reports healthy.

The critical property: nowhere in this flow does pup execute a remote shell script. Every step is either (a) an `apt-get` / `yum` install against signed package repos, (b) an HTTP fetch of a key file or binary followed by signature/hash verification, or (c) executing a known binary with known arguments. The `curl | bash` pattern is gone.

## Security model

- **No fetch-and-execute.** No `curl | bash`, no `bash -c "$(curl ...)"`, no `wget | sh`. All downloads write to a file, then verify, then either parse or execute.
- **Signed packages.** Agent + SSI components install via apt/yum with the Datadog signing keys imported into a keyring; signature verification is enforced by the package manager.
- **Pinned binary.** The `datadog-installer` binary is downloaded by SHA-256 (pinned in pup's source) rather than "whatever the latest URL serves." This trades update friction for a real audit trail.
- **Auth surface.** SSH credentials are passed via flags / config (same shape as existing ssh tooling); Datadog API credentials use pup's existing `auth` infrastructure.
- **No new outbound endpoints beyond what the official install script already hits** — `apt.datadoghq.com`, `yum.datadoghq.com`, `keys.datadoghq.com`, `gcr.io` / `public.ecr.aws`.

## Status (as of this PR)

This is a **scaffolding PR**. What's in vs out:

**In:**
- The `pup install` command surface in `src/main.rs` (clap subcommands `Install` → `Ssi` → `Linux`).
- A new `src/commands/install.rs` module with the `ssi_linux` dispatcher.
- `--dry-run` prints the planned install steps for the given target.
- Without `--dry-run`, the command bails with an actionable error that points at this doc.
- Unit tests for both behaviors.

**Out (follow-up PRs):**
- The actual `russh` client session (connect, exec, file transfer).
- The OCI registry manifest fetch + blob download + SHA-256 verification.
- The repo file / keyring setup on the remote host.
- The `datadog-installer setup` invocation.
- The agent restart + health check.
- Integration tests against a real Linux host.
- Multi-distro test matrix (Debian 11/12, Ubuntu 22.04/24.04, RHEL 8/9, Amazon Linux 2/2023, Rocky 9).

The intent is to land the *surface* now so the pup team can review the shape, and to land the implementation in small, reviewable follow-ups.

## Open questions for the pup team

1. **Where should the install module live?** This PR puts everything in `src/commands/install.rs`. Once the implementation lands the install logic might want its own top-level `src/install/` module with submodules (`ssh.rs`, `oci.rs`, `linux_ssi.rs`). Happy to refactor at that point.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say yes, we should go with this approach. It will help keep this probably more sane to think about as long as they implement a trait

2. **SSH transport.** `russh` is already an optional dep but currently used only as a server (`src/tunnel.rs`). Should the install module reuse `russh` as a client, or shell out to the system `ssh` binary? `russh` keeps pup self-contained (especially on Windows); shelling out is simpler but adds a runtime dependency.

@platinummonkey platinummonkey Jun 5, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm more inclined to say shelling out is probably more appropriate here as long as the failure cases are very clear for the user/agent on what the dependencies are if they are missing/mis-configured.

3. **`datadog-installer` CLI surface.** The exact subcommand and flags used by `install-ssi.sh` (`setup --flavor "APM SSI"` per one read of the script) need to be confirmed with whoever owns that binary as a public, stable contract. If the contract isn't stable, we need a different integration point.
4. **OCI version pinning.** Who owns bumping the `datadog-installer` SHA-256 in pup when Datadog ships a new release? Manual on each agent release, or auto-bumped from a manifest?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we actually want the pinning there? Or should this be an API pup can query for the latest version of each installer? (that api could literally be github release info too)

5. **k8s parity.** `pup install ssi k8s` is the natural next leaf. The mechanics are entirely different (helm + DatadogAgent CR + admission webhook) — does it deserve its own command tree, or stay under `install ssi`?
6. **Auth integration.** Pup's existing `auth` config gives access to Datadog API. The install flow needs `DD_API_KEY` + `DD_SITE` to write `/etc/datadog-agent/datadog.yaml` — should the install command reuse `cfg.api_key` / `cfg.site`, or take its own flags?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be explicit and require these as flags. Considering the api_keys endpoints can be hooked up to pup, it could first ask the user to create a new key or to choose from an existing list if not explicitly specified

150 changes: 150 additions & 0 deletions src/commands/install.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,150 @@
//! `pup install <component> <platform>` — unified install entry point.
//!
//! **Status: scaffolding.** The CLI surface and module structure land in this PR;
//! the actual install flow (russh client, OCI registry fetch, signed package
//! install, `datadog-installer setup`) ships in follow-up PRs.
//!
//! Motivation lives in `docs/INSTALL.md` — short version: skills currently
//! embed `bash -c "$(curl …install_script_agent7.sh)"` to install the agent +
//! SSI on remote hosts, which scanners (correctly) flag as RCE-shaped and which
//! references a script that lives outside any repo we control. Moving the
//! install flow into pup turns the SKILL.md into a one-liner (`pup install …`)
//! and shifts the install logic into compiled Rust that already gets reviewed.
//!
//! Today this returns a structured "not yet implemented" error that lists the
//! planned steps. `--dry-run` prints the same plan without erroring so callers
//! can wire up the surface end-to-end.

use anyhow::{bail, Result};

use crate::config::Config;

/// Install Single Step Instrumentation on a remote Linux host.
///
/// The `--dry-run` flag returns successfully after printing the install plan.
/// Without it, returns an error today — the actual install lands in a follow-up
/// PR per the checklist in `docs/INSTALL.md`.
pub async fn ssi_linux(
_cfg: &Config,
host: String,
user: String,
key: Option<String>,
port: u16,
dry_run: bool,
) -> Result<()> {
let plan = LinuxSsiPlan {
host: &host,
user: &user,
key: key.as_deref(),
port,
};

if dry_run {
plan.print();
return Ok(());
}

bail!(
"`pup install ssi linux` is scaffolded but not yet executable in this build. \
Re-run with --dry-run to see the planned steps for {user}@{host}:{port}, \
or track the follow-up implementation work in docs/INSTALL.md.",
)
}

/// Captured arguments for a single Linux-SSI install. Kept narrow on purpose —
/// the actual install code lives in follow-up PRs and will likely want a
/// richer config struct (e.g. proxy settings, sudo strategy, distro override).
struct LinuxSsiPlan<'a> {
host: &'a str,
user: &'a str,
key: Option<&'a str>,
port: u16,
}

impl LinuxSsiPlan<'_> {
fn print(&self) {
println!("# Planned `pup install ssi linux` steps");
println!("#");
println!(
"# Target: {user}@{host}:{port}{key}",
user = self.user,
host = self.host,
port = self.port,
key = match self.key {
Some(k) => format!(" (key: {k})"),
None => " (key: SSH agent / default identity)".to_string(),
}
);
println!("#");
println!("# 1. Open an SSH session via russh client.");
println!("# 2. Detect distro family by reading /etc/os-release ($ID + $ID_LIKE).");
println!("# 3. Configure Datadog package repo for the host:");
println!("# - Debian/Ubuntu → add apt repo signed by");
println!("# keys.datadoghq.com/DATADOG_APT_KEY_CURRENT.public");
println!("# - RHEL/CentOS/Amazon/Rocky/Alma → write /etc/yum.repos.d/datadog.repo");
println!("# 4. Install signed packages: datadog-agent (+ datadog-signing-keys on apt).");
println!("# 5. Download the `datadog-installer` binary from the OCI registry");
println!("# (gcr.io/datadoghq/installer-package), verify its SHA-256 against a");
println!("# pinned manifest, and execute it locally on the remote host.");
println!("# 6. Run `datadog-installer setup` to install apm-inject + language");
println!("# libraries and to write /etc/ld.so.preload.");
println!("# 7. Patch /etc/datadog-agent/datadog.yaml with DD_API_KEY + DD_SITE.");
println!("# 8. Restart the agent and confirm `datadog-agent status` is healthy.");
println!("#");
println!("# None of these steps run today — this command is scaffolding so the pup team");
println!("# can review the shape before the implementation lands. See docs/INSTALL.md.");
}
}

#[cfg(test)]
mod tests {
use super::*;
use crate::test_support::test_config;

#[tokio::test]
async fn dry_run_returns_ok_and_does_not_bail() {
// ssi_linux's scaffolding doesn't read from cfg yet, so we just need
// *any* Config to satisfy the signature.
let cfg = test_config("http://unused.invalid");
let result = ssi_linux(
&cfg,
"bastion.example.com".to_string(),
"ec2-user".to_string(),
Some("~/.ssh/id_ed25519".to_string()),
22,
/* dry_run */ true,
)
.await;
assert!(
result.is_ok(),
"dry-run should succeed; got {:?}",
result.err()
);
}

#[tokio::test]
async fn non_dry_run_bails_with_actionable_message() {
let cfg = test_config("http://unused.invalid");
let err = ssi_linux(
&cfg,
"host.example.com".to_string(),
"root".to_string(),
None,
22,
/* dry_run */ false,
)
.await
.expect_err("non-dry-run should bail until the real impl lands");
let msg = format!("{err:#}");
// The error message must tell the user what to do next, not just say
// "not implemented" with no context.
assert!(
msg.contains("--dry-run"),
"error should point at --dry-run as the working flow, got: {msg}"
);
assert!(
msg.contains("docs/INSTALL.md"),
"error should reference the design doc, got: {msg}"
);
}
}
2 changes: 2 additions & 0 deletions src/commands/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,8 @@ pub mod hamr;
pub mod idp;
pub mod incidents;
pub mod infrastructure;
#[cfg(not(target_arch = "wasm32"))]
pub mod install;
pub mod integrations;
pub mod investigations;
pub mod kafka;
Expand Down
88 changes: 88 additions & 0 deletions src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1551,6 +1551,37 @@ enum Commands {
#[command(subcommand)]
action: InfraActions,
},
/// Install Datadog components on a target (host, cluster, …)
///
/// Install Datadog observability components on a target — the unified entry
/// point for the install flows historically delivered by curl-bash one-liners
/// (`install_script_agent7.sh`, `install-ssi.sh`, etc.).
///
/// COMMAND SHAPE:
/// pup install <component> <platform> [args]
///
/// <component> what to install: ssi, agent, dbm, llm-obs, …
/// <platform> where to install: linux, k8s, …
///
/// EXAMPLES (today):
/// # Install Datadog Agent + Single Step Instrumentation on a remote Linux host
/// pup install ssi linux --host bastion.example.com --user ec2-user --key ~/.ssh/id_ed25519
///
/// EXAMPLES (planned, follow-up PRs):
/// pup install agent linux --host <X> # just the agent
/// pup install ssi k8s --context <CTX> # SSI via helm + DatadogAgent CR
/// pup install dbm linux --host <X> --db pg # Database Monitoring
/// pup install llm-obs local --runtime python # LLM Obs SDK
///
/// STATUS: scaffolding. Command surface and module structure land in this PR;
/// the russh client + OCI registry client + installer-binary invocation are
/// follow-up work. See docs/INSTALL.md for the full plan.
#[cfg(not(target_arch = "wasm32"))]
#[command(verbatim_doc_comment)]
Install {
#[command(subcommand)]
action: InstallActions,
},
/// Manage third-party integrations
///
/// Manage third-party integrations with external services.
Expand Down Expand Up @@ -4431,6 +4462,48 @@ enum InfraHostActions {
Get { hostname: String },
}

// ---- Install: Datadog component install on a target (host, cluster, …) ----
//
// Shape: `pup install <component> <platform> [args]`. Scaffolding only in this
// PR — the working install flows ship in follow-up PRs. See docs/INSTALL.md.
#[cfg(not(target_arch = "wasm32"))]
#[derive(Subcommand)]
enum InstallActions {
/// Install Single Step Instrumentation (Datadog Agent + APM auto-injection)
Ssi {
#[command(subcommand)]
action: InstallSsiActions,
},
}

#[cfg(not(target_arch = "wasm32"))]
#[derive(Subcommand)]
enum InstallSsiActions {
/// Install SSI on a remote Linux host over SSH
///
/// Replaces the historical `bash -c "$(curl install_script_agent7.sh)"` flow
/// with a native Rust install: SSH client, signed package install (apt/yum),
/// SHA-256-verified OCI binary download for `datadog-installer`, then
/// `installer setup` to enable SSI. No remote shell scripts are executed.
Linux {
/// SSH host or `user@host` of the target machine
#[arg(long)]
host: String,
/// SSH user (default: root)
#[arg(long, default_value = "root")]
user: String,
/// Path to SSH private key
#[arg(long)]
key: Option<String>,
/// SSH port (default: 22)
#[arg(long, default_value_t = 22)]
port: u16,
/// Print the install plan without connecting or making changes
#[arg(long)]
dry_run: bool,
},
}

// ---- IDP (Internal Developer Portal) ----
#[derive(Subcommand)]
enum IdpActions {
Expand Down Expand Up @@ -11914,6 +11987,21 @@ async fn main_inner() -> anyhow::Result<()> {
},
}
}
// --- Install (Datadog component install on a target) ---
#[cfg(not(target_arch = "wasm32"))]
Commands::Install { action } => match action {
InstallActions::Ssi { action } => match action {
InstallSsiActions::Linux {
host,
user,
key,
port,
dry_run,
} => {
commands::install::ssi_linux(&cfg, host, user, key, port, dry_run).await?;
}
},
},
// --- IDP (Internal Developer Portal) ---
Commands::Idp { action } => {
cfg.validate_auth()?;
Expand Down
Loading