66 changes: 66 additions & 0 deletions nexus/src/app/background/tasks/support_bundle/README.md
@@ -0,0 +1,66 @@
# Support Bundles

**Support Bundles** provide a mechanism for extracting information about a
running Oxide system and for giving operators control over the exfiltration of
that data.

This README is intended for developers trying to add data to the bundle.

## Step Execution Framework

Support Bundles are collected using **steps**, which are named functions acting
on the `BundleCollection` that can:

* Read from the database, or query arbitrary services
* Emit data to the output zipfile
* Produce additional follow-up **steps**, if necessary

If you're interested in adding data to a support bundle, you will probably be
adding data to an existing **step**, or creating a new one.

The set of all initial steps is defined in
`nexus/src/app/background/tasks/support_bundle/steps/mod.rs`, within a function
called `all()`. Some of these steps may themselves spawn additional steps,
such as `STEP_SPAWN_SLEDS`, which spawns a per-sled step to query the sled
host OS itself.
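
As a rough illustration of the step model described above, here is a minimal,
synchronous sketch. It is not the real framework: `Collection`, `Step`,
`run_all`, and the step names are hypothetical stand-ins (the real code works
on `BundleCollection` and can run steps concurrently), and the output zipfile
is modeled as a plain list of `(path, contents)` pairs.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for `BundleCollection`.
pub struct Collection {
    /// Sled identifiers the bundle knows about.
    pub sleds: Vec<String>,
    /// `(path, contents)` pairs standing in for the output zipfile.
    pub output: Vec<(String, String)>,
}

/// A named unit of work that may return follow-up steps.
pub struct Step {
    pub name: String,
    pub run: Box<dyn FnOnce(&mut Collection) -> Vec<Step>>,
}

impl Step {
    pub fn new(
        name: impl Into<String>,
        run: impl FnOnce(&mut Collection) -> Vec<Step> + 'static,
    ) -> Self {
        Step { name: name.into(), run: Box::new(run) }
    }
}

/// The initial steps, loosely analogous to `steps::all()`.
pub fn all() -> Vec<Step> {
    vec![
        // A leaf step: emits data and spawns nothing.
        Step::new("bundle_id", |c| {
            c.output.push(("bundle_id.txt".to_string(), "example".to_string()));
            Vec::new()
        }),
        // A spawning step: produces one follow-up step per sled.
        Step::new("spawn_sleds", |c| {
            c.sleds
                .clone()
                .into_iter()
                .map(|sled| {
                    Step::new(format!("sled_{}", sled), move |c| {
                        c.output.push((
                            format!("sled_info/{}.txt", sled),
                            "host OS info".to_string(),
                        ));
                        Vec::new()
                    })
                })
                .collect::<Vec<Step>>()
        }),
    ]
}

/// Runs every step, including spawned ones, and returns the names in
/// execution order. This sketch runs steps one at a time.
pub fn run_all(collection: &mut Collection) -> Vec<String> {
    let mut queue: VecDeque<Step> = all().into();
    let mut executed = Vec::new();
    while let Some(step) = queue.pop_front() {
        executed.push(step.name);
        queue.extend((step.run)(collection));
    }
    executed
}
```

With two sleds in the collection, `run_all` executes the two initial steps
followed by one spawned step per sled.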

### Tracing

**Steps** are automatically instrumented, and their durations are emitted to an
output file in the bundle named `meta/trace.json`. These traces use a format
understood by **Perfetto**, a trace viewer with a browser-based interface at
<https://ui.perfetto.dev/>.
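
To make the idea of per-step timing concrete, here is a small sketch of
recording step durations and serializing them. The exact schema of
`meta/trace.json` is not described in this README; the sketch assumes Chrome
Trace Event style "complete" (`"ph":"X"`) events, which is one JSON format
Perfetto can load. `Span`, `timed`, and `to_trace_json` are hypothetical names,
and step names are assumed not to need JSON escaping.

```rust
use std::time::Instant;

/// One recorded step: name, start offset, and duration in microseconds.
pub struct Span {
    pub name: String,
    pub start_us: u128,
    pub dur_us: u128,
}

/// Runs `f`, recording how long it took relative to `epoch`.
pub fn timed<T>(
    epoch: Instant,
    name: &str,
    spans: &mut Vec<Span>,
    f: impl FnOnce() -> T,
) -> T {
    let start = Instant::now();
    let out = f();
    spans.push(Span {
        name: name.to_string(),
        start_us: start.duration_since(epoch).as_micros(),
        dur_us: start.elapsed().as_micros(),
    });
    out
}

/// Serializes spans as Chrome Trace Event "complete" events (assumed format).
pub fn to_trace_json(spans: &[Span]) -> String {
    let events: Vec<String> = spans
        .iter()
        .map(|s| {
            format!(
                "{{\"name\":\"{}\",\"ph\":\"X\",\"ts\":{},\"dur\":{},\"pid\":1,\"tid\":1}}",
                s.name, s.start_us, s.dur_us
            )
        })
        .collect();
    format!("{{\"traceEvents\":[{}]}}", events.join(","))
}
```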

## Filtering Bundle Contents

Support Bundles are collected by the `support_bundle_collector`
background task. Each bundle is assembled as a zipfile within a single Nexus
instance and then transferred to durable storage.

The contents of a bundle may be controlled by modifying the `BundleRequest`
structure. This request provides filters for controlling the categories of
data which are collected (e.g., "Host OS info") as well as arguments for
more specific constraints (e.g., "Collect info from a specific Sled").

Bundle **steps** may query the `BundleRequest` to identify whether or not their
contents should be included.
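
The following sketch shows one way a step might consult such a request before
doing any work. It is illustrative only: `Request`, `Category`, its variants,
and `collect_host_info` are hypothetical and do not mirror the actual fields of
`BundleRequest`.

```rust
use std::collections::HashSet;

/// Hypothetical data categories a bundle can include.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum Category {
    HostInfo,
    Database,
}

/// Hypothetical stand-in for `BundleRequest`.
pub struct Request {
    /// Categories of data to collect.
    pub categories: HashSet<Category>,
    /// Specific sleds to collect from; `None` means "all sleds".
    pub sleds: Option<HashSet<String>>,
}

impl Request {
    pub fn include_category(&self, category: Category) -> bool {
        self.categories.contains(&category)
    }

    pub fn include_sled(&self, sled: &str) -> bool {
        self.include_category(Category::HostInfo)
            && self.sleds.as_ref().map_or(true, |s| s.contains(sled))
    }
}

/// A step body that checks the request first and returns the paths it
/// would emit for the sleds that pass the filter.
pub fn collect_host_info(request: &Request, all_sleds: &[&str]) -> Vec<String> {
    if !request.include_category(Category::HostInfo) {
        // Opted out: skip before performing any expensive work.
        return Vec::new();
    }
    all_sleds
        .iter()
        .copied()
        .filter(|sled| request.include_sled(sled))
        .map(|sled| format!("sled_info/{}", sled))
        .collect()
}
```

Checking the category at the top of the step keeps an opted-out step cheap,
while the per-sled check handles the narrower constraint.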

## Overview for adding new data

* **Determine if your data should exist in a new step**. The existing steps
  live in `support_bundle/steps`. Adding a new step provides a new unit of
  execution (it can be executed concurrently with other steps), and a unit of
  tracing (it will be instrumented independently of other steps).
* If you're adding a new step...
  * **Add it as a new module**, within `support_bundle/steps`.
  * **Ensure it's part of `steps::all()`, or spawned by an existing step**.
    Otherwise, your step will never be executed.
  * **Provide a way for bundles to opt-out of collecting this data**. Check the
    `BundleRequest` to see if your data fits one of the current filters, or
    consider adding a new one if your step involves a new category of data.
    Either way, your new step should read the `BundleRequest` and decide
    whether to run before performing any other operations.
* **Consider Caching**. If your new data requires performing any potentially
  expensive operations which might be shared with other steps (e.g., reading
  from the database, creating and using progenitor clients, etc.) consider
  adding that data to `support_bundle/cache`.
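
The caching idea can be sketched with the standard library alone. The version
below is a synchronous analogue, not the real `cache.rs` (which uses
`tokio::sync::OnceCell` and async lookups). The `lookup` closure and the
`lookups` counter are hypothetical additions for illustration, and sleds are
modeled as plain strings.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

/// Cheaply cloneable handle; all clones share the same cached values.
#[derive(Clone)]
pub struct Cache {
    inner: Arc<Inner>,
}

struct Inner {
    /// `None` inside the cell means the lookup was attempted and failed.
    all_sleds: OnceLock<Option<Vec<String>>>,
    /// Counts how many times the expensive lookup actually ran.
    lookups: AtomicUsize,
}

impl Cache {
    pub fn new() -> Self {
        Self {
            inner: Arc::new(Inner {
                all_sleds: OnceLock::new(),
                lookups: AtomicUsize::new(0),
            }),
        }
    }

    /// Runs `lookup` at most once; later callers reuse the stored result,
    /// whether it succeeded or failed.
    pub fn get_or_initialize_all_sleds(
        &self,
        lookup: impl FnOnce() -> Result<Vec<String>, String>,
    ) -> Option<&Vec<String>> {
        self.inner
            .all_sleds
            .get_or_init(|| {
                self.inner.lookups.fetch_add(1, Ordering::SeqCst);
                lookup().ok()
            })
            .as_ref()
    }

    pub fn lookups(&self) -> usize {
        self.inner.lookups.load(Ordering::SeqCst)
    }
}
```

Because the result is stored as an `Option`, a failed lookup is remembered as
well, so independent steps do not each retry the same failing operation.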
92 changes: 92 additions & 0 deletions nexus/src/app/background/tasks/support_bundle/cache.rs
@@ -0,0 +1,92 @@
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

//! Cached data or clients which are collected by the bundle
//!
//! This is used to share data which may be used by multiple
//! otherwise independent steps.

use crate::app::background::tasks::support_bundle::collection::BundleCollection;

use gateway_client::Client as MgsClient;
use internal_dns_types::names::ServiceName;
use nexus_db_model::Sled;
use nexus_types::deployment::SledFilter;
use slog_error_chain::InlineErrorChain;
use std::sync::Arc;
use tokio::sync::OnceCell;

/// Caches information which can be derived from the BundleCollection.
///
/// This exists as a small optimization for independent steps which may try
/// to read / access similar data, especially when it's fallible: we only need
/// to attempt to look it up once, and all steps can share it.
#[derive(Clone)]
pub struct Cache {
    inner: Arc<Inner>,
}

struct Inner {
    all_sleds: OnceCell<Option<Vec<Sled>>>,
    mgs_client: OnceCell<Option<MgsClient>>,
}

impl Cache {
    pub fn new() -> Self {
        Self {
            inner: Arc::new(Inner {
                all_sleds: OnceCell::new(),
                mgs_client: OnceCell::new(),
            }),
        }
    }

    pub async fn get_or_initialize_all_sleds<'a>(
        &'a self,
        collection: &BundleCollection,
    ) -> Option<&'a Vec<Sled>> {
        self.inner
            .all_sleds
            .get_or_init(|| async {
                collection
                    .datastore()
                    .sled_list_all_batched(
                        &collection.opctx(),
                        SledFilter::InService,
                    )
                    .await
                    .ok()
            })
            .await
            .as_ref()
    }

    pub async fn get_or_initialize_mgs_client<'a>(
        &'a self,
        collection: &BundleCollection,
    ) -> Option<&'a MgsClient> {
        self.inner
            .mgs_client
            .get_or_init(|| async { create_mgs_client(collection).await.ok() })
            .await
            .as_ref()
    }
}

async fn create_mgs_client(
    collection: &BundleCollection,
) -> anyhow::Result<MgsClient> {
    let log = collection.log();
    collection
        .resolver()
        .lookup_socket_v6(ServiceName::ManagementGatewayService)
        .await
        .map(|sockaddr| {
            let url = format!("http://{}", sockaddr);
            gateway_client::Client::new(&url, log.clone())
        })
        .map_err(|e| {
            error!(
                log,
                "failed to resolve MGS address";
                "error" => InlineErrorChain::new(&e)
            );
            e.into()
        })
}