Commit 3a384c0

[support bundles] Split support bundles into modules, add README for devs
1 parent 3873a57 commit 3a384c0

14 files changed

+2110
-1762
lines changed

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
# Support Bundles

**Support Bundles** provide a mechanism for extracting information about a
running Oxide system, and giving operators control over the exfiltration of that
data.

This README is intended for developers trying to add data to the bundle.

## Step Execution Framework

Support Bundles are collected using **steps**, which are named functions acting
on the `BundleCollection` that can:

* Read from the database, or query arbitrary services
* Emit data to the output zipfile
* Produce additional follow-up **steps**, if necessary

If you're interested in adding data to a support bundle, you will probably be
adding data to an existing **step**, or creating a new one.

The set of all initial steps is defined in
`nexus/src/app/background/tasks/support_bundle/steps/mod.rs`, within a function
called `all()`. Some of these steps may themselves spawn additional steps,
such as `STEP_SPAWN_SLEDS`, which spawns a per-sled step to query the sled
host OS itself.

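The idea of a step that can emit data and produce follow-up steps can be sketched as below. This is a minimal, sequential illustration only: `Step`, `StepOutput`, `spawn_sleds`, and `run_all` are hypothetical names, not the framework's actual types, and the real collector may run steps concurrently.

```rust
use std::collections::VecDeque;

/// What a step hands back: files to emit, plus any follow-up steps.
/// (Hypothetical type, for illustration only.)
pub struct StepOutput {
    pub files: Vec<String>,
    pub follow_ups: Vec<Step>,
}

/// A named unit of work.
pub struct Step {
    pub name: String,
    pub run: Box<dyn FnOnce() -> StepOutput>,
}

impl Step {
    pub fn new(name: &str, run: impl FnOnce() -> StepOutput + 'static) -> Self {
        Self { name: name.to_string(), run: Box::new(run) }
    }
}

/// Hypothetical analogue of a "spawn sleds" step: it emits nothing itself,
/// but produces one follow-up step per sled.
pub fn spawn_sleds(sleds: Vec<String>) -> Step {
    Step::new("spawn_sleds", move || StepOutput {
        files: Vec::new(),
        follow_ups: sleds
            .into_iter()
            .map(|sled| {
                let path = format!("sleds/{}/info.txt", sled);
                Step::new(&format!("sled_{}", sled), move || StepOutput {
                    files: vec![path],
                    follow_ups: Vec::new(),
                })
            })
            .collect(),
    })
}

/// Runs steps until the queue drains, returning (step names, emitted files).
/// Follow-up steps are appended to the same queue as the initial steps.
pub fn run_all(initial: Vec<Step>) -> (Vec<String>, Vec<String>) {
    let mut queue: VecDeque<Step> = initial.into();
    let mut names = Vec::new();
    let mut files = Vec::new();
    while let Some(step) = queue.pop_front() {
        names.push(step.name);
        let output = (step.run)();
        files.extend(output.files);
        queue.extend(output.follow_ups);
    }
    (names, files)
}
```

Because follow-ups go back onto the same queue, a spawned step is treated exactly like an initial one, which is why a new step only needs to be reachable from `all()` or from an existing step.
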
### Tracing

**Steps** are automatically instrumented, and their durations are emitted to an
output file in the bundle named `meta/trace.json`. These traces are in a format
understood by **Perfetto**, a trace viewer which provides a browser-based
interface at <https://ui.perfetto.dev/>.

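To make the shape of such a trace concrete, here is a rough sketch of timing a step and rendering the result. The README does not name the on-disk format; this sketch assumes Chrome Trace Event JSON, which is one format the Perfetto UI can open. `TraceEvent`, `traced`, and `render_trace` are hypothetical names, and the fixed `pid`/`tid` values are placeholders.

```rust
use std::time::Instant;

/// One completed span, timed relative to the start of collection.
pub struct TraceEvent {
    pub name: String,
    pub start_us: u128,
    pub duration_us: u128,
}

impl TraceEvent {
    /// Renders a "complete" event (`"ph":"X"`), with times in microseconds.
    /// Step names are assumed to need no JSON escaping in this sketch.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"name\":\"{}\",\"ph\":\"X\",\"ts\":{},\"dur\":{},\"pid\":1,\"tid\":1}}",
            self.name, self.start_us, self.duration_us
        )
    }
}

/// Runs `step`, returning its result along with a timing record.
pub fn traced<T>(
    name: &str,
    epoch: Instant,
    step: impl FnOnce() -> T,
) -> (T, TraceEvent) {
    let started = Instant::now();
    let value = step();
    let event = TraceEvent {
        name: name.to_string(),
        start_us: started.duration_since(epoch).as_micros(),
        duration_us: started.elapsed().as_micros(),
    };
    (value, event)
}

/// Joins events into a single JSON array.
pub fn render_trace(events: &[TraceEvent]) -> String {
    let body: Vec<String> = events.iter().map(TraceEvent::to_json).collect();
    format!("[{}]", body.join(","))
}
```

Each step maps to one event, so a step that is split into two steps shows up as two separately timed spans in the viewer.
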
## Filtering Bundle Contents

Support Bundles are collected by the `support_bundle_collector`
background task. Each bundle is assembled as a zipfile within a single Nexus
instance, and is then transferred to durable storage.

The contents of a bundle may be controlled by modifying the `BundleRequest`
structure. This request provides filters for controlling the categories of
data which are collected (e.g., "Host OS info") as well as arguments for
more specific constraints (e.g., "Collect info from a specific Sled").

Bundle **steps** may query the `BundleRequest` to identify whether or not their
contents should be included.

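A step's check against the request might look roughly like the following. The `Request` and `Category` types here are hypothetical stand-ins for `BundleRequest` and its filters; the real field and variant names are not shown in this README and may differ.

```rust
use std::collections::HashSet;

/// Hypothetical data categories; the real filter names may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Category {
    HostInfo,
    ServiceLogs,
}

/// Hypothetical stand-in for `BundleRequest`.
pub struct Request {
    /// Categories of data to collect.
    pub categories: HashSet<Category>,
    /// `None` means "all sleds"; `Some` restricts collection to those listed.
    pub sleds: Option<HashSet<String>>,
}

impl Request {
    pub fn include_category(&self, category: Category) -> bool {
        self.categories.contains(&category)
    }

    pub fn include_sled(&self, sled: &str) -> bool {
        match &self.sleds {
            None => true,
            Some(only) => only.contains(sled),
        }
    }
}

/// A step consults the request first and returns early when filtered out,
/// so no database or service queries happen for excluded data.
pub fn host_info_step(request: &Request, sled: &str) -> Option<String> {
    if !request.include_category(Category::HostInfo)
        || !request.include_sled(sled)
    {
        return None;
    }
    Some(format!("sleds/{}/host-info.txt", sled))
}
```
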
## Overview for adding new data

* **Determine if your data should exist in a new step**. The existing steps
  live in `support_bundle/steps`. Adding a new step provides a new unit
  of execution (it can be executed concurrently with other steps), and a unit of
  tracing (it will be instrumented independently of other steps).
* If you're adding a new step...
  * **Add it as a new module**, within `support_bundle/steps`.
  * **Ensure it's part of `steps::all()`, or spawned by an existing step**. This
    is necessary for your step to be executed.
  * **Provide a way for bundles to opt out of collecting this data**. Check the
    `BundleRequest` to see if your data exists in one of the current filters, or
    consider adding a new one if your step involves a new category of data. Either
    way, your new step should read `BundleRequest` to decide if it should trigger
    before performing any subsequent operations.
* **Consider caching**. If your new data requires performing any potentially
  expensive operations which might be shared with other steps (e.g., reading
  from the database, creating and using progenitor clients, etc.), consider adding
  that data to `support_bundle/cache`.
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

//! Cached data or clients which are collected by the bundle
//!
//! This is used to share data which may be used by multiple
//! otherwise independent steps.

use crate::app::background::tasks::support_bundle::collection::BundleCollection;

use gateway_client::Client as MgsClient;
use internal_dns_types::names::ServiceName;
use nexus_db_model::Sled;
use nexus_types::deployment::SledFilter;
use slog_error_chain::InlineErrorChain;
use std::sync::Arc;
use tokio::sync::OnceCell;

/// Caches information which can be derived from the BundleCollection.
21+
///
22+
/// This is exists as a small optimization for independent steps which may try
23+
/// to read / access similar data, especially when it's fallible: we only need
24+
/// to attempt to look it up once, and all steps can share it.
25+
#[derive(Clone)]
26+
pub struct Cache {
27+
inner: Arc<Inner>,
28+
}
29+
30+
struct Inner {
31+
all_sleds: OnceCell<Option<Vec<Sled>>>,
32+
mgs_client: OnceCell<Option<MgsClient>>,
33+
}
34+
impl Cache {
    pub fn new() -> Self {
        Self {
            inner: Arc::new(Inner {
                all_sleds: OnceCell::new(),
                mgs_client: OnceCell::new(),
            }),
        }
    }

    /// Returns all in-service sleds, querying the database on first use.
    ///
    /// Returns `None` if that first lookup failed; the failure is cached.
    pub async fn get_or_initialize_all_sleds<'a>(
        &'a self,
        collection: &BundleCollection,
    ) -> Option<&'a Vec<Sled>> {
        self.inner
            .all_sleds
            .get_or_init(|| async {
                collection
                    .datastore()
                    .sled_list_all_batched(
                        &collection.opctx(),
                        SledFilter::InService,
                    )
                    .await
                    .ok()
            })
            .await
            .as_ref()
    }

    /// Returns a shared MGS client, creating it on first use.
    ///
    /// Returns `None` if the client could not be created; the failure is
    /// cached.
    pub async fn get_or_initialize_mgs_client<'a>(
        &'a self,
        collection: &BundleCollection,
    ) -> Option<&'a MgsClient> {
        self.inner
            .mgs_client
            .get_or_init(|| async { create_mgs_client(collection).await.ok() })
            .await
            .as_ref()
    }
}

async fn create_mgs_client(
    collection: &BundleCollection,
) -> anyhow::Result<MgsClient> {
    let log = collection.log();
    collection
        .resolver()
        .lookup_socket_v6(ServiceName::ManagementGatewayService)
        .await
        .map(|sockaddr| {
            let url = format!("http://{}", sockaddr);
            gateway_client::Client::new(&url, log.clone())
        })
        .map_err(|e| {
            error!(
                log,
                "failed to resolve MGS address";
                "error" => InlineErrorChain::new(&e)
            );
            e.into()
        })
}
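
One consequence of storing `Option<T>` inside the cell is that a failed first lookup is cached too, so later callers see `None` without retrying. The sketch below illustrates that behavior using only the standard library: it swaps tokio's async `OnceCell` for the synchronous `std::sync::OnceLock`, and `CachedLookup` is a hypothetical name rather than part of this commit.

```rust
use std::sync::OnceLock;

/// Caches the outcome of one fallible lookup, whether success or failure.
/// (Synchronous stand-in for the `OnceCell<Option<T>>` fields in `Inner`.)
pub struct CachedLookup {
    value: OnceLock<Option<String>>,
}

impl CachedLookup {
    pub fn new() -> Self {
        Self { value: OnceLock::new() }
    }

    /// Runs `lookup` on the first call only. Later calls return the stored
    /// result, including a stored `None` if that first attempt failed.
    pub fn get_or_initialize(
        &self,
        lookup: impl FnOnce() -> Result<String, String>,
    ) -> Option<&String> {
        self.value.get_or_init(|| lookup().ok()).as_ref()
    }
}
```

If retrying after a failure were ever wanted, the cell would need to hold only successful values; that would be a different design from the one shown in this file.
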
