[multicast] implicit group lifecycle with IP pool integration #9450

zeeshanlakhani · 2025-11-30T02:26:13Z

This PR also addresses permission models, object deletion, and error handling questions related to reserved addresses presented in @askfongjojo's testing Google Doc (default IP Pools are covered in a follow-up, stacked PR).

In thinking through the Groups API, permission scopes, and flexibility, @rcgoodfellow mentioned this consideration:

Do we need an explicit notion of a group object at all? Or can
instances simply allocate/deallocate group IPs from pools, and there is
no explicit management of group objects.

With Fleet admins having access control to create pools and link silos to a pool, we arrived at the idea of replacing the current explicit multicast group CRUD with an implicit lifecycle, where groups are created upon the first member join and deleted when the last member leaves.

Note: Most of the PR's changes are test-related due to moving away from the explicit multicast group(s) lifecycle.

Auth Model:

Discovery (fleet-scoped):
- Read/list groups and list members: any authenticated user in the same fleet.
Membership (project-scoped):
- Join/leave requires Instance::Modify on the specific instance.
Creation control:
- Implicit group creation only when the caller’s silo is linked to a suitable multicast pool (by name or by explicit IP in that pool).

Behavior:

Implicit lifecycle:
- Create on first join (idempotent); delete when last member leaves (atomic mark-for-removal, reconciler schedules cleanup).
Addressing and validation:
- Implicit allocation from the caller’s linked multicast pools.
- SSM/ASM semantics enforced:
  - IPv4 SSM 232/8 and IPv6 ff3x::/32 require ≥1 source IP.
  - ASM must not specify sources. - When joining by explicit IP: resolve the pool containing the IP, and we verify the silo link before creating.
Error handling: - Reserved/invalid multicast ranges rejected at pool/range add time.

API:

Primary flows: - Group-centric member management: POST/DELETE /v1/multicast-groups/{group}/members - Instance-centric join/leave: PUT/DELETE /v1/instances/{instance}/multicast-groups/{group}
Discovery endpoints remain for list/view; there is no explicit group create/update/delete.
This is a breaking change, but multicast is not yet enabled or available in production

Key changes:

Implicit group model; groups exist while they have members.
IP pool integration for multicast allocation with silo link gating.
Simplified API centered on join/leave flows.
Add multicast_ip to the member table for responses.
For consistency, move to Instant type over SystemTime for mcast-related caches

Follow‑ups (stacked [and related] PRs)

omdb additions for multicast introspection -> [multicast] add omdb mcast commands for groups, members, pools #9464
Remove MVLAN from group data model -> [multicast] Drop (m)vlan from multicast groups #9451
Default IP pool support (IPv4/IPv6 × unicast/multicast) -> [ip-pools] Allow multiple default IP pools per silo #9452 (stacked atop [multicast] Drop (m)vlan from multicast groups #9451 due to database migration conflict, but can be reviewed separately)
Dendrite: use omicron-common constants for validation -> [multicast] Narrow admin-scoped to admin-local only (ff04::/16) dendrite#162

@askfongjojo

This PR also addresses permission models, object deletion, and error handling questions related to reserved addresses presented in @askfongjojo's testing Google Doc (default IP Pools are covered in a follow-up, stacked PR). In thinking through the *Groups* API, permission scopes, and flexibility, @rcgoodfellow mentioned this consideration: > Do we need an explicit notion of a group object at all? Or can > instances simply allocate/deallocate group IPs from pools, and there is > no explicit management of group objects. With Fleet admins having access control to create pools and link silos to a pool, we arrived at the idea of replacing the current explicit multicast group CRUD with an implicit lifecycle, where groups are created upon the first member join and deleted when the last member leaves. **Note**: Most of the PR's changes are test-related due to moving away from the explicit multicast group(s) lifecycle. Auth Model: - Discovery (fleet-scoped): - Read/list groups and list members: any authenticated user in the same fleet. - Membership (project-scoped): - Join/leave requires Instance::Modify on the specific instance. - Creation control: - Implicit group creation only when the s silo is linked to a suitable multicast pool (by name or by explicit IP in that pool). Behavior: - Implicit lifecycle: - Create on first join (idempotent); delete when last member leaves (atomic mark-for-removal, reconciler schedules cleanup). - Addressing and validation: - Implicit allocation from the s linked multicast pools. - SSM/ASM semantics enforced: - IPv4 SSM 232/8 and IPv6 ff3x::/32 - Error handling: - Reserved/invalid multicast ranges rejected at pool/range add time. API: - Primary flows: - Group-centric member management: POST/DELETE /v1/multicast-groups/{group}/members - Instance-centric join/leave: PUT/DELETE /v1/instances/{instance}/multicast-groups/{group} - Discovery endpoints remain for list/view; there is no explicit group create/update/delete. - This is a *breaking* change, but multicast is not yet enabled or available in production Key changes: - Implicit group model; groups exist while they have members. - IP pool integration for multicast allocation with silo link gating. - Simplified API centered on join/leave flows. - Add multicast_ip to the member table for responses. - For consistency, move to `Instant` type over `SystemTime` for mcast-related caches Follow-ups (stacked PRs) - [ ] Remove MVLAN from group data model. - [ ] Default IP pool support (IPv4/IPv6 Followrequire unicast/multicast). - [ ] Dendrite: use omicron-common constants for validation.

Previously, internal multicast groups accepted admin-scoped addresses including admin-local (ff04), site-local (ff05), and org-local (ff08). This narrows the scope to only admin-local (ff04::/16), which is what Omicron *now* dictates. - [ ] This should be merged after oxidecomputer/omicron#9450 is reviewed and merged into Omicron. We now make Dendrite/Dpd match Omicron consistently for validation. Key changes: - Remove IPV6_SITE_LOCAL_PATTERN and IPV6_ORG_SCOPE_PATTERN from P4 - Update P4 table entries to only match admin-local (size 4→2) - Add ADMIN_LOCAL_PREFIX const to dpd-types with RFC doc links - Update validation to use `is_admin_local_multicast()` from oxnet v0.1.4 - Bump to API version 2 for doc changes (only) - Update README with OpenAPI generation instructions - Use new multicast subnet constants from `omicron-common` for validation

askfongjojo · 2025-12-02T02:40:43Z

@zeeshanlakhani - The revised authn model and implicit creation/deletion logic make sense to me. I'll make another pass tomorrow to ensure I didn't miss anything but the mental model of multicast group lifecycle seems straightforward enough.

…ecycle

…st groups

This PR adds omdb commands to inspect multicast state: - `omdb db multicast groups` - list multicast groups with optional state and pool name filters - `omdb db multicast members` - list group members with filters for group-id, group-name, group-ip, state, and sled-id - `omdb db multicast info` - show detailed info for a specific group - `omdb db multicast pools` - list multicast IP pools We also include: - Background task status display for multicast_reconciler - Integration tests for all multicast omdb commands Follows the multicast lifecycle work in #9450.

…ecycle

Introduce API version `VERSION_MULTICAST_IMPLICIT_LIFECYCLE_UPDATES` (v2025120500) to support the transition from explicit to implicit multicast group lifecycle management. Changes in new API version: - Groups are created implicitly when first member joins - Groups are deleted implicitly when last member leaves - Instance create/update accept `MulticastGroupIdentifier` (name, UUID, or multicast IP address) instead of just `NameOrId` - MulticastGroupMemberAdd now has optional `source_ips` for SSM Backward compatibility (v20251120): - Add `v20251120` module with compatibility types using `NameOrId` - Explicit group create/update/delete endpoints marked deprecated - Proper base64 validation for user_data via shared UserData serde helper Also includes: - Add version_policy to techport server for omdb compatibility

…ecycle

askfongjojo · 2025-12-08T20:05:21Z

nexus/src/app/ip_pool.rs

+/// - IPv4: Rejects link-local (224.0.0.0/24), GLOP (233.0.0.0/8),
+///   admin-scoped (239.0.0.0/8), specific reserved addresses (NTP, Cisco
+///   Auto-RP, PTP), and prevents ASM/SSM boundary spanning
+/// - IPv6: Rejects reserved-scope (ff00::/16), interface-local (ff01::/16),
+///   link-local (ff02::/16), and prevents ASM/SSM boundary spanning


The validation against ASM/SSM boundary spanning is not part of this PR but reading this comment causes me to wonder how to use IP pools with ASM vs SSM IP ranges. Let's say a silo has two distinct types of workloads that have different source IP filtering requirements, how would the operator configure the silo's default IP pool such that users have the control over getting an ASM vs SSM address for a new multicast group (since they can't specify the IP pool to use in a join request)?

We handle this in allocate_external_multicast_group:

// Determine if this is an SSM request (source_ips provided) or an // implicit ASM request (no sources, no explicit pool/IP) let sources_empty = params.source_ips.as_ref().map(|v| v.is_empty()).unwrap_or(true); let needs_ssm_pool = !sources_empty && params.pool.is_none() && params.ip.is_none(); let needs_asm_pool = sources_empty && params.pool.is_none() && params.ip.is_none(); // Select the appropriate pool: // - If `source_ips` provided without explicit pool/IP, find an SSM pool. // - If no `source_ips` and no explicit pool/IP, find an ASM pool. let authz_pool = if needs_ssm_pool { let (authz_pool, _) = self .ip_pools_fetch_ssm_multicast(opctx) .await ... } else if needs_asm_pool { let (authz_pool, _) = self .ip_pools_fetch_asm_multicast(opctx) .await ...

If the user provides source_ips in the join request body, we'll select an SSM pool automatically; if a user doesn't provide source_ips, we'll select an ASM pool
automatically.

The operator setup involves creating a multicast pool with SSM ranges (232.x.x.x for IPv4, ff3x::/32 for IPv6) and/or another multicast pool with ASM ranges (other
multicast addresses), then linking both to the silo. One can be marked as the default, but the SSM/ASM selection criteria scans all linked pools by their IP ranges.

It's really all about source IPs (or not).

Includes: - Remove GLOP (233/8), admin-scoped (239/8), and specific reserved address (NTP, Cisco Auto-RP, PTP) restrictions from IP pool validation - Only link-local multicast (224.0.0.0/24) is now rejected (not routable) - Add ASM pool fallback when join-by-name with source_ips finds no SSM pool linked - Allow source filtering on ASM addresses (IGMPv3/MLDv2 supports this) - SSM addresses still require sources per RFC 4607 The previous restrictions were overly conservative. Customers may have legitimate use cases for GLOP (AS-based allocations), admin-scoped (organization-local multicast), and protocol-specific addresses.

rcgoodfellow

Thanks @zeeshanlakhani for getting the implicit group changes coded up. Questions/comments follow.

rcgoodfellow · 2025-12-10T22:01:51Z

nexus/external-api/src/lib.rs

        tags = ["experimental"],
+        versions = VERSION_MULTICAST_IMPLICIT_LIFECYCLE_UPDATES..,
    }]
    async fn multicast_group_member_add(


Do we still want/need this API endpoint? What does this give us beyond the instance join endpoint?

I agree. Seems useless now. Will update.

rcgoodfellow · 2025-12-10T22:14:22Z

nexus/db-model/src/multicast_group.rs

+    /// (important given implicit create/delete semantics). Both external and
+    /// underlay groups use the same tag for pairing.
+    pub fn dpd_tag(&self) -> String {
+        self.id().to_string()


Having the multicast group address in the tag would probably be nice for someone looking at dpd state and trying to figure out where things are coming from and/or what this tag actually means.

rcgoodfellow · 2025-12-10T22:36:34Z

nexus/external-api/src/lib.rs

+        operation_id = "multicast_group_list",
+        versions = ..VERSION_MULTICAST_IMPLICIT_LIFECYCLE_UPDATES,
+    }]
+    async fn v2025120300_multicast_group_list(


Why does this need to exist? AFAICT it's identical in signature and implementation to multicast_group_list?

Actually, I was right, it's due to the Path type changes, as Path<T> from dropshot is a struct with a private inner field version. But, I was pointing to the wrong version.

I can clean this up a bit.

rcgoodfellow · 2025-12-10T22:45:25Z

nexus/external-api/src/lib.rs

        tags = ["experimental"],
+        versions = VERSION_MULTICAST_IMPLICIT_LIFECYCLE_UPDATES..,
    }]
    async fn multicast_group_member_remove(


Same question as above, do we need this when we have the instance leave endpoint?

rcgoodfellow · 2025-12-10T23:10:53Z

nexus/types/src/external_api/views.rs

    pub multicast_ip: IpAddr,
    /// Source IP addresses for Source-Specific Multicast (SSM).
    /// Empty array means any source is allowed.
    pub source_ips: Vec<IpAddr>,


I think SSM source IPs are more a property of an individual members join than of the group itself. Or is this the union of source addresses from all joins for a given group?

I mean for SSM, sources are typically tied to a group. (S, G); however, you make a good point. Sources should be specific to the member, and this should be a union of those.

zeeshanlakhani force-pushed the zl/mcast-implicit-lifecycle branch from e44bc27 to 1c9d172 Compare November 30, 2025 06:36

zeeshanlakhani requested review from FelixMcFelix, askfongjojo, internet-diglett and rcgoodfellow November 30, 2025 06:36

zeeshanlakhani mentioned this pull request Dec 1, 2025

[multicast] Narrow admin-scoped to admin-local only (ff04::/16) oxidecomputer/dendrite#162

Open

2 tasks

[follow-up] link gateway should be unicast default pool only

f7b87b4

zeeshanlakhani added 3 commits December 2, 2025 03:14

Merge remote-tracking branch 'origin/main' into zl/mcast-implicit-lif…

1e3b137

…ecycle

Merge remote-tracking branch 'origin/main' into zl/mcast-implicit-lif…

4b62ccb

…ecycle

[update] auto-select appropriate SSM/ASM pool when allocating multica…

655e602

…st groups

zeeshanlakhani mentioned this pull request Dec 3, 2025

[multicast] add omdb mcast commands for groups, members, pools #9464

Open

zeeshanlakhani added 3 commits December 5, 2025 03:30

[fix] auth ordering

7def773

Merge remote-tracking branch 'origin/main' into zl/mcast-implicit-lif…

f28aa9c

…ecycle

zeeshanlakhani requested review from sunshowers and removed request for sunshowers December 5, 2025 12:28

zeeshanlakhani mentioned this pull request Dec 5, 2025

[multicast] Drop (m)vlan from multicast groups #9451

Open

1 task

zeeshanlakhani added 4 commits December 6, 2025 00:31

Merge remote-tracking branch 'origin/main' into zl/mcast-implicit-lif…

7b673a5

…ecycle

[merge] external api clarity and udpates

c8242bc

[fmt] ..

bc472b3

[fix] minor updates and delegation for API

b12e8c1

askfongjojo reviewed Dec 8, 2025

View reviewed changes

zeeshanlakhani added 3 commits December 9, 2025 06:47

[dep] oxnet update

39a016a

[nit] expose vni

07a36a1

zeeshanlakhani self-assigned this Dec 10, 2025

zeeshanlakhani requested a review from askfongjojo December 10, 2025 15:01

rcgoodfellow reviewed Dec 10, 2025

View reviewed changes

[multicast] implicit group lifecycle with IP pool integration #9450

Are you sure you want to change the base?

[multicast] implicit group lifecycle with IP pool integration #9450

Uh oh!

Conversation

zeeshanlakhani commented Nov 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

askfongjojo commented Dec 2, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rcgoodfellow left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

zeeshanlakhani Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

zeeshanlakhani commented Nov 30, 2025 •

edited

Loading

zeeshanlakhani Dec 11, 2025 •

edited

Loading