Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 33 additions & 5 deletions pkg/daemon/update.go
Original file line number Diff line number Diff line change
Expand Up @@ -2557,12 +2557,17 @@ func (dn *Daemon) updateLayeredOS(config *mcfgv1.MachineConfig) error {
return dn.InplaceUpdateViaNewContainer(newURL)
}

isOsImagePresent := false
isPisConfigured, err := dn.isPinnedImageSetConfigured()
if err != nil {
// Ignore the error and default to remote pull
klog.Errorf("Failed to determine if pinned image set is configured: %v", err)
}

// not set during firstboot
if dn.fgHandler != nil && dn.fgHandler.Enabled(features.FeatureGatePinnedImages) {
isOsImagePresent, err = isImagePresent(newURL)
if err != nil {
// If PIS is configured check if the image is locally present. If so, rebase using
// the local image
isOsImagePresent := false
if isPisConfigured {
if isOsImagePresent, err = isImagePresent(newURL); err != nil {
return err
}
}
Expand Down Expand Up @@ -2595,6 +2600,29 @@ func (dn *Daemon) updateLayeredOS(config *mcfgv1.MachineConfig) error {
return nil
}

func (dn *Daemon) isPinnedImageSetConfigured() (bool, error) {
if dn.fgHandler == nil || !dn.fgHandler.Enabled(features.FeatureGatePinnedImages) || dn.node == nil || dn.mcpLister == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all versions that would need this fix has already GA'ed PIS, is the additional check here intended to allow for backporting to techpreview versions? Or just additional checks for consistency purposes (until such a time we remove the PIS featuregate entirely)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the exact same thing today. In >4.19.12, does it make sense to preserve the feature gate checks of PIS?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say it is for consistency since the feature gate is technically still defined in the API. Sure, right now we are only putting this fix into versions where PIS is GA, but what if, for some reason we can't see today, someone wants to backport this further and just cherrypicks this PR? It seems Joel plans to raise bugs when it is time to remove feature gates, so I figured until that time we would keep FG checks for consistency (ref Slack message).

// Two options:
// - PIS is not enabled
// - MCD first boot run: No connection to the cluster and node not populated -> Cannot check PIS config
return false, nil
}

// PIS is enabled. Check if it's configured in any of its pools
pools, _, err := helpers.GetPoolsForNode(dn.mcpLister, dn.node)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, re-reading the PIS implementation, I found it interesting that we used GetPoolsForNode for general PIS detection (which I assume this is also doing for consistency), since it kind of implies there's some inheritance model in place (i.e. custom pools would count as workers for any PIS pulls, and master+worker nodes would also count as both) like MachineConfigs, which... might be intended? (if I understand correctly, I don't actually ever need any PIS for custom pools, unless for some reason they are using a different base image, but then they'd pull both custom and worker? Not sure if that's the intended workflow)

Not something we should worry about for this PR, just something I wanted to raise while we're here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprissed too, cause the initial drafy implementation I did used the primary pool but after reviewing what @isabella-janssen did I realized everything was using/considering all pools, unfortunately, that didn't trigger an alarm inside me and I changed the call to this one as it's what we use everywhere for PIS.
I think this conversation needs to be captured and properly handled, so I've opened a low prio bug. If the call to GetPoolsForNode everywhere in the PIS code is the right one, we could close the bug with no action.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a thread where this was discussed back in March: https://redhat-internal.slack.com/archives/C076EFZF40M/p1741706350105449. I think there is room to reconsider, especially given the concerns you raised in the bug @pablintino.

if err != nil {
return false, fmt.Errorf("failed to get pools for node %q: %w", dn.node.Name, err)
}

for _, pool := range pools {
if pool.Spec.PinnedImageSets != nil && len(pool.Spec.PinnedImageSets) > 0 {
return true, nil
}
}
// No pools with PIS configured
return false, nil
}

// Synchronously invoke a command, writing its stdout to our stdout,
// and gathering stderr into a buffer which will be returned in err
// in case of error.
Expand Down