Skip to content

feat: Adding support for shared-requestor #104

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions docs/automatic-ofed-upgrade.md
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,21 @@ controller manager watchers:
> Meaning in case node undergoes upgrade prior to enabling `requestor` mode, node will continue `inplace` upgrade mode. Only after `requestor` mode is set, and upgrade
> controller has set nodes state to be upgrade-required, only then new requestor mode will take place.

###### shared-requestor
The requestor mode supports a `shared-requestor` flow where multiple operators can coordinate node maintenance operations:
Assumptions:
1. Cluster admin, which requires `shared-requestor` flow, needs to make sure that all operators, utilizing maintenance OP, use same upgrade policy specs (same drainSpec).
2. To be able to accommodate both GPU/Network drivers upgrade, `DrainSpec.PodSelector` should be set accordingly (hard-coded).
* podSelector: `nvidia.com/ofed-driver-upgrade-drain.skip!=true,nvidia.com/gpu-driver-upgrade-drain.skip!=true`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just making sure, drain pkg will treat this as an OR right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be treated as AND condition.
Check the following example (I have tested it)

1. pod "foo" - nvidia.com/ofed-driver-upgrade-drain.skip=true label
2. pod "bar" - nvidia.com/gpu-driver-upgrade-drain.skip=true label
3. pod "baz" - no label

Pod "foo" (has nvidia.com/ofed-driver-upgrade-drain.skip=true):
nvidia.com/ofed-driver-upgrade-drain.skip!=true → FALSE (because it equals true)
nvidia.com/gpu-driver-upgrade-drain.skip!=true → TRUE (because this label doesn't exist, so it's not equal to true)
Result: FALSE AND TRUE = FALSE

Pod "bar" (has nvidia.com/gpu-driver-upgrade-drain.skip=true):
nvidia.com/ofed-driver-upgrade-drain.skip!=true → TRUE (because this label doesn't exist, so it's not equal to true)
nvidia.com/gpu-driver-upgrade-drain.skip!=true → FALSE (because it equals true)
Result: TRUE AND FALSE = FALSE

Pod "baz" (no labels):
nvidia.com/ofed-driver-upgrade-drain.skip!=true → TRUE (because this label doesn't exist, so it's not equal to true)
nvidia.com/gpu-driver-upgrade-drain.skip!=true → TRUE (because this label doesn't exist, so it's not equal to true)
Result: TRUE AND TRUE = TRUE

3. No custom `NodeMaintenanceNamePrefix` should be used. Requestor will use `DefaultNodeMaintenanceNamePrefix` as a common prefix for nodeMaintenance name.
Flow:
1. Each operator adds its dedicated operator label to the nodeMaintenance object
2. When a nodeMaintenance object exists, additional operators append their requestorID to the spec.AdditionalRequestors list
3. During `uncordon-required` completion:
- Non-owning operators remove themselves from spec.AdditionalRequestors list using optimistic locking
- Each operator removes its dedicated label from the nodeMaintenance object
4. The owning nodeMaintenance operator handles the actual, client side, deletion of the nodeMaintenance object

### Troubleshooting
#### Node is in `upgrade-failed` state
* Drain the node manually by running `kubectl drain <node_name> --ignore-daemonsets`
Expand Down
53 changes: 35 additions & 18 deletions pkg/upgrade/common_manager.go
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,7 @@ type CommonUpgradeManagerImpl struct {
ValidationManager ValidationManager
SafeDriverLoadManager SafeDriverLoadManager

shouldSkipUpgradeDoneFunc func(requestorID string, nodeState *NodeUpgradeState) (bool, error)
// optional states
podDeletionStateEnabled bool
validationStateEnabled bool
Expand All @@ -104,6 +105,7 @@ func NewCommonUpgradeStateManager(
log logr.Logger,
k8sConfig *rest.Config,
scheme *runtime.Scheme,
skipFunc func(requestorID string, nodeState *NodeUpgradeState) (bool, error),
eventRecorder record.EventRecorder) (*CommonUpgradeManagerImpl, error) {
k8sClient, err := client.New(k8sConfig, client.Options{Scheme: scheme})
if err != nil {
Expand All @@ -117,16 +119,17 @@ func NewCommonUpgradeStateManager(

nodeUpgradeStateProvider := NewNodeUpgradeStateProvider(k8sClient, log, eventRecorder)
commonUpgrade := CommonUpgradeManagerImpl{
Log: log,
K8sClient: k8sClient,
K8sInterface: k8sInterface,
EventRecorder: eventRecorder,
DrainManager: NewDrainManager(k8sInterface, nodeUpgradeStateProvider, log, eventRecorder),
PodManager: NewPodManager(k8sInterface, nodeUpgradeStateProvider, log, nil, eventRecorder),
CordonManager: NewCordonManager(k8sInterface, log),
NodeUpgradeStateProvider: nodeUpgradeStateProvider,
ValidationManager: NewValidationManager(k8sInterface, log, eventRecorder, nodeUpgradeStateProvider, ""),
SafeDriverLoadManager: NewSafeDriverLoadManager(nodeUpgradeStateProvider, log),
Log: log,
K8sClient: k8sClient,
K8sInterface: k8sInterface,
EventRecorder: eventRecorder,
DrainManager: NewDrainManager(k8sInterface, nodeUpgradeStateProvider, log, eventRecorder),
PodManager: NewPodManager(k8sInterface, nodeUpgradeStateProvider, log, nil, eventRecorder),
CordonManager: NewCordonManager(k8sInterface, log),
NodeUpgradeStateProvider: nodeUpgradeStateProvider,
ValidationManager: NewValidationManager(k8sInterface, log, eventRecorder, nodeUpgradeStateProvider, ""),
SafeDriverLoadManager: NewSafeDriverLoadManager(nodeUpgradeStateProvider, log),
shouldSkipUpgradeDoneFunc: skipFunc,
}

return &commonUpgrade, nil
Expand Down Expand Up @@ -487,7 +490,7 @@ func (m *CommonUpgradeManagerImpl) ProcessPodRestartNodes(
}
if driverPodInSync {
if !m.IsValidationEnabled() {
err = m.updateNodeToUncordonOrDoneState(ctx, nodeState.Node)
err = m.updateNodeToUncordonOrDoneState(ctx, nodeState)
if err != nil {
return err
}
Expand Down Expand Up @@ -595,7 +598,7 @@ func (m *CommonUpgradeManagerImpl) ProcessValidationRequiredNodes(
continue
}

err = m.updateNodeToUncordonOrDoneState(ctx, node)
err = m.updateNodeToUncordonOrDoneState(ctx, nodeState)
if err != nil {
return err
}
Expand Down Expand Up @@ -670,16 +673,30 @@ func (m *CommonUpgradeManagerImpl) SkipNodeUpgrade(node *corev1.Node) bool {
// updateNodeToUncordonOrDoneState skips moving the node to the UncordonRequired state if the node
// was Unschedulable at the beginning of the upgrade so that the node remains in the same state as
// when the upgrade started. In addition, the annotation tracking this information is removed.
func (m *CommonUpgradeManagerImpl) updateNodeToUncordonOrDoneState(ctx context.Context, node *corev1.Node) error {
func (m *CommonUpgradeManagerImpl) updateNodeToUncordonOrDoneState(ctx context.Context,
nodeState *NodeUpgradeState) error {
var (
shouldSkip bool
err error
)
node := nodeState.Node
newUpgradeState := UpgradeStateUncordonRequired
annotationKey := GetUpgradeInitialStateAnnotationKey()
if _, ok := node.Annotations[annotationKey]; ok {
m.Log.V(consts.LogLevelInfo).Info("Node was Unschedulable at beginning of upgrade, skipping uncordon",
"node", node.Name)
newUpgradeState = UpgradeStateDone
if m.shouldSkipUpgradeDoneFunc != nil {
shouldSkip, err = m.shouldSkipUpgradeDoneFunc(GetRequestorOptsFromEnvs().MaintenanceOPRequestorID, nodeState)
if err != nil {
return err
}
}
if !shouldSkip {
if _, ok := node.Annotations[annotationKey]; ok {
m.Log.V(consts.LogLevelInfo).Info("Node was Unschedulable at beginning of upgrade, skipping uncordon",
"node", node.Name)
newUpgradeState = UpgradeStateDone
}
}

err := m.NodeUpgradeStateProvider.ChangeNodeUpgradeState(ctx, node, newUpgradeState)
err = m.NodeUpgradeStateProvider.ChangeNodeUpgradeState(ctx, node, newUpgradeState)
if err != nil {
m.Log.V(consts.LogLevelError).Error(
err, "Failed to change node upgrade state", "node", node.Name, "state", newUpgradeState)
Expand Down
9 changes: 8 additions & 1 deletion pkg/upgrade/consts.go
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,9 @@ const (
UpgradeStateLabelKeyFmt = "nvidia.com/%s-driver-upgrade-state"
// UpgradeSkipNodeLabelKeyFmt is the format of the node label boolean key indicating to skip driver upgrade
UpgradeSkipNodeLabelKeyFmt = "nvidia.com/%s-driver-upgrade.skip"
// UpgradeSkipDrainDriverSelectorFmt is the format of the pod selector key indicating to skip driver
// in upgrade drain spec
UpgradeSkipDrainDriverSelectorFmt = "nvidia.com/%s-driver-upgrade-drain.skip"
// UpgradeWaitForSafeDriverLoadAnnotationKeyFmt is the format of the node annotation key indicating that
// the driver is waiting for safe load. Meaning node should be cordoned and workloads should be removed from the
// node before the driver can continue to load.
Expand All @@ -39,8 +42,12 @@ const (
// (used for orphaned pods)
// Setting this label will trigger setting upgrade state to upgrade-required
UpgradeRequestedAnnotationKeyFmt = "nvidia.com/%s-driver-upgrade-requested"
// UpgradeRequestorModeAnnotationKeyFmt
// UpgradeRequestorModeAnnotationKeyFmt is the format of the node annotation indicating requestor driver upgrade
// mode is used for underlying node
UpgradeRequestorModeAnnotationKeyFmt = "nvidia.com/%s-driver-upgrade-requestor-mode"
// UpgradeRequestorLabelKeyFmt is the format of the nodeMaintenance label marked by requestor after modifying
// the nodeMaintenance object
UpgradeRequestorLabelKeyFmt = "nvidia.com/%s-requestor"
// UpgradeStateUnknown Node has this state when the upgrade flow is disabled or the node hasn't been processed yet
UpgradeStateUnknown = ""
// UpgradeStateUpgradeRequired is set when the driver pod on the node is not up-to-date and required upgrade
Expand Down
4 changes: 4 additions & 0 deletions pkg/upgrade/upgrade_inplace.go
Original file line number Diff line number Diff line change
Expand Up @@ -130,6 +130,10 @@ func (m *InplaceNodeStateManagerImpl) ProcessUncordonRequiredNodes(
if nodeState.NodeMaintenance != nil {
continue
}
// check if if node upgrade is handled by requestor mode, if so node uncordon will be performed by requestor flow
if _, exists := nodeState.Node.Annotations[GetUpgradeRequestorModeAnnotationKey()]; exists {
continue
}
err := m.CordonManager.Uncordon(ctx, nodeState.Node)
if err != nil {
m.Log.V(consts.LogLevelWarning).Error(
Expand Down
Loading