> Note: if a node undergoes an upgrade before `requestor` mode is enabled, it will complete that upgrade in `inplace` mode. The new `requestor` mode takes effect only after `requestor` mode is set and the upgrade controller has moved the node's state to `upgrade-required`.

###### shared-requestor
The requestor mode supports a `shared-requestor` flow where multiple operators can coordinate node maintenance operations:
Assumptions:
1. A cluster admin who requires the `shared-requestor` flow must ensure that all operators utilizing the maintenance operator use the same upgrade policy specs (the same `drainSpec`).
2. To accommodate upgrades of both the GPU and network drivers, `DrainSpec.PodSelector` should be set accordingly (hard-coded).
3. No custom `NodeMaintenanceNamePrefix` should be used; the requestor will use `DefaultNodeMaintenanceNamePrefix` as the common prefix for nodeMaintenance names.

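To make the first two assumptions concrete, a minimal sketch of a shared drain spec is shown below. The field and label names here are illustrative only (loosely following common NVIDIA driver-upgrade CRD conventions) and must be verified against the actual operator APIs; the point is that every participating operator carries an identical `drainSpec`:

```yaml
# Hypothetical shared drain spec -- every operator in the shared-requestor
# flow must be configured with an identical copy of this block.
driverUpgradePolicy:
  autoUpgrade: true
  drain:
    enable: true
    force: true
    # podSelector must cover pods of both the GPU and network (OFED) drivers
    podSelector: "nvidia.com/gpu-driver-upgrade-drain.skip!=true,nvidia.com/ofed-driver-upgrade-drain.skip!=true"
    timeoutSeconds: 300
    deleteEmptyDir: true
```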
Flow:
1. Each operator adds its dedicated operator label to the nodeMaintenance object.
2. When a nodeMaintenance object already exists, additional operators append their requestorID to the `spec.AdditionalRequestors` list.
3. During `uncordon-required` completion:
   - Non-owning operators remove themselves from the `spec.AdditionalRequestors` list using optimistic locking.
   - Each operator removes its dedicated label from the nodeMaintenance object.
4. The operator that owns the nodeMaintenance object then performs the actual, client-side deletion of the object.

### Troubleshooting
#### Node is in `upgrade-failed` state
* Drain the node manually by running `kubectl drain <node_name> --ignore-daemonsets`