-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Essential checks introduction for hooks and groups in recipe #1795
Conversation
0f931ff
to
c91198f
Compare
c91198f
to
7f4602b
Compare
d94908e
to
8ef861a
Compare
8ef861a
to
29fadac
Compare
cf4362b
to
ec06119
Compare
// to determine if the cluster data is protected. If it is nil, then it is | ||
// considered as success. So we should set it as false here and set it as | ||
// true if protection is not required or if protection is successful. | ||
v.kubeObjectsCaptureFailed("KubeObjectsCaptureNotStarted", "Kube objects capture has not started") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is poorly named for this purpose. you should probably call directly kubeObjectsCaptureStatusFalse
instead.
|
||
if isEssentialStep { | ||
// shows that at least one essential step has succeeded | ||
allEssentialStepsFailed = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is disabled for all captureSteps
. In other words, if isEssentialSteps
is always true
, line 295 seems redundant. I'm not sure what the code is trying to determine.
@@ -1198,7 +1204,7 @@ func (v *VRGInstance) processAsPrimary() ctrl.Result { | |||
v.log.Info("Kube objects restore failed", "error", err) | |||
v.errorConditionLogAndSet(err, "Failed to restore kube objects", setVRGKubeObjectsErrorCondition) | |||
|
|||
return v.updateVRGStatus(v.result) | |||
return v.updateVRGConditionsAndStatus(v.result) | |||
} | |||
|
|||
v.log.Info("Kube objects restored") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow the shouldRestoreClusterData
pattern and update status here, before progressing with further processing (which includes a new backup and VRG upload conditionally).
As e2e reports an error where restore of k8s resources is triggered twice, at this point returning to update status to ensure no further processing is done would be better as suggested by @raghavendra-talur
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conditional LGTM based on addressing the early return as mentioned.
9f32517
to
924cf98
Compare
I don't understand this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first commit is big and complicated and I don't fully understand it.
The second commit is smaller but I could not find the change described by the commit message. It looks like bunch of unrelated changes.
@@ -232,7 +232,7 @@ func backupDummyStatusProcessAndRestore( | |||
restoreName string, | |||
labels map[string]string, | |||
) (*velero.Restore, error) { | |||
backupStatusLog(backup, w.log) | |||
backupStatusLog("restore", backup, w.log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this related to "initialize kubeObjectProtected condition to false"?
internal/controller/vrg_recipe.go
Outdated
@@ -47,6 +49,7 @@ func captureWorkflowDefault(vrg ramen.VolumeReplicationGroup, ramenConfig ramen. | |||
|
|||
captureSpecs := []kubeobjects.CaptureSpec{ | |||
{ | |||
Name: DefaultCaptureRecoverGroupName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this related to "initialize kubeObjectProtected condition to false"?
@@ -210,52 +210,106 @@ func (v *VRGInstance) kubeObjectsCaptureStartOrResume( | |||
captureNumber int64, | |||
pathName, capturePathName, namePrefix, veleroNamespaceName string, | |||
interval time.Duration, | |||
labels map[string]string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this removed?
We seems to create new labels here, but it is not clear way this change is needed, and how it is related to supporting the new essential checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not related to essential checks but during the changes found that this refactoring would help reduce the arguments required as labels can be easily retrieved using util package using vrg instance.
allEssentialStepsFailed, err := v.executeCaptureSteps(result, pathName, capturePathName, namePrefix, | ||
veleroNamespaceName, captureInProgressStatusUpdate, annotations, requests, log) | ||
if err != nil { | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return silently on errors? is the error already logged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of error, vrg status is updated. If you suggest additional log, will add.
annotations := map[string]string{vrgGenerationKey: strconv.FormatInt(generation, vrgGenerationNumberBase)} | ||
labels := util.OwnerLabels(v.instance) | ||
|
||
allEssentialStepsFailed, err := v.executeCaptureSteps(result, pathName, capturePathName, namePrefix, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do care about all essential steps failure and not about any essential failure? If something is essential than a failure should not be ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentiality of steps (group/hook) depends on the workflow.FailOn and is expected work in combination.
If FailOn is "any-error", we don't care whether step is essential or not and fail immediately.
If FailOn is "essential-error", we look for essential step and if it fails, workflow is marked as failure. If all essential steps pass, workflow is treated as successful.
If FailOn is "full-error", we look for essential steps. Here, only if all the essential steps fail, workflow is treated as failure. In case any one essential passes, workflow is treated as successful.
result, cg, pathName, capturePathName, namePrefix, veleroNamespaceName, | ||
captureInProgressStatusUpdate, | ||
labels, annotations, requests, log, | ||
) | ||
requestsCompletedCount += loopCount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loopCount is the number of requests completed by the call to v.kubeObjectsGroupCapture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
|
||
return | ||
return false, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why return false here in all case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned here, in case failOn is "any-error", we should immediately stop the execution of the workflow.
924cf98
to
b9b560e
Compare
a842042
to
9ee584b
Compare
Signed-off-by: Annaraya Narasagond <[email protected]>
9ee584b
to
a0658f1
Compare
When using recipe, hooks and group can be either essential or not. It is defined either as group.Essential or hook.Essential. It is used in combination with workflow.FailOn.
The supported values for FailOn are:
Fixes #1762
Fixes #1760