You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We recently upgraded to a gvisor version containing extra validation of the container spec used when restoring checkpoints (i.e. all the logic here). Unfortunately, at the same time, we also discovered that we were inadvertently varying the container spec between checkpoint and restore in lots of subtle ways that, although technically incorrect, weren't causing any obvious bugs in our application. For example, varying the arguments to the container's init process between checkpoint and restore doesn't actually cause anything to break, because we consume the arguments into memory just once, before the checkpoint occurs. The new validation logic turns those small variances in the container spec into big failures that cause production issues.
Is it reasonable to add an option to disable the container spec validation? I'm fully aware that varying the container spec between checkpoint and restore is technically wrong, and I intend to fix it -- but in the interim, it would be nice to disable the validation to maintain stability.
Is this feature related to a specific bug?
No response
Do you have a specific solution in mind?
No response
The text was updated successfully, but these errors were encountered:
Seems fine to me so long as it's flag-gated behind a scary-sounding flag, and that the behavior for the now-not-validated fields is consistent (will they all effectively inherit the pre-checkpoint values?)
@EtiennePerot yes my understanding is they'll effectively inherit the pre-checkpoint flag. I think it makes sense to put this behind a scary flag and to still emit a warning or error-level log so we can alert on validation errors.
I've created #11323 for this change - let me know if it's acceptable!
For the now-not-validated fields, I believe two things may happen. For fields that don't have any semantic effect at restore time, gvisor will behave according to the values given at checkpoint time. For other fields, differences in the checkpoint/restore specs will still cause loud restore failures in subcomponents of the kernel (e.g. a missing bind mount will cause filesystem restoration to fail).
Description
We recently upgraded to a gvisor version containing extra validation of the container spec used when restoring checkpoints (i.e. all the logic here). Unfortunately, at the same time, we also discovered that we were inadvertently varying the container spec between checkpoint and restore in lots of subtle ways that, although technically incorrect, weren't causing any obvious bugs in our application. For example, varying the arguments to the container's init process between checkpoint and restore doesn't actually cause anything to break, because we consume the arguments into memory just once, before the checkpoint occurs. The new validation logic turns those small variances in the container spec into big failures that cause production issues.
Is it reasonable to add an option to disable the container spec validation? I'm fully aware that varying the container spec between checkpoint and restore is technically wrong, and I intend to fix it -- but in the interim, it would be nice to disable the validation to maintain stability.
Is this feature related to a specific bug?
No response
Do you have a specific solution in mind?
No response
The text was updated successfully, but these errors were encountered: