Azure Backup job is failing to unfreeze Ubuntu OS after running safefreeze #1868
Comments
We have the same issue and have been in conversation with the Microsoft Azure Backup team for a few months already, with nothing so far. All we know is that fsfreeze causes the issue and the VM goes bananas; the main symptom is that the load average just goes crazy.
We are experiencing the same condition on RHEL 8.10+ in MAG. The agent freezes the file systems but never thaws them. This has occurred in our environment at least 10 times since July 2024. Based on our review, something fails between the freeze and the thaw, and the error handling doesn't catch it. With the /var/log file system frozen, nothing is written to any log that would identify the exact problem, and Microsoft support couldn't provide any input. Our only recourse is to deallocate and reallocate the VM. Redeploying does seem to reduce the occurrence, but that's just an observation.
We initially suspected this might be caused by a swap file placed on the temporary disk; however, the condition recurred after we excluded the /mnt file system (https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/create-swap-file-linux-vm#create-a-swap-partition - option #1). Lastly, the VMs on which we've encountered this problem (at least 5 distinct ones) ran for months before hitting this condition for the first time. Sharing this in case it helps anyone. The last thing we see is "accepting signals".
2024/09/29 01:14:39.919079 Info PreSnapshot: Status Code: 200
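The comment above notes that "something fails between the freeze and the thaw and error handling doesn't trap." For discussion only, here is a minimal sketch of a freeze wrapper that thaws on every exit path, including a watchdog timer for the case where the caller hangs instead of raising. This is not the VMSnapshotLinux extension's code: `FreezeGuard`, `run_cmd` and `thaw_errors` are hypothetical names, and the 60-second value used below is only borrowed from the "timeout value 60" in the log further down. It calls the util-linux `fsfreeze --freeze` / `fsfreeze --unfreeze` commands through an injectable runner, so the logic can be exercised without root. As an assumption of this sketch, the watchdog starts only after all freezes succeed, so a hang inside `fsfreeze` itself is not covered.

```python
import subprocess
import threading
from typing import Callable, List, Sequence, Tuple

# A runner executes one command; injectable so the logic can be tested without root.
Runner = Callable[[Sequence[str]], None]


def run_cmd(cmd: Sequence[str]) -> None:
    """Run a command, raising CalledProcessError on a non-zero exit status."""
    subprocess.run(list(cmd), check=True)


class FreezeGuard:
    """Hypothetical helper: freeze mounts and guarantee a thaw on exit or timeout."""

    def __init__(self, mounts: Sequence[str], timeout_s: float, run: Runner = run_cmd):
        self.mounts = list(mounts)
        self.timeout_s = timeout_s
        self.run = run
        self.thaw_errors: List[Tuple[str, Exception]] = []
        self._frozen: List[str] = []
        self._lock = threading.Lock()
        self._timer = None

    def _thaw_all(self) -> None:
        # Called from __exit__ and from the watchdog thread; the lock ensures
        # each mount is thawed exactly once, in reverse freeze order.
        with self._lock:
            while self._frozen:
                mount = self._frozen.pop()
                try:
                    self.run(["fsfreeze", "--unfreeze", mount])
                except Exception as exc:
                    # Record the failure and keep thawing the remaining mounts.
                    self.thaw_errors.append((mount, exc))

    def __enter__(self) -> "FreezeGuard":
        try:
            for mount in self.mounts:
                self.run(["fsfreeze", "--freeze", mount])
                with self._lock:
                    self._frozen.append(mount)
        except BaseException:
            # A failed freeze must not leave the earlier mounts frozen.
            self._thaw_all()
            raise
        # Watchdog: thaw automatically if the caller is still inside the block
        # when the timeout expires (e.g. it hung rather than raised).
        self._timer = threading.Timer(self.timeout_s, self._thaw_all)
        self._timer.daemon = True
        self._timer.start()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if self._timer is not None:
            self._timer.cancel()
        self._thaw_all()
        return False  # never suppress the caller's exception
```

It would be used as `with FreezeGuard(["/mnt", "/"], timeout_s=60): take_snapshot()`, where `take_snapshot` is a hypothetical stand-in for the snapshot step. The point of the design is that the thaw does not depend on the snapshot path returning normally: an exception triggers `__exit__`, and a hang triggers the timer.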
Hello,
This is the second time (of three in total) that we have encountered an error with fsfreeze during Azure Backup. Our file systems are frozen during the Azure Backup but are never thawed. The whole server hangs, and the only thing we can do is deallocate the Azure VM and start it again.
When we checked the logs, the last entries in extension.log show the safefreeze binary being run, followed by "proceeded for accepting signals". Raw log:
2024-01-14 22:16:48.792020 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]PreSnapshot: Status Code: 200
2024-01-14 22:16:48.794220 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]Taking Snapshot through Host
2024-01-14 22:16:48.796446 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]T:S freeze, timeout value 60
2024-01-14 22:16:48.798717 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]skipped mount :
2024-01-14 22:16:48.800908 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]fsfreeze mount :/mnt
2024-01-14 22:16:48.803464 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]fsfreeze mount :/
2024-01-14 22:16:48.805675 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]skip freeze is : False
2024-01-14 22:16:48.811102 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]arg : ['/var/lib/waagent/Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9207.0/main/safefreeze/bin/safefreeze', '60', '/']
2024-01-14 22:16:48.813571 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]proceeded for accepting signals
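Because nothing more is logged once the file systems are frozen, the useful signal is where extension.log stops. A minimal sketch for checking that across incidents, assuming only the line format quoted above (timestamp, `[extension name]`, message); `parse_line`, `last_entry` and `stopped_after_freeze` are hypothetical helpers, and the file location is left to the caller since it is not given here. The waagent-style line format in the RHEL comment above is deliberately not matched.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Matches the extension.log lines quoted in this issue, e.g.
# 2024-01-14 22:16:48.792020 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]PreSnapshot: Status Code: 200
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\[(?P<source>[^\]]+)\](?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    source: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into timestamp, source and message; None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m["ts"], m["source"], m["message"].strip())


def last_entry(lines: Iterable[str]) -> Optional[LogEntry]:
    """Return the last parseable entry, i.e. the step at which logging stopped."""
    last = None
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            last = entry
    return last


def stopped_after_freeze(lines: Iterable[str]) -> bool:
    """Heuristic: a 'T:S freeze' entry was logged and the log ends at the signal wait."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    froze = any(e.message.startswith("T:S freeze") for e in entries)
    # `froze` is False for an empty list, so entries[-1] is only read when it exists.
    return froze and entries[-1].message == "proceeded for accepting signals"
```

Run against the excerpt above, `last_entry` returns the 22:16:48.813571 "proceeded for accepting signals" entry and `stopped_after_freeze` returns True, which matches the pattern described in this thread.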
In the Azure Backup job report, the job is shown as failed with these details:
ExtensionOperationInProgress
Command execution failed.
Another operation is in progress on this item. Please wait until the previous operation is completed.
We configured everything as described in this documentation:
https://learn.microsoft.com/en-us/azure/backup/backup-azure-linux-app-consistent
We are unable to dig any deeper into the root cause. We assume it is a bug, because the file systems are never thawed as they should be. We cannot influence this behavior because it is initiated by the Azure Backup agent job.
Our OS:
Ubuntu 18.04.6 LTS
VM agent version:
2.7.3.0