Can't mount a disk with a writeboost cache because of a kernel crash in the writeboost module #235
Btw, forgot to mention... the process
From your information, there could be contention in acquiring a spinlock. This is not normal, because such a situation cannot occur during initialization.
Q1. Before the power loss, had you experienced any successful reboots?
Q2. Which module version are you using?
Q3. Did you retry the reboot?
Btw, don't do this.
Having persistent data remain on the caching device is normal. You shouldn't say it "needs to be" flushed, because there is no actual need for this.
I don't think the cache block is broken, because Writeboost writes a checksum along with the cache data. This protects us against power loss.

```c
struct segment_header_device {
	/*
	 * We assume 1 sector write is atomic.
	 * This 1 sector region contains important information such as checksum
	 * of the rest of the segment data. We use 32bit checksum to audit if
	 * the segment is correctly written to the cache device.
	 */
	/* - FROM ------------------------------------ */
	__le64 id;
	__le32 checksum;
	/*
	 * The number of metablocks in this segment header to be considered in
	 * log replay.
	 */
	__u8 length;
	__u8 padding[512 - (8 + 4 + 1)]; /* 512B */
	/* - TO -------------------------------------- */
	struct metablock_device mbarr[0]; /* 16B * N */
} __packed;
```
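The checksum argument can be sketched in C. This is a simplified model, not dm-writeboost code: it assumes the header checksum is a CRC-32C over the rest of the segment (the module's actual checksum routine, seed, and covered range may differ) and that replay walks segments in id order, stopping at the first one whose id or checksum doesn't match. `struct seg_hdr`, `segment_is_valid`, and `replay_count` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
 * Assumption: chosen for illustration; the module's real checksum
 * routine and seed may differ. */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical, simplified in-memory view of one on-disk segment. */
struct seg_hdr {
    uint64_t id;          /* monotonically increasing segment id */
    uint32_t checksum;    /* checksum of the data after the header sector */
    uint8_t length;       /* metablocks to replay (not used in this sketch) */
    const uint8_t *data;  /* the rest of the segment */
    size_t data_len;
};

/* A segment counts only if it has the expected next id and its stored
 * checksum matches the data actually found on the device. */
static bool segment_is_valid(const struct seg_hdr *s, uint64_t expected_id)
{
    assert(s->data != NULL || s->data_len == 0);
    return s->id == expected_id &&
           s->checksum == crc32c(s->data, s->data_len);
}

/* Replay segments in order, stopping at the first torn or stale one;
 * returns how many segments would be replayed. */
static size_t replay_count(const struct seg_hdr *segs, size_t n,
                           uint64_t first_id)
{
    size_t i = 0;
    while (i < n && segment_is_valid(&segs[i], first_id + i))
        i++;
    return i;
}
```

Under these assumptions, a segment torn by a power cut fails the check and ends replay instead of being applied as garbage.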
Was this really due to a power failure? It may be because your system is broken. See #197.
Yes, I could reboot without problems. Actually, it wasn't the first power loss on that machine, and after every previous one it was able to mount the drive with the cache without a problem.
The latest git master branch HEAD.
Yep. I retried rebooting multiple times, without luck.
Actually, I kept trying to fsck the disk without the cache, and at some point it did "fix" the disk, but all the data ended up in the "lost+found" folder. Although it was a bit of a pain, I was able to locate the files I needed to save and "reconstruct" the root manually by moving files out of the lost+found folder. So... it ended up OK! :)
What I meant was: I had to fsck the disk without the cache (since I couldn't mount it with the cache anymore, and therefore couldn't remove the cache from the disk), and the physical disk was missing the data that was "residing" in the cache. I understand it is normal to have persistent data on the cache, but in my situation, where the cache could no longer be attached to the physical disk, that data is essentially lost: the logical block device can't access it, and the physical disk doesn't have it. Hence, fsck reported tons of errors.
Well, it happened after I accidentally powered off the Wi-Fi smart plug that the machine is plugged into. So I would say it did happen during an unexpected power failure. It may well be faulty memory, though, as in #197... It's a bit difficult for me to run a memory check, since that machine is on a different continent. At some point I'll ask someone there to run it for me and post an update here.
The one thing I'm still puzzled about is that I was running dm-writeboost in a virtual machine (KVM): the "physical" disk was a virtio raw file on an ext4 filesystem on a regular SATA HDD, and the SSD cache was also a virtio raw file on an ext4 filesystem on an NVMe disk. Could some unknown interaction between the virtio/vfio KVM subsystem and dm-writeboost also be the cause? Or maybe data was lost by virtio/vfio when the power loss happened, before it could be delivered to dm-writeboost... Anyhow, I'll hold off on using dm-writeboost over virtio/vfio for now. Thanks for the support, and sorry for the delay in replying. After I recovered the data I was so happy that I forgot to report back here! Won't happen again!
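The point about data that lives only in the cache being missing from the physical disk can be illustrated with a toy write-back model in C. It is a hypothetical simplification (one cache slot per block, fixed sizes, and names such as `wb_write`, `backing_read`, and `wb_flush` invented for the sketch), not how dm-writeboost actually lays out its segments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NBLOCKS 4
#define BLKSZ 8

/* Toy write-back device: one cache slot per block (hypothetical). */
struct wb_dev {
    char backing[NBLOCKS][BLKSZ]; /* the slow "physical" disk */
    char cache[NBLOCKS][BLKSZ];   /* the persistent cache device */
    bool dirty[NBLOCKS];          /* cached data not yet on backing */
};

/* A write lands only in the cache and marks the block dirty. */
static void wb_write(struct wb_dev *d, size_t blk, const char *data)
{
    assert(blk < NBLOCKS);
    memcpy(d->cache[blk], data, BLKSZ);
    d->dirty[blk] = true;
}

/* Normal read path: the cache takes precedence for dirty blocks. */
static const char *wb_read(const struct wb_dev *d, size_t blk)
{
    return d->dirty[blk] ? d->cache[blk] : d->backing[blk];
}

/* Read path with the cache detached: only the backing disk is visible. */
static const char *backing_read(const struct wb_dev *d, size_t blk)
{
    return d->backing[blk];
}

/* Write back every dirty block so the backing disk is self-consistent. */
static void wb_flush(struct wb_dev *d)
{
    for (size_t i = 0; i < NBLOCKS; i++) {
        if (d->dirty[i]) {
            memcpy(d->backing[i], d->cache[i], BLKSZ);
            d->dirty[i] = false;
        }
    }
}
```

Reading through `backing_read` before `wb_flush` returns the old block, which is the kind of stale view fsck got; after the flush, the backing disk alone is consistent. That flush step could not happen here, because the cache could no longer be activated.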
So, I'm having trouble mounting an ext4 disk with a writeboost cache after a power loss.
The disk and cache are vfio SCSI devices inside a KVM virtual machine.
Once the machine boots, dm-writeboost starts to "mount" the cache device on top of the disk, and a few minutes later I see this in dmesg:
This message keeps repeating for a while until the VM freezes.
I've tried to mount the disk without writeboost, but it won't mount. Even fsck can't fix it, probably because there's data on the cache that needs to be flushed to disk.
Is there any way to force the cache data to be dumped to disk (hence fixing the disk) in a situation like this?
Do you think upgrading the kernel (currently Debian 4.19.0-18) would help writeboost finish initialization without crashing?
Thanks for any insights...