Crash consistency issue with truncate in data_csum mode #131
Comments
I spent some time digging into this issue and have a bit more information. First, I think the specific error output I reported above might stem from the same issues described in #126. However, I think there may be a separate issue here impacting crash consistency in truncate. In order to see this with the example program described above, I added print statements to
i.e., the truncate has at least partially gone through because the size of the file has been updated. As shown above, Unfortunately, I don't have a fix or workaround for this issue; I think it could be pretty tricky to fix. The operations that would need to become atomic span multiple function calls (they start in
Thanks for the report. Is there an easy way to reproduce it? The program you use, etc. Is the
I went back and tested it out and you don't need Here is the program I am using in step 3 to make this bug manifest: test4.zip. It creates a file called file0 on NOVA, and trying to read it after following steps 1-3 should give the checksum verification error.
I cannot reproduce the error with test4.cpp. After umount and mount, running cat on the file shows "a", with no errors in dmesg. Is this related to the VM setup? Can you reproduce the issue on a bare-metal machine?
I looked into this a bit more (although it still needs some more investigation). Like in #126, I was not able to reproduce the issue on bare metal and I was able to resolve it on a VM by using QEMU's
Thanks. I have seen people encounter issues with NOVA on VMs, and some flags help to work around them. I am not sure what the issue really is, but if you find out I am happy to apply a fix.
Hi Andiry,
I believe I've found a crash consistency issue with the truncate() system call in NOVA's data_csum mode. It can be replicated using the following steps:
- Add "goto out;" around line 1330 of inode.c (right after nova_handle_setattr_operation() in nova_notify_change()). This emulates a crash that prevents nova_setsize() from running.
- Load NOVA with data_csum=1 and mount it at /mnt/pmem
- Use dd to copy the contents of the PM device to a separate file
- Use dd to load the contents of the separate file back onto the PM device
- Run cat /mnt/pmem/foo
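The two dd steps can be sketched roughly as follows. This is only an illustration: the device node /dev/pmem0 and the image path /tmp/pmem.img are assumptions (neither appears in the report), and the helper names snapshot_pm and restore_pm are hypothetical. The options used are GNU dd options.

```shell
#!/bin/sh
# Sketch only: PM_DEV and IMG are assumed paths, not taken from the report.
PM_DEV="${PM_DEV:-/dev/pmem0}"   # assumed PM device node
IMG="${IMG:-/tmp/pmem.img}"      # assumed location of the saved image

# Copy the raw contents of the PM device to a separate file.
snapshot_pm() {
    dd if="$PM_DEV" of="$IMG" bs=1M status=none
}

# Load the saved contents back onto the PM device.
# conv=notrunc keeps the target from being truncated if it is a regular file.
restore_pm() {
    dd if="$IMG" of="$PM_DEV" bs=1M conv=notrunc status=none
}
```

Sourcing the script only defines the two functions. Calling snapshot_pm after the workload has run and restore_pm before reading the file back would correspond to the two dd steps; both need enough privileges to read and write the device.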
The attempt to read foo gives an input/output error and NOVA outputs the following error logs:
As far as I have been able to tell, this issue seems to occur if we crash at any point after updating the tail pointer in nova_update_inode() (called by nova_handle_setattr_operation()) and before handling checksums in nova_update_truncated_block_csum(). I don't know the exact root cause or have a fix for this, but I'll take another look when I get a chance.

Thanks!