fsck causes page fault at "RIP: 0010:bch2_chardev_exit+0x6e4/0xac0 [bcachefs]" when compiled with clang 19 #761

Open
2xsaiko opened this issue Oct 10, 2024 · 7 comments


2xsaiko commented Oct 10, 2024

Running fsck on a bcachefs file system results in a kernel oops when the kernel is compiled with clang. It seems to be triggered only by fsck; the file system can still be mounted manually without problems.

The attached log was taken in initrd recovery mode while running the command 'bcachefs fsck /dev/nvme0n1p2'.

This does not happen when compiled with GCC 14.

I'm not sure if the trace in the log is helpful, so let me know if I can provide anything else.

Operating System: Gentoo Linux ~amd64
Kernel: 6.11.2-gentoo-dist (sys-kernel/gentoo-kernel-6.11.2), default config
CPU: Intel Core i7-13700F
Clang/LLVM 19.1.1
bcachefs-tools 1.11.0

dmesg log

Contributor

g2p commented Oct 10, 2024

This could be https://lore.kernel.org/linux-bcachefs/ZvV6X5FPBBW7CO1f@archlinux/T/#u
According to Kees Cook, counted_by under LLVM should be disabled until you can target an LLVM version that fixes it.

Author

2xsaiko commented Oct 10, 2024

Are you sure? The trace looks completely different. If it's the same bug, I would at least expect __fortify_panic or __fortify_report in the call stack. Also, theirs happens during mount, whereas I can mount the file system fine.

I could try it, though. How would I disable that?

Contributor

g2p commented Oct 10, 2024

Sorry, I was only going off the fact that there was another recent LLVM-specific issue. I don't think there's a config option to ignore the attribute yet. Other options mentioned in the thread were to remove it manually from the source (various reverts mentioned by Kees and Thorsten Blum) or to build the LLVM main branch, which has a partial fix, but that is heavy for just diagnosing the issue.

Could you pipe the backtrace through scripts/decode_stacktrace.sh?

Author

2xsaiko commented Oct 10, 2024

This is a different log: I rebuilt the kernel in between, so I took a new one. It still crashes in bch2_chardev_exit, though.

However, it's giving the misleading message "WARNING! Modules path isn't set, but is needed to parse this symbol". It actually does find the module file, but for some reason the file doesn't have a .debug_line section. Since the source location in the bcachefs module is probably what you're looking for, I'll have to investigate. The rest of the backtrace is there, though.

decode.txt

Author

2xsaiko commented Oct 10, 2024

Here you go. It was compiled with the strip USE flag, which I hadn't noticed, oops.

decode.txt

Contributor

g2p commented Oct 17, 2024

Here's a patch that disables counted_by on current clang releases: https://lore.kernel.org/all/ZxB-uh1KzfD4ww2a@archlinux/

You might want to switch the condition to just

#if __has_attribute(__counted_by__) && !defined(__clang__)

since the clang fix isn't merged yet.
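
For anyone who wants to see what that condition does outside the kernel tree, here is a minimal standalone sketch. It is not the kernel's header or the linked patch. The names DEMO_COUNTED_BY, DEMO_COUNTED_BY_ENABLED, struct demo_buf, demo_buf_alloc and demo_buf_sum are hypothetical and exist only for this illustration. The idea is that with the extra !defined(__clang__) check the annotation expands to nothing on every clang version, so the flexible array member is compiled as a plain array there.

```c
#include <assert.h>
#include <stdlib.h>

/* Older compilers may lack __has_attribute; treat that as "attribute absent". */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/*
 * Assumption: same shape as the condition suggested above.
 * On clang the macro is always empty; elsewhere it is used only
 * if the compiler reports support for the attribute.
 */
#if __has_attribute(__counted_by__) && !defined(__clang__)
# define DEMO_COUNTED_BY(member) __attribute__((__counted_by__(member)))
# define DEMO_COUNTED_BY_ENABLED 1
#else
# define DEMO_COUNTED_BY(member)
# define DEMO_COUNTED_BY_ENABLED 0
#endif

/* Hypothetical structure with a flexible array member sized by nr. */
struct demo_buf {
	size_t nr;
	int data[] DEMO_COUNTED_BY(nr);
};

/* Allocate a buffer holding nr ints and fill it with 0..nr-1. */
struct demo_buf *demo_buf_alloc(size_t nr)
{
	struct demo_buf *b = malloc(sizeof(*b) + nr * sizeof(b->data[0]));

	if (!b)
		return NULL;
	b->nr = nr;	/* set the count before touching data[] */
	for (size_t i = 0; i < nr; i++)
		b->data[i] = (int)i;
	return b;
}

/* Sum all elements; the loop never indexes past b->nr. */
long demo_buf_sum(const struct demo_buf *b)
{
	long sum = 0;

	assert(b != NULL);
	for (size_t i = 0; i < b->nr; i++)
		sum += b->data[i];
	return sum;
}
```

The code behaves the same whether or not the attribute is active, as long as the count is assigned before the array is accessed. Checking DEMO_COUNTED_BY_ENABLED in a test build is a quick way to confirm which branch a given compiler takes.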

Author

2xsaiko commented Oct 18, 2024

Looks like that does fix it indeed. Thanks a lot!
