Support for built-in ORC information #460
This will allow orc_version_from_header() to be reused for upcoming ORC integration that does not use libelf. Signed-off-by: Stephen Brennan <[email protected]>
So, the error here was that the 4.9 kernel does not support ORC, and I was passing through the lookup error for "num_orcs" in The interesting thing is that after correcting that, the test passes, because on 4.9, unwind can be done via frame pointers rather than ORC. So really, my stack tracing test isn't (necessarily) testing the ability to unwind with ORC, it's testing the ability to unwind with anything other than DWARF. I don't really have a knob to test this in the Python API. But unfortunately, this isn't really something easy to test in a C unit test either.
I made a couple changes:
Interestingly, my test above that tested log messages failed on Python 3.6. I guess there is a difference in how the logging got initialized. I've added a context manager to explicitly set drgn's log level to DEBUG for the duration of the test, which I believe resolves the issue. Finally, I've gone ahead and added one more test, which actually does test vmlinux ORC. It creates a Program with no debuginfo, and then copies a With that, I do feel happy about the testing now!
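For readers following along, a context manager of the kind described above might look roughly like this. It is a minimal sketch, not the code from this branch: the helper name `drgn_log_level` is hypothetical, and it only assumes that drgn logs through the standard `logging` logger named "drgn".

```python
import contextlib
import logging


@contextlib.contextmanager
def drgn_log_level(level):
    """Temporarily set the "drgn" logger's level (hypothetical helper)."""
    logger = logging.getLogger("drgn")
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore the previous level even if the test body raises.
        logger.setLevel(previous)
```

On Python 3.6 the level is reportedly only synced when a program is created, so a helper like this would presumably need to wrap the creation of the Program as well as the unwinding itself.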
I perused the ORC changes and they looked sane overall, but I'll have to give them a closer look after the holidays. I looked over the Module.object change more closely, so some comments there.
The Python 3.6 logging issues are probably due to our very sketchy integration with Python logging:
Lines 144 to 155 in a1869f9
// This is slightly heinous. We need to sync the Python logging configuration
// with libdrgn, but the Python log level and handlers can change at any time,
// and there are no APIs to be notified of this.
//
// To sync the log level, we monkey patch logger._cache.clear() to update the
// libdrgn log level on every live program. This only works since CPython commit
// 78c18a9b9a14 ("bpo-30962: Added caching to Logger.isEnabledFor() (GH-2752)")
// (in v3.7), though. Before that, the best we can do is sync the level at the
// time that the program is created.
//
// We also check handlers in that monkey patch, which isn't the right place to
// hook but should work in practice in most cases.
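To illustrate the trick that comment describes, here is a pure-Python sketch of hooking `logger._cache.clear()`. It is not drgn's implementation (which lives in the C extension); the class `NotifyingCache`, the logger name "example_sync", and the `synced_levels` list are all made up for the example. It relies on the private `Logger._cache` attribute, so it needs CPython 3.7 or later.

```python
import logging


class NotifyingCache(dict):
    """dict whose clear() also runs a callback (hypothetical helper)."""

    def __init__(self, on_clear):
        super().__init__()
        self._on_clear = on_clear

    def clear(self):
        super().clear()
        self._on_clear()


logger = logging.getLogger("example_sync")
synced_levels = []  # stands in for "push the level down to libdrgn"


def sync_level():
    synced_levels.append(logger.getEffectiveLevel())


# Logger.setLevel() clears every logger's isEnabledFor() cache, so swapping
# in this dict gives us a notification whenever a level changes.
logger._cache = NotifyingCache(sync_level)
logger.setLevel(logging.DEBUG)
```

On Python 3.6 there is no `_cache` to hook, which matches the comment's note that the level can only be synced when the program is created.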
Ahh, thank you! I knew about the loglevel monkey-patching, but I hadn't read the full comment, so I missed the bit about syncing the log level when the Program is created. That makes sense, and I can easily fix it. And yeah, I should have made more clear on this that I wasn't hoping for quick action before the holidays :) All the quick updates have been because I've been nerd-sniped by the tests, which are really fun to make work.
At your leisure: How do you feel about the API choice of
Stick with the absent object, I prefer that over
Signed-off-by: Stephen Brennan <[email protected]>
This allows users to get the object which the module was created from. The primary use case is for Linux kernel modules, to return the "struct module" associated with the drgn module object. Signed-off-by: Stephen Brennan <[email protected]>
ORC has always been loaded from the ELF debug file. However, ORC is present in the memory pages of kernel core dumps, so it can still be used when the debug file is unavailable. Implement the ability to load built-in ORC for vmlinux and kernel modules. We still prefer to load ORC from the debug file wherever possible, because this is almost certainly faster. Signed-off-by: Stephen Brennan <[email protected]>
When looking up CFI rules using ORC, we use module->debug_file_bias unconditionally. This makes sense when the ORC is always loaded from an ELF debug file. However, now that built-in ORC can be loaded, it is possible that: 1. ORC is loaded from the built-in source, prior to loading the debug file. The module->debug_file_bias == 0, so the ORC is interpreted correctly. 2. Later, a debug file is loaded, updating debug_file_bias. However, the ORC hasn't been loaded from the debug file, so the bias is not applicable. 3. Future CFI lookups using ORC fail due to the extra bias. To avoid this, apply the debug_file_bias once to module->orc.pc_base, at the time we load the ORC sections out of the debug file. This ensures that the bias is only applied to the ORC data when we know we need it. Signed-off-by: Stephen Brennan <[email protected]>
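The three-step failure above can be reproduced with a toy model. This is only a sketch: `ModuleModel` and the helper functions are hypothetical stand-ins for the C code, and it assumes the bias is "runtime address minus debug-file address" and that ORC lookups reduce to computing an offset from `pc_base`.

```python
from dataclasses import dataclass


@dataclass
class ModuleModel:
    """Toy stand-in for the relevant module fields (hypothetical)."""
    debug_file_bias: int = 0  # runtime address minus debug-file address
    orc_pc_base: int = 0      # base address the ORC table is relative to


def load_builtin_orc(module, runtime_base):
    # Built-in ORC is read from memory, so its base is a runtime address.
    module.orc_pc_base = runtime_base


def load_orc_from_debug_file(module, file_base):
    # Fixed approach: fold the bias into pc_base once, at load time.
    module.orc_pc_base = file_base + module.debug_file_bias


def offset_old(module, pc):
    # Old approach: subtract the bias on every lookup, whatever the source.
    return (pc - module.debug_file_bias) - module.orc_pc_base


def offset_new(module, pc):
    # Fixed approach: pc_base is always a runtime address.
    return pc - module.orc_pc_base


RUNTIME_BASE = 0xFFFFFFFFC0000000
pc = RUNTIME_BASE + 0x40

# Steps 1 and 2: built-in ORC is loaded, then a debug file sets the bias.
builtin = ModuleModel()
load_builtin_orc(builtin, RUNTIME_BASE)
builtin.debug_file_bias = RUNTIME_BASE

# Step 3: the old lookup applies a bias that does not belong to this ORC.
old_offset = offset_old(builtin, pc)
new_offset = offset_new(builtin, pc)

# ORC loaded from the debug file still resolves to the same offset.
from_file = ModuleModel(debug_file_bias=RUNTIME_BASE)
load_orc_from_debug_file(from_file, file_base=0)
file_offset = offset_new(from_file, pc)
```

In this model the old lookup yields a negative offset for the built-in case, while the fixed lookup gives 0x40 for both ORC sources.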
Loading built-in ORC is a difficult functionality to test: it is best tested when there is no debuginfo file. Thus, we add two tests: one simpler test in which the kernel has debuginfo, but a module does not, and we must unwind a stack with functions from the module. The second test is more complex, where we create a program with no debuginfo at all, and provide it just enough data to initialize the module API and unwind with built-in ORC. In both cases, to verify that drgn is actually using ORC, we capture its log messages. Signed-off-by: Stephen Brennan <[email protected]>
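As a rough illustration of verifying behavior through captured log messages, the standard library's `assertLogs` can be used as below. This is a sketch, not the test from this branch: `unwind_stand_in` and its message text are placeholders (the real drgn log messages are not shown here), and the only assumption carried over is that drgn logs to the logger named "drgn".

```python
import logging
import unittest


def unwind_stand_in():
    # Placeholder for a real stack unwind; the message text is made up.
    logging.getLogger("drgn").debug("placeholder: unwinding with built-in ORC")


class OrcLogTest(unittest.TestCase):
    def test_unwind_logs_orc(self):
        # assertLogs sets the logger to DEBUG and collects its records.
        with self.assertLogs("drgn", level="DEBUG") as captured:
            unwind_stand_in()
        self.assertTrue(any("ORC" in line for line in captured.output))


def run_example():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrcLogTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

A real test would replace the stand-in with an actual unwind through frames that lack debuginfo, and would match on whatever message drgn emits when it falls back to ORC.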
There was a strange issue with GitHub CI (504 error fetching kernels). Assuming they pass this time, this is ready for review again at your convenience. Hope you had a great holiday!
Thanks to your work today with 3ce0fee ("tests: don't clobber file in use by libelf") and a1869f9 ("Make StackFrame.name fall back to symbol/PC and add StackFrame.function_name"), this branch is fully unblocked!
Now that the module API has landed, we can usually rely on having a struct drgn_module available for a kernel module, even if the module debuginfo is not loaded (either because you have only loaded debuginfo for the kernel, or because you are using CTF). My previous ORC support required essentially re-implementing a small portion of the module API, specifically a tree that mapped address ranges to ORC data. Now, I can drop all that complexity and share this branch which I think is ready for consideration.

This branch's main goal is to enable using built-in ORC information for stack unwinding, for the cases where a module does not have debuginfo. I think the commit messages here describe everything that's important. I think the testing is okay -- I wish I could test the vmlinux ORC, but I think I would need a type finder to allow the module API to even get initialized.