Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

register_ftrace_function() returns 0 on ftrace_bug() #385

Open
cormander opened this issue Aug 31, 2014 · 18 comments
Open

register_ftrace_function() returns 0 on ftrace_bug() #385

cormander opened this issue Aug 31, 2014 · 18 comments
Assignees

Comments

@cormander
Copy link
Contributor

Running kpatch on CentOS EL7, 3.10.0-123.el7.x86_64.

I'm forcing error conditions to test error handling of the module. Here is a case where register_ftrace_function() returns 0 even though ftrace_bug() is called from the kernel during it. Not sure if there is anything we can do about it in this module, but I thought I out to report it anyway.

[ 51.202423] ------------[ cut here ]------------
[ 51.202428] WARNING: at kernel/trace/ftrace.c:1663 ftrace_bug+0x25d/0x270()
[ 51.202429] Modules linked in: kpatch_tpe(OF+) tpe(OF) kpatch(OF) ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sg coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw ppdev gf128mul glue_helper ablk_helper cryptd vmw_balloon serio_raw pcspkr parport_pc i2c_piix4 parport vmw_vmci shpchp mperf nfsd auth_rpcgss nfs_acl lockd sunrpc uinput xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common ata_generic pata_acpi mptspi
[ 51.202446] scsi_transport_spi ahci vmwgfx ttm mptscsih libahci e1000 mptbase drm ata_piix libata i2c_core floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kpatch_tpe]
[ 51.202452] CPU: 3 PID: 3041 Comm: insmod Tainted: GF U O-------------- 3.10.0-123.el7.x86_64 #1
[ 51.202453] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 51.202454] 0000000000000000 00000000c5f584bc ffff88005eb6fb08 ffffffff815e19ba
[ 51.202456] ffff88005eb6fb40 ffffffff8105dee1 ffff88007c0635e0 ffffffff812515a0
[ 51.202457] 000000000000235e ffffffff81d48110 ffffffff815f1cd0 ffff88005eb6fb50
[ 51.202458] Call Trace:
[ 51.202461] [] dump_stack+0x19/0x1b
[ 51.202463] [] warn_slowpath_common+0x61/0x80
[ 51.202465] [] ? security_bprm_set_creds+0x20/0x20
[ 51.202466] [] ? fentry+0x10/0x10
[ 51.202468] [] warn_slowpath_null+0x1a/0x20
[ 51.202469] [] ftrace_bug+0x25d/0x270
[ 51.202470] [] ftrace_replace_code+0x2b9/0x420
[ 51.202472] [] ? security_bprm_set_creds+0x20/0x20
[ 51.202473] [] ftrace_modify_all_code+0x62/0x80
[ 51.202474] [] arch_ftrace_update_code+0x10/0x20
[ 51.202476] [] ftrace_run_update_code+0x1e/0x80
[ 51.202477] [] ftrace_startup_enable+0x39/0x50
[ 51.202479] [] ftrace_startup+0xc1/0x270
[ 51.202480] [] register_ftrace_function+0x43/0x60
[ 51.202482] [] kpatch_link_object+0x2d0/0x560 [kpatch]
[ 51.202484] [] ? 0xffffffffa0502fff
[ 51.202486] [] kpatch_register+0x9b/0x4f0 [kpatch]
[ 51.202487] [] ? __kmalloc+0x1f3/0x230
[ 51.202489] [] patch_init+0x37a/0x1000 [kpatch_tpe]
[ 51.202491] [] ? 0xffffffffa050efff
[ 51.202492] [] do_one_initcall+0xe2/0x190
[ 51.202494] [] load_module+0x129b/0x1a90
[ 51.202495] [] ? ddebug_proc_write+0xf0/0xf0
[ 51.202497] [] ? copy_module_from_fd.isra.43+0x53/0x150
[ 51.202498] [] SyS_finit_module+0xa6/0xd0
[ 51.202500] [] system_call_fastpath+0x16/0x1b
[ 51.202501] ---[ end trace 7f62e2edcab0ad39 ]---
[ 51.202501] ftrace failed to modify [] security_bprm_check+0x0/0x30
[ 51.202503] actual: e9:db:71:2b:1f
[ 51.202505] Failed on adding breakpoints (9054):
[ 51.207685] kpatch: setting reg to true for 'security_bprm_check', register_ftrace_function returned 0
[ 51.208492] kpatch: can't set ftrace filter at address 0xffffffff81251d90
[ 51.208494] kpatch: can't unregister ftrace handler

Following this are two more calls to WARN for kpatch_unlink_object().

When kpatch then gets unloaded, it WARNs on kpatch_exit(), and successive attempts to load patch modules after kpatch is reloaded fail. A reboot is required to fix the problem.

Here is what I think is the relevant portion of the above stack trace:

[ 51.202428] WARNING: at kernel/trace/ftrace.c:1663 ftrace_bug+0x25d/0x270()
[ 51.202469] [] ftrace_bug+0x25d/0x270
[ 51.202470] [] ftrace_replace_code+0x2b9/0x420
[ 51.202480] [] register_ftrace_function+0x43/0x60
[ 51.202482] [] kpatch_link_object+0x2d0/0x560 [kpatch]
[ 51.202501] ftrace failed to modify [] security_bprm_check+0x0/0x30
[ 51.202505] Failed on adding breakpoints (9054):
[ 51.207685] kpatch: kpatch_ftrace_add_func finished for 'security_bprm_check', register_ftrace_function returned 0
[ 51.208492] kpatch: can't set ftrace filter at address 0xffffffff81251d90
[ 51.208494] kpatch: can't unregister ftrace handler

I added the printk to get this: "kpatch_ftrace_add_func finished for 'security_bprm_check', register_ftrace_function returned 0". So even though we ran into a failure condition, the code flow is proceeding, running into an error further down, and thus running into the additional errors for kpatch_unlink_object().

Any ideas? Perhaps the "real" fix for this is to push a patch upstream to fix register_ftrace_function() so it properly returns an error when it descends into ftrace_bug() madness? Or, in addition, is there a way to detect that this bug happened, or that the function isn't really registered, and properly divert the code flow to the error handling?

@cormander
Copy link
Contributor Author

So the ftrace_disabled symbol gets set on a call to ftrace_bug(). We can check this in places to prevent unneeded verbose errors, and tell the user what the real issue is. The following commit is a POC for that;

4f7c587

Let me know if you have any questions.

@cormander
Copy link
Contributor Author

The next commit f26de22 is cleaner.

@jpoimboe
Copy link
Member

jpoimboe commented Sep 2, 2014

Perhaps the "real" fix for this is to push a patch upstream to fix register_ftrace_function() so it properly returns an error when it descends into ftrace_bug() madness?

Yeah, it seems to me that changing register_ftrace_function() to return an error is the "real" fix. I'll talk to Steve Rostedt (ftrace maintainer) about it.

I think you forced this error, but otherwise I think this error scenario would be very rare, right?

@cormander
Copy link
Contributor Author

I didn't force the error intentionally. I ran into it while loading a kpatch that crossed the streams with tpe-lkm. We don't this this condition if #384 is accepted, but I thought, I can put in a safety for it, so why not?

My main deal is - while the condition might be rare, this adds a saner flow to handling the error.

@jpoimboe
Copy link
Member

jpoimboe commented Sep 2, 2014

I'd much rather have ftrace handle its own errors, so we don't have to have knowledge of ftrace internals (which are subject to change at any time) in kpatch.

@cormander
Copy link
Contributor Author

Totally understand. Kernel internals change a lot :)

Part of my commit is a check for ftrace_enabled (see #387 ) which is an exported sysctl entry that shouldn't ever change. Thoughts on still checking that? I believe it gets set to zero if ftrace_bug was called. Maybe not keep it as is - there may be a better way to check for a sysctl flag from within a kernel module.

@jpoimboe
Copy link
Member

jpoimboe commented Sep 2, 2014

True, ftrace_enabled is unlikely to ever change. But it's still not exported to kernel modules, and accessing it via kallsyms is pretty hackish (but the only way I know of). What's the advantage of kpatch checking ftrace_enabled vs having ftrace properly return an error?

@cormander
Copy link
Contributor Author

The advantage of checking ftrace_enabled is outlined in #387

Essentially, if its off, things don't work but the user would otherwise have no indication of that.

@jpoimboe
Copy link
Member

jpoimboe commented Sep 2, 2014

What I'm trying to ask is, assuming register_ftrace_function() can be changed to return an error if ftrace_enabled is 0, wouldn't that be better than checking ftrace_enabled in kpatch?

@cormander
Copy link
Contributor Author

If the upstream checks both the ftrace_ enabled/disabled flags and returns a proper error, then yes that should solve both of these problems.

@jpoimboe
Copy link
Member

jpoimboe commented Sep 2, 2014

Ok. I think Steve's on vacation but I'll ask him about it when he gets back. I might work up some ftrace patches to fix these issues if I get the chance.

@jpoimboe
Copy link
Member

jpoimboe commented Sep 2, 2014

To summarize, we need ftrace to return errors on the following conditions:

  • !ftrace_enabled
  • ftrace_disabled
  • ftrace_bug
  • bad first instruction

@cormander
Copy link
Contributor Author

Looks good, thank you!

@jpoimboe
Copy link
Member

jpoimboe commented Sep 2, 2014

@cormander No problem, thanks for all your help!

@jpoimboe
Copy link
Member

jpoimboe commented Sep 9, 2014

Unfortunately Steven Rostedt injured his back and will be away from his computer for a while, so this issue might be on hold for a little bit.

@cormander
Copy link
Contributor Author

Unfortunately Steven Rostedt injured his back and will be away from his computer for a while

Sorry to hear that. Hope he heals well.
so this issue might be on hold for a little bit.

Hmm, I wonder if we can ftrace the ftrace functions? ;)

It's not a big deal that this will take some time - like we said, rare that it'll happen in the wild. As for my TPE kernel module, I can add a note to the readme that it's incompatible with ftrace.=

@joe-lawrence
Copy link
Contributor

Old bug housekeeping... hopefully Steve's back is better by now :)

The original behavior noted in this issue remains true today: register_ftrace_function() doesn't return with an error if ftrace_enabled=0.

Attached is a hacked-up reproducer: kpatch-385.zip

% uname -r
4.20.0-rc5+

% ./test.sh
BUILD: ftrace-test... 
BUILD: livepatch-sample... 

TEST: (sanity check) basic function patching ... ok

TEST: ftrace_disabled=1...
insmod: ERROR: could not insert module livepatch-sample/livepatch-sample.ko: No such device
[  120.665892] livepatch: enabling patch 'livepatch_sample'
[  120.667044] livepatch: failed to set ftrace filter for function 'cmdline_proc_show' (-19)
[  120.668559] livepatch: failed to patch object 'vmlinux'
[  120.669555] livepatch: failed to enable patch 'livepatch_sample'
[  120.670742] livepatch: 'livepatch_sample': unpatching complete

TEST: ftrace_enabled=0...
[  128.708868] livepatch: enabling patch 'livepatch_sample'
[  128.710469] livepatch: 'livepatch_sample': starting patching transition
[  130.122106] livepatch: 'livepatch_sample': patching complete

TEST: ftrace_bug()...
insmod: ERROR: could not insert module livepatch-sample/livepatch-sample.ko: No such device
[  140.733149] WARNING: CPU: 0 PID: 3397 at kernel/trace/ftrace.c:2045 ftrace_bug+0x36/0x2b0
[  140.734323] Modules linked in: ftrace_test(OE) nls_utf8 isofs bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ppdev sg parport_pc i2c_piix4 joydev parport pcspkr sunrpc ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi floppy ata_piix serio_raw libata [last unloaded: livepatch_sample]
[  140.738324] CPU: 0 PID: 3397 Comm: test.sh Tainted: G           OE K   4.20.0-rc5+ #3
[  140.739753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
[  140.741918] RIP: 0010:ftrace_bug+0x36/0x2b0
[  140.742696] Code: 48 89 f3 0f 84 d0 01 00 00 4c 8b 2e 4d 89 ec 83 ff f2 0f 84 6e 01 00 00 83 ff ff 0f 84 38 01 00 00 83 ff ea 0f 84 84 00 00 00 <0f> 0b c7 05 16 b8 26 01 01 00 00 00 c7 05 1c b8 26 01 00 00 00 00
[  140.746095] RSP: 0018:ffffb13cc0a93d48 EFLAGS: 00010213
[  140.747072] RAX: ffffffff9b184000 RBX: 0000000000000000 RCX: 0000000000000000
[  140.748396] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  140.749690] RBP: ffffb13cc0a93d60 R08: 0000000000000000 R09: 000000000000002a
[  140.750984] R10: 0000000000000001 R11: 0000000000000008 R12: 0000000000000000
[  140.752281] R13: 0000000000000000 R14: ffff958b2a6630c8 R15: fffffffffffffff2
[  140.753599] FS:  00007fd5f0163740(0000) GS:ffff958b3ba00000(0000) knlGS:0000000000000000
[  140.755104] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  140.756181] CR2: 00007fd5f0166000 CR3: 000000012d0f0000 CR4: 00000000000006f0
[  140.757503] Call Trace:
[  140.757977]  ftrace_bug_set+0x29/0x30 [ftrace_test]
[  140.759042]  param_attr_store+0x6e/0xd0
[  140.759762]  module_attr_store+0x20/0x30
[  140.760510]  sysfs_kf_write+0x3f/0x50
[  140.761191]  kernfs_fop_write+0x124/0x1b0
[  140.763118]  __vfs_write+0x3a/0x190
[  140.764352]  ? security_file_permission+0x3b/0xc0
[  140.765791]  ? _cond_resched+0x19/0x30
[  140.767088]  vfs_write+0xb2/0x1b0
[  140.768272]  ksys_write+0x55/0xc0
[  140.769432]  __x64_sys_write+0x1a/0x20
[  140.770672]  do_syscall_64+0x60/0x190
[  140.771914]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  140.773422] RIP: 0033:0x7fd5ef841fd0
[  140.774626] Code: 73 01 c3 48 8b 0d c0 6e 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d cd cf 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee cb 01 00 48 89 04 24
[  140.779110] RSP: 002b:00007ffc6b03b9d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  140.781142] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd5ef841fd0
[  140.783049] RDX: 0000000000000002 RSI: 00007fd5f0166000 RDI: 0000000000000001
[  140.784883] RBP: 00007fd5f0166000 R08: 000000000000000a R09: 00007fd5f0163740
[  140.786754] R10: 00007fd5f0163740 R11: 0000000000000246 R12: 00007fd5efb1a400
[  140.789048] R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000000
[  140.791302] ---[ end trace 6688d10c2ff4dd77 ]---
[  140.792761] ftrace faulted on unknown error 
[  140.792762] [<0000000000000000>]           (null)
[  141.806317] livepatch: enabling patch 'livepatch_sample'
[  141.809028] livepatch: failed to set ftrace filter for function 'cmdline_proc_show' (-19)
[  141.812158] livepatch: failed to patch object 'vmlinux'
[  141.813772] livepatch: failed to enable patch 'livepatch_sample'
[  141.815529] livepatch: 'livepatch_sample': unpatching complete

If I follow the conversation correctly, it was suggested to add better error-checking to ftrace, for example this patch that adds a check for !ftrace_enabled. This turns the second test case into:

TEST: ftrace_enabled=0...
insmod: ERROR: could not insert module livepatch-sample/livepatch-sample.ko: No such device
[   84.469847] livepatch: enabling patch 'livepatch_sample'
[   84.470970] livepatch: failed to register ftrace handler for function 'cmdline_proc_show' (-19)
[   84.472579] livepatch: failed to patch object 'vmlinux'
[   84.473576] livepatch: failed to enable patch 'livepatch_sample'
[   84.474734] livepatch: 'livepatch_sample': unpatching complete

That's not exactly consistent with the ftrace_disabled=1 case, which errors out on ftrace_set_filter_ip() instead of register_ftrace_function(), but perhaps Steven might have in mind a more inclusive patch that encompasses other ftrace entry points. (See 4eebcc81a33f ftrace: disable tracing on failure for example.)

My tests artificially set ftrace_enabled, ftrace_disabled, and called ftrace_bug(), which would be easy for exported ftrace entrypoints to verify, but there are additional cases like the call chain reported: ftrace_replace_code() may fail but that is not directly propagated back up the call chain. The error path does invoke ftrace_bug() -> FTRACE_WARN_ON -> ftrace_kill() -> ftrace_disabled = 1, ftrace_enabled = 0 so perhaps those variables could be sanity checked somewhere on the way out by its callers that are prepared to return errors.

@jpoimboe
Copy link
Member

@joe-lawrence Thanks for looking at this. It sounds like we still have a bug in ftrace. Do you want to engage Steven on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants