nvidia_p2p_get_pages(): Fix double-free in register-callback error path #557

Open: wants to merge 1 commit into base: main

Commits on Sep 11, 2023

  1. nvidia_p2p_get_pages(): Fix double-free in register-callback error path

    A double-free in the rm_p2p_register_callback() error path of
    nv_p2p_get_pages() causes memory corruption that leads to a kernel
    panic.
    
    Fix this by adding a separate goto for this error path that skips
    freeing the already-freed memory.
    
    The double-free can be reproduced by calling nvidia_p2p_get_pages() on
    one CPU while simultaneously freeing the GPU virtual address range passed
    into nvidia_p2p_get_pages() on another CPU. Reproducing it is
    timing-dependent and may take multiple tries.
    
    Booting with the 'slub_debug=FZ' kernel parameter exposes the double-free:
    
      [  239.115091] =============================================================================
      [  239.124659] BUG kmalloc-16 (Tainted: G           OE     ): Object already free
      [  239.133011] -----------------------------------------------------------------------------
    
      [  239.144491] Slab 0xfffffa8bc4434140 objects=85 used=82 fp=0xffff9a3dd0d05910 flags=0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff)
      [  239.158997] Object 0xffff9a3dd0d05670 @offset=1648 fp=0x0000000000000000
    
      [  239.168766] Redzone  ffff9a3dd0d05660: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      [  239.179633] Object   ffff9a3dd0d05670: 10 00 00 00 00 00 00 00 e5 04 3f 13 96 18 8e 47  ..........?....G
      [  239.190641] Redzone  ffff9a3dd0d05680: bb bb bb bb bb bb bb bb                          ........
      [  239.200739] Padding  ffff9a3dd0d05688: 84 80 0e 00 00 00 00 00                          ........
      [  239.210938] CPU: 0 PID: 3150 Comm: hfi-sdma-test Kdump: loaded Tainted: G           OE      6.5.0-rc1+ #1
      [  239.221911] Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.1029.090220201031 09/02/2020
      [  239.233948] Call Trace:
      [  239.236992]  <TASK>
      [  239.239608]  dump_stack_lvl+0x33/0x50
      [  239.244010]  object_err+0x3a/0x80
      [  239.248014]  free_debug_processing+0x265/0x360
      [  239.253392]  ? nv_p2p_get_pages+0x163/0x590 [nvidia]
      [  239.259399]  free_to_partial_list+0x80/0x280
      [  239.264478]  ? nv_p2p_get_pages+0x163/0x590 [nvidia]
      [  239.270426]  nv_p2p_get_pages+0x163/0x590 [nvidia]
      [  239.276303]  ? __pfx_remove_nvidia_pages+0x10/0x10 [hfi1]
      [  239.282692]  nvidia_p2p_get_pages+0x25/0x40 [nvidia]
      [  239.288601]  ? __pfx_remove_nvidia_pages+0x10/0x10 [hfi1]
      ...
      [  239.498990]  </TASK>
      [  239.501662] Disabling lock debugging due to kernel taint
      [  239.507828] FIX kmalloc-16: Object at 0xffff9a3dd0d05670 not freed
    
    Signed-off-by: Brendan Cunningham <[email protected]>
    BrendanCunningham committed Sep 11, 2023
    Commit 3a9b874
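
For readers unfamiliar with the pattern, the fix described in the commit message (a separate goto label for the error path whose memory has already been released) can be sketched in user-space C. This is only an illustration under stated assumptions, not the driver's code: `struct page_table`, `release_entries()`, `register_callback()`, `get_pages()`, and both labels are hypothetical names, and the sketch assumes the registration step frees the buffer itself when it fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the bookkeeping object owned by the caller. */
struct page_table {
    int *entries;
};

/* Counts releases so that a double-free would be visible to a caller. */
static int free_calls;

static void release_entries(struct page_table *pt)
{
    assert(pt->entries != NULL); /* a second release would trip this */
    free(pt->entries);
    pt->entries = NULL;
    free_calls++;
}

/*
 * Hypothetical registration step. Assumption for this sketch: when it
 * fails, it has already released the entries it was handed.
 */
static int register_callback(struct page_table *pt, bool fail)
{
    if (fail) {
        release_entries(pt);
        return -1;
    }
    return 0;
}

int get_pages(bool fail_early, bool fail_register, struct page_table *out)
{
    int rc;
    struct page_table pt = { 0 };

    pt.entries = calloc(4, sizeof(*pt.entries));
    if (pt.entries == NULL)
        return -1;

    if (fail_early) {
        /* Earlier failure: this function still owns the entries. */
        rc = -1;
        goto failed;
    }

    rc = register_callback(&pt, fail_register);
    if (rc != 0)
        goto failed_already_freed; /* skip the release below */

    *out = pt;
    return 0;

failed:
    release_entries(&pt);
failed_already_freed:
    return rc;
}
```

If the registration failure jumped to `failed` instead, `release_entries()` would run twice on the same buffer, which is the user-space analogue of the "Object already free" report in the log above.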