Nightly CI build fails due to bug in nvme-tcp #3113

Description

@igaw

The nightly CI build fails due to the long-standing "possible circular locking dependency detected" warning. IIRC, there has not been much progress on this for quite a while. Maybe we should have a way to annotate such tests as known failures instead of failing the run completely. Obviously, the best thing would be to fix it...

CC: @kawasaki
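
One stopgap in that direction could be excluding nvme/005 via the blktests `config` file until the lockdep report is resolved. A sketch only, assuming the stock `EXCLUDE` mechanism documented in the blktests README; note this skips the test entirely rather than annotating it as a known failure:

```shell
# Sketch: append an exclusion to the blktests config file, which the
# ./check script sources from the repository root. EXCLUDE is the
# documented skip list; entries named there are not run.
cat >> config <<'EOF'
EXCLUDE="nvme/005"
EOF

# Show that the exclusion landed in the config.
grep 'nvme/005' config
```

A real "known failure" annotation (run the test, report the splat, but don't fail the nightly) would still need support in the check harness itself.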

[  704.361362] run blktests nvme/005 at 2026-02-25 03:47:54
[  704.532546] loop0: detected capacity change from 0 to 2097152
[  704.595709] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  704.671982] nvmet_tcp: enabling port 0 (10.0.2.2:4420)
[  704.856746] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  704.872976] nvme nvme2: creating 4 I/O queues.
[  704.889641] nvme nvme2: mapped 4/0/0 default/read/poll queues.
[  704.902269] nvme nvme2: new ctrl: NQN "blktests-subsystem-1", addr 10.0.2.2:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349

[  705.442498] ======================================================
[  705.443855] WARNING: possible circular locking dependency detected
[  705.445329] 7.0.0-rc1 #1 Tainted: G            E      
[  705.446756] ------------------------------------------------------
[  705.448647] kworker/0:1/5932 is trying to acquire lock:
[  705.450324] ffffffff86625be0 (fs_reclaim){+.+.}-{0:0}, at: kmem_cache_alloc_node_noprof+0x54/0x6e0
[  705.451991] 
               but task is already holding lock:
[  705.454264] ffff888119164218 (sk_lock-AF_INET-NVME){+.+.}-{0:0}, at: tcp_close+0x18/0xc0
[  705.456269] 
               which lock already depends on the new lock.

[  705.459875] 
               the existing dependency chain (in reverse order) is:
[  705.461860] 
               -> #4 (sk_lock-AF_INET-NVME){+.+.}-{0:0}:
[  705.463431]        __lock_acquire+0x55d/0xc00
[  705.464332]        lock_acquire.part.0+0xb7/0x220
[  705.465291]        lock_sock_nested+0x3b/0xe0
[  705.466166]        tcp_sendmsg+0x1d/0x40
[  705.466997]        sock_sendmsg+0x2fb/0x410
[  705.467853]        nvme_tcp_try_send_cmd_pdu+0x5a0/0xc10 [nvme_tcp]
[  705.468922]        nvme_tcp_try_send+0x1b6/0x9b0 [nvme_tcp]
[  705.469884]        nvme_tcp_queue_rq+0x449/0x6d0 [nvme_tcp]
[  705.470828]        blk_mq_dispatch_rq_list+0x38f/0x14a0
[  705.471768]        __blk_mq_sched_dispatch_requests+0x1a9/0x4b0
[  705.472740]        blk_mq_sched_dispatch_requests+0xac/0x150
[  705.473686]        blk_mq_run_hw_queue+0x1c7/0x500
[  705.474541]        blk_execute_rq+0x16a/0x310
[  705.475360]        __nvme_submit_sync_cmd+0x108/0x320 [nvme_core]
[  705.476328]        nvmf_connect_io_queue+0x1c7/0x2f0 [nvme_fabrics]
[  705.477287]        nvme_tcp_start_queue+0x4eb/0x6a0 [nvme_tcp]
[  705.478195]        nvme_tcp_configure_io_queues.cold+0x4b6/0x9ea [nvme_tcp]
[  705.479177]        nvme_tcp_setup_ctrl+0x281/0x5f0 [nvme_tcp]
[  705.480066]        nvme_tcp_create_ctrl+0xf6/0x120 [nvme_tcp]
[  705.480952]        nvmf_create_ctrl+0x34c/0x920 [nvme_fabrics]
[  705.481878]        nvmf_dev_write+0xd5/0x180 [nvme_fabrics]
[  705.482771]        vfs_write+0x1c7/0x1090
[  705.483534]        ksys_write+0xf9/0x1d0
[  705.484265]        do_syscall_64+0x133/0x15c0
[  705.485070]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  705.485953] 
               -> #3 (set->srcu){.+.+}-{0:0}:
[  705.487198]        __lock_acquire+0x55d/0xc00
[  705.487947]        lock_sync.part.0+0x41/0x80
[  705.488687]        __synchronize_srcu+0xa7/0x240
[  705.489462]        elevator_switch+0x2a8/0x660
[  705.490199]        elevator_change+0x2d5/0x4e0
[  705.490912]        elevator_set_none+0x8b/0xe0
[  705.491623]        blk_unregister_queue+0x144/0x2c0
[  705.492369]        __del_gendisk+0x265/0xa20
[  705.493069]        del_gendisk+0x105/0x190
[  705.493752]        nvme_ns_remove+0x32c/0x8f0 [nvme_core]
[  705.494542]        nvme_remove_namespaces+0x267/0x3b0 [nvme_core]
[  705.495384]        nvme_do_delete_ctrl+0xf5/0x160 [nvme_core]
[  705.496200]        nvme_delete_ctrl_sync.cold+0x8/0xd [nvme_core]
[  705.497032]        nvme_sysfs_delete+0x96/0xc0 [nvme_core]
[  705.497821]        kernfs_fop_write_iter+0x3df/0x5f0
[  705.498572]        vfs_write+0x518/0x1090
[  705.499219]        ksys_write+0xf9/0x1d0
[  705.499855]        do_syscall_64+0x133/0x15c0
[  705.500526]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  705.501301] 
               -> #2 (&q->elevator_lock){+.+.}-{4:4}:
[  705.502424]        __lock_acquire+0x55d/0xc00
[  705.503092]        lock_acquire.part.0+0xb7/0x220
[  705.503785]        __mutex_lock+0x1a6/0x23c0
[  705.504449]        elevator_change+0x189/0x4e0
[  705.505122]        elv_iosched_store+0x32e/0x3b0
[  705.505817]        queue_attr_store+0x232/0x2f0
[  705.506501]        kernfs_fop_write_iter+0x3df/0x5f0
[  705.507213]        vfs_write+0x518/0x1090
[  705.507855]        ksys_write+0xf9/0x1d0
[  705.508484]        do_syscall_64+0x133/0x15c0
[  705.509160]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  705.509918] 
               -> #1 (&q->q_usage_counter(io)#4){++++}-{0:0}:
[  705.511080]        __lock_acquire+0x55d/0xc00
[  705.511745]        lock_acquire.part.0+0xb7/0x220
[  705.512435]        blk_alloc_queue+0x5b7/0x730
[  705.513101]        blk_mq_alloc_queue+0x14b/0x230
[  705.513792]        scsi_alloc_sdev+0x851/0xcc0
[  705.514471]        scsi_probe_and_add_lun+0x71e/0xc30
[  705.515177]        __scsi_add_device+0x1c4/0x1f0
[  705.515852]        ata_scsi_scan_host+0x13d/0x390
[  705.516544]        async_run_entry_fn+0x9a/0x510
[  705.517233]        process_one_work+0x854/0x1540
[  705.517908]        worker_thread+0x5f3/0xfe0
[  705.518555]        kthread+0x35d/0x460
[  705.519159]        ret_from_fork+0x556/0x850
[  705.519818]        ret_from_fork_asm+0x1a/0x30
[  705.520492] 
               -> #0 (fs_reclaim){+.+.}-{0:0}:
[  705.521529]        check_prev_add+0xeb/0xc50
[  705.522172]        validate_chain+0x468/0x550
[  705.522824]        __lock_acquire+0x55d/0xc00
[  705.523474]        lock_acquire.part.0+0xb7/0x220
[  705.524152]        fs_reclaim_acquire+0xc9/0x110
[  705.524826]        kmem_cache_alloc_node_noprof+0x54/0x6e0
[  705.525572]        __alloc_skb+0xe3/0x5b0
[  705.526201]        tcp_send_active_reset+0x8a/0x750
[  705.526898]        tcp_disconnect+0x16c6/0x1f60
[  705.527565]        __tcp_close+0x7d7/0xee0
[  705.528195]        tcp_close+0x23/0xc0
[  705.528796]        inet_release+0x109/0x230
[  705.529447]        __sock_release+0xa6/0x250
[  705.530091]        sock_close+0x18/0x20
[  705.530700]        __fput+0x372/0xaa0
[  705.531292]        delayed_fput+0x6d/0x90
[  705.531908]        process_one_work+0x854/0x1540
[  705.532575]        worker_thread+0x5f3/0xfe0
[  705.533220]        kthread+0x35d/0x460
[  705.533817]        ret_from_fork+0x556/0x850
[  705.534454]        ret_from_fork_asm+0x1a/0x30
[  705.535101] 
               other info that might help us debug this:

[  705.536558] Chain exists of:
                 fs_reclaim --> set->srcu --> sk_lock-AF_INET-NVME

[  705.538229]  Possible unsafe locking scenario:

[  705.539273]        CPU0                    CPU1
[  705.539934]        ----                    ----
[  705.540591]   lock(sk_lock-AF_INET-NVME);
[  705.541217]                                lock(set->srcu);
[  705.541961]                                lock(sk_lock-AF_INET-NVME);
[  705.542777]   lock(fs_reclaim);
[  705.543328] 
                *** DEADLOCK ***

[  705.544617] 4 locks held by kworker/0:1/5932:
[  705.545271]  #0: ffff8881001dad48 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0xfa9/0x1540
[  705.546337]  #1: ffff88816248fd10 ((delayed_fput_work).work){+.+.}-{0:0}, at: process_one_work+0x7dc/0x1540
[  705.547428]  #2: ffff88812aa28d88 (&sb->s_type->i_mutex_key#11){+.+.}-{4:4}, at: __sock_release+0x7b/0x250
[  705.548521]  #3: ffff888119164218 (sk_lock-AF_INET-NVME){+.+.}-{0:0}, at: tcp_close+0x18/0xc0
[  705.549536] 
               stack backtrace:
[  705.550507] CPU: 0 UID: 0 PID: 5932 Comm: kworker/0:1 Tainted: G            E       7.0.0-rc1 #1 PREEMPT(lazy) 
[  705.550514] Tainted: [E]=UNSIGNED_MODULE
[  705.550516] Hardware name: KubeVirt None/RHEL, BIOS 1.16.3-2.el9 04/01/2014
[  705.550519] Workqueue: events delayed_fput
[  705.550530] Call Trace:
[  705.550543]  <TASK>
[  705.550547]  dump_stack_lvl+0x6e/0xa0
[  705.550554]  print_circular_bug.cold+0x38/0x45
[  705.550570]  check_noncircular+0x144/0x160
[  705.550582]  check_prev_add+0xeb/0xc50
[  705.550590]  validate_chain+0x468/0x550
[  705.550595]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  705.550602]  __lock_acquire+0x55d/0xc00
[  705.550611]  lock_acquire.part.0+0xb7/0x220
[  705.550618]  ? kmem_cache_alloc_node_noprof+0x54/0x6e0
[  705.550625]  ? srso_alias_return_thunk+0x5/0xfbef5
[  705.550629]  ? lock_acquire+0x12c/0x150
[  705.550637]  fs_reclaim_acquire+0xc9/0x110
[  705.550640]  ? kmem_cache_alloc_node_noprof+0x54/0x6e0
[  705.550646]  kmem_cache_alloc_node_noprof+0x54/0x6e0
[  705.550650]  ? __alloc_skb+0xe3/0x5b0
[  705.550654]  ? lockdep_hardirqs_on_prepare.part.0+0x9b/0x140
[  705.550662]  __alloc_skb+0xe3/0x5b0
[  705.550666]  ? __alloc_skb+0x49e/0x5b0
[  705.550671]  ? __pfx___alloc_skb+0x10/0x10
[  705.550677]  ? srso_alias_return_thunk+0x5/0xfbef5
[  705.550681]  ? tcp_set_state+0xf3/0x510
[  705.550688]  tcp_send_active_reset+0x8a/0x750
[  705.550696]  tcp_disconnect+0x16c6/0x1f60
[  705.550700]  ? srso_alias_return_thunk+0x5/0xfbef5
[  705.550704]  ? do_raw_spin_lock+0x12d/0x270
[  705.550711]  __tcp_close+0x7d7/0xee0
[  705.550716]  ? __local_bh_enable_ip+0xac/0x120
[  705.550730]  ? trace_hardirqs_on+0x18/0x170
[  705.550743]  tcp_close+0x23/0xc0
[  705.550748]  inet_release+0x109/0x230
[  705.550754]  __sock_release+0xa6/0x250
[  705.550762]  sock_close+0x18/0x20
[  705.550766]  __fput+0x372/0xaa0
[  705.550771]  ? __lock_release.isra.0+0x5d/0x180
[  705.550776]  ? srso_alias_return_thunk+0x5/0xfbef5
[  705.550782]  delayed_fput+0x6d/0x90
[  705.550787]  process_one_work+0x854/0x1540
[  705.550800]  ? __pfx_process_one_work+0x10/0x10
[  705.550804]  ? srso_alias_return_thunk+0x5/0xfbef5
[  705.550811]  ? lock_is_held_type+0x9a/0x110
[  705.550817]  ? srso_alias_return_thunk+0x5/0xfbef5
[  705.550824]  worker_thread+0x5f3/0xfe0
[  705.550832]  ? srso_alias_return_thunk+0x5/0xfbef5
[  705.550835]  ? __kthread_parkme+0xb1/0x1f0
[  705.550843]  ? __pfx_worker_thread+0x10/0x10
[  705.550853]  kthread+0x35d/0x460
[  705.550857]  ? srso_alias_return_thunk+0x5/0xfbef5
[  705.550860]  ? __pfx_kthread+0x10/0x10
[  705.550866]  ret_from_fork+0x556/0x850
[  705.550872]  ? __pfx_ret_from_fork+0x10/0x10
[  705.550879]  ? srso_alias_return_thunk+0x5/0xfbef5
[  705.550882]  ? __switch_to+0x435/0xd80
[  705.550894]  ? __pfx_kthread+0x10/0x10
[  705.550900]  ret_from_fork_asm+0x1a/0x30
[  705.550914]  </TASK>
[  705.595600] nvmet: Created nvm controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  705.602818] nvme nvme2: creating 4 I/O queues.
[  705.628770] nvme nvme2: mapped 4/0/0 default/read/poll queues.
[  705.699976] nvme nvme2: Removing ctrl: NQN "blktests-subsystem-1"
