DEEPIN: scsi: Bypass certain SCSI commands on disks with "use_192_bytes_for_3f" attribute #7
Open
Avenger-285714 wants to merge 1 commit into AOSC-Tracking:master from Avenger-285714:aosc-scsi
…es_for_3f" attribute

On some external USB hard drives, mounting can fail if "lshw" is executed during the process. This occurs because data sent to the device's output endpoint in certain abnormal scenarios does not receive a response, leading to a mount timeout.

[ Description of "use_192_bytes_for_3f" in the kernel code: ]

    /*
     * Many disks only accept MODE SENSE transfer lengths of
     * 192 bytes (that's what Windows uses).
     */
    sdev->use_192_bytes_for_3f = 1;

The kernel's SCSI driver, when handling devices with this attribute, sends commands with a length of 192 bytes like this:

    if (sdp->use_192_bytes_for_3f)
            res = sd_do_mode_sense(sdp, 0, 0x3F, buffer, 192, &data, NULL);

However, "lshw" disregards the "use_192_bytes_for_3f" attribute and transmits data with a length of 0xff bytes via ioctl, which can cause some hard drives to hang and become unusable.

To resolve this issue, prevent commands with a length of 0xff bytes from being queued via ioctl when the kernel detects the "use_192_bytes_for_3f" attribute on the device.

The hard drive device identified with the issue is Lenovo USB 17ef:4531.

Tested on HONOR NBLK-WAX9X (C234) Notebook with AMD Ryzen 7 3700U.

[ Kernel logs: ]
2024-10-31 13:36:11 localhost kernel: [ 25.770091] usb 2-2: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
2024-10-31 13:36:11 localhost kernel: [ 25.798558] usb 2-2: New USB device found, idVendor=17ef, idProduct=4531, bcdDevice= 5.12
2024-10-31 13:36:11 localhost kernel: [ 25.798562] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
2024-10-31 13:36:11 localhost kernel: [ 25.798564] usb 2-2: Product: Lenovo Portable HDD
2024-10-31 13:36:11 localhost kernel: [ 25.798566] usb 2-2: Manufacturer: Lenovo
2024-10-31 13:36:11 localhost kernel: [ 25.798567] usb 2-2: SerialNumber: 000000001E4C
2024-10-31 13:36:11 localhost kernel: [ 25.820244] usb-storage 2-2:1.0: USB Mass Storage device detected
2024-10-31 13:36:11 localhost kernel: [ 25.820457] scsi host0: usb-storage 2-2:1.0
2024-10-31 13:36:11 localhost kernel: [ 25.820633] usbcore: registered new interface driver usb-storage
2024-10-31 13:36:11 localhost kernel: [ 25.825598] usbcore: registered new interface driver uas
2024-10-31 13:36:14 localhost kernel: [ 28.852179] scsi 0:0:0:0: Direct-Access Lenovo USB Hard Drive 0006 PQ: 0 ANSI: 2
2024-10-31 13:36:14 localhost kernel: [ 28.852961] sd 0:0:0:0: Attached scsi generic sg0 type 0
2024-10-31 13:36:14 localhost kernel: [ 28.891218] sd 0:0:0:0: [sda] 976773164 512-byte logical blocks: (500 GB/466 GiB)
2024-10-31 13:36:14 localhost kernel: [ 28.906892] sd 0:0:0:0: [sda] Write Protect is off
2024-10-31 13:36:14 localhost kernel: [ 28.906896] sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
2024-10-31 13:36:14 localhost kernel: [ 28.922606] sd 0:0:0:0: [sda] No Caching mode page found
2024-10-31 13:36:14 localhost kernel: [ 28.922612] sd 0:0:0:0: [sda] Assuming drive cache: write through
2024-10-31 13:36:14 localhost kernel: [ 29.007816] sda: sda1
2024-10-31 13:36:15 localhost kernel: [ 30.180380] sd 0:0:0:0: [sda] Attached SCSI disk
2024-10-31 13:36:16 localhost kernel: [ 30.722863] snd_hda_codec_realtek hdaudioC1D0: hda_codec_setup_stream: NID=0x3, stream=0x5, channel=0, format=0x4011
2024-10-31 13:36:16 localhost kernel: [ 30.734139] snd_hda_codec_realtek hdaudioC1D0: hda_codec_setup_stream: NID=0x2, stream=0x5, channel=0, format=0x4011
2024-10-31 13:36:17 localhost kernel: [ 31.396011] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)
2024-10-31 13:36:18 localhost kernel: [ 32.933537] snd_hda_codec_realtek hdaudioC1D0: hda_codec_cleanup_stream: NID=0x3
2024-10-31 13:36:18 localhost kernel: [ 32.933541] snd_hda_codec_realtek hdaudioC1D0: hda_codec_cleanup_stream: NID=0x2
2024-10-31 13:36:39 localhost kernel: [ 54.242220] usb 2-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
2024-10-31 13:36:50 localhost kernel: [ 64.408879] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)
2024-10-31 13:37:11 localhost kernel: [ 85.466479] usb 2-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
2024-10-31 13:37:11 localhost kernel: [ 85.490248] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
2024-10-31 13:37:11 localhost kernel: [ 85.490255] sd 0:0:0:0: [sda] tag#0 CDB: Read(10) 28 00 00 00 00 20 00 00 08 00
2024-10-31 13:37:11 localhost kernel: [ 85.490258] print_req_error: I/O error, dev sda, sector 32
2024-10-31 13:37:33 localhost kernel: [ 107.432186] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)
2024-10-31 13:37:41 localhost kernel: [ 116.194201] usb 2-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
2024-10-31 13:37:49 localhost kernel: [ 123.555484] dolphin[7271]: segfault at 10 ip 00007fcccc0d7f76 sp 00007ffe8004b860 error 4 in libKF5CoreAddons.so.5.102.0[7fcccc0a5000+83000]
2024-10-31 13:37:49 localhost kernel: [ 123.555502] Code: d6 90 66 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 e8 8e 94 01 00 ba 04 00 00 00 48 89 de 48 89 c7 e8 4e 8f 01 00 84 c0 75 2a <48> 8b 7d 10 48 85 ff 74 21 45 89 e1 48 89 da 48 89 ee 5b 41 b8 01
2024-10-31 13:38:11 localhost kernel: [ 146.229510] usb 2-2: USB disconnect, device number 2
2024-10-31 13:38:11 localhost kernel: [ 146.237993] scsi 0:0:0:0: rejecting I/O to dead device
2024-10-31 13:38:11 localhost kernel: [ 146.238003] print_req_error: I/O error, dev sda, sector 32
2024-10-31 13:38:11 localhost kernel: [ 146.238009] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238029] scsi 0:0:0:0: rejecting I/O to dead device
2024-10-31 13:38:11 localhost kernel: [ 146.238030] print_req_error: I/O error, dev sda, sector 36
2024-10-31 13:38:11 localhost kernel: [ 146.238032] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238045] scsi 0:0:0:0: rejecting I/O to dead device
2024-10-31 13:38:11 localhost kernel: [ 146.238047] print_req_error: I/O error, dev sda, sector 6291480
2024-10-31 13:38:11 localhost kernel: [ 146.238062] Buffer I/O error on dev sda1, logical block 786431, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238168] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238170] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238175] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238176] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238184] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238185] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238199] Buffer I/O error on dev sda, logical block 40, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238201] Buffer I/O error on dev sda, logical block 41, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238205] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238206] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238210] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238211] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238215] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238217] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238220] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238221] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238224] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238226] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:12 localhost kernel: [ 146.482007] snd_hda_codec_realtek hdaudioC1D0: hda_codec_setup_stream: NID=0x3, stream=0x5, channel=0, format=0x4011
2024-10-31 13:38:12 localhost kernel: [ 146.494064] snd_hda_codec_realtek hdaudioC1D0: hda_codec_setup_stream: NID=0x2, stream=0x5, channel=0, format=0x4011
2024-10-31 13:38:15 localhost kernel: [ 150.065848] snd_hda_codec_realtek hdaudioC1D0: hda_codec_cleanup_stream: NID=0x3
2024-10-31 13:38:15 localhost kernel: [ 150.065852] snd_hda_codec_realtek hdaudioC1D0: hda_codec_cleanup_stream: NID=0x2
2024-10-31 13:38:26 localhost kernel: [ 160.433037] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)
2024-10-31 13:39:29 localhost kernel: [ 223.444589] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)

Link: https://linux-hardware.org/?id=usb:17ef-4531
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/4EB8ECD64F601331+e2f01a1f-8da5-4e7b-b909-d920a792756a@uniontech.com/
Reported-by: Xinwei Zhou <[email protected]>
Co-developed-by: Xu Rao <[email protected]>
Signed-off-by: Xu Rao <[email protected]>
Tested-by: Yujing Ming <[email protected]>
Signed-off-by: WangYuli <[email protected]>
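
The filtering rule described in the commit message can be sketched in user-space C as below. This is only an illustration, not the patch itself: `struct fake_sdev` and `should_reject_cdb()` are hypothetical names, and the sketch assumes the offending request is a MODE SENSE(6) command for page 0x3f whose allocation length byte is 0xff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODE_SENSE_6   0x1a  /* SCSI MODE SENSE(6) opcode */
#define ALL_PAGES      0x3f  /* page code meaning "return all pages" */
#define BAD_ALLOC_LEN  0xff  /* allocation length named in the commit message */

/* Hypothetical stand-in for the one struct scsi_device field we care about. */
struct fake_sdev {
    bool use_192_bytes_for_3f;
};

/*
 * Return true if a user-supplied CDB should be refused for this device.
 * Assumption: only MODE SENSE(6), page 0x3f, allocation length 0xff is
 * filtered, and only on devices carrying the quirk flag.
 */
static bool should_reject_cdb(const struct fake_sdev *sdev,
                              const uint8_t *cdb, size_t len)
{
    if (!sdev->use_192_bytes_for_3f || len < 6)
        return false;
    if (cdb[0] != MODE_SENSE_6)
        return false;
    /* Byte 2 holds the page code in its low six bits. */
    if ((cdb[2] & 0x3f) != ALL_PAGES)
        return false;
    /* Byte 4 is the allocation length of a MODE SENSE(6) CDB. */
    return cdb[4] == BAD_ALLOC_LEN;
}
```

With this rule, a 192-byte request on a flagged device still passes, and devices without the flag are unaffected.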
xry111 pushed a commit that referenced this pull request Jul 14, 2025

Similar to the preceding patch for GuC (and with the same references), Intel GPUs expect command buffers to align to 4KiB boundaries. Current code uses `PAGE_SIZE' as an assumed alignment reference, but a 4KiB kernel page size is by no means guaranteed.

On 16KiB-paged kernels, this causes driver failures during boot up:

[ 14.018975] ------------[ cut here ]------------
[ 14.023562] xe 0000:09:00.0: [drm] GT0: Kernel-submitted job timed out
[ 14.030084] WARNING: CPU: 3 PID: 564 at drivers/gpu/drm/xe/xe_guc_submit.c:1181 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[ 14.041300] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) nls_iso8859_1(E) snd_hda_intel(E) snd_intel_dspcfg(E) qrtr(E) nls_cp437(E) snd_hda_codec(E) spi_loongson_pci(E) rtc_efi(E) snd_hda_core(E) loongson3_cpufreq(E) spi_loongson_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) gpio_loongson_64bit(E) input_leds(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) drm_gpuvm(E) drm_buddy(E) gpu_sched(E)
[ 14.041369] drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) realtek(E) led_class(E) loongson(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E)
[ 14.153910] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G E 6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[ 14.165325] Tainted: [E]=UNSIGNED_MODULE
[ 14.169220] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[ 14.182970] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[ 14.189549] pc ffff8000024f3760 ra ffff8000024f3760 tp 900000012f150000 sp 900000012f153ca0
[ 14.197853] a0 0000000000000000 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000
[ 14.206156] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000000
[ 14.214458] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 0000000000000000
[ 14.222761] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000
[ 14.231064] t8 0000000000000000 u0 900000000195c0c8 s9 900000012e4dcf48 s0 90000001285f3640
[ 14.239368] s1 90000001004f8000 s2 ffff8000026ec000 s3 0000000000000000 s4 900000012e4dc028
[ 14.247672] s5 90000001009f5e00 s6 000000000000137e s7 0000000000000001 s8 900000012f153ce8
[ 14.255975] ra: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[ 14.263379] ERA: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[ 14.270777] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[ 14.276927] PRMD: 00000004 (PPLV0 +PIE -PWE)
[ 14.281258] EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[ 14.286024] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[ 14.290790] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[ 14.296329] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[ 14.302299] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G E 6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[ 14.302302] Tainted: [E]=UNSIGNED_MODULE
[ 14.302302] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[ 14.302304] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[ 14.302307] Stack : 900000012f153928 d84a6232d48f1ac7 900000000023eb34 900000012f150000
[ 14.302310] 900000012f153900 0000000000000000 900000012f153908 9000000001c31c70
[ 14.302313] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 14.302315] 0000000000000000 d84a6232d48f1ac7 0000000000000000 0000000000000000
[ 14.302318] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 14.302320] 0000000000000000 0000000000000000 00000000072b4000 900000012e4dcf48
[ 14.302323] 9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004
[ 14.302325] 0000000000000004 0000000000000000 000000000000137e 0000000000000001
[ 14.302328] 900000012f153ce8 9000000001c31c70 9000000000244174 0000555581840b98
[ 14.302331] 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d
[ 14.302333] ...
[ 14.302335] Call Trace:
[ 14.302336] [<9000000000244174>] show_stack+0x3c/0x16c
[ 14.302341] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0
[ 14.302346] [<9000000000288208>] __warn+0x8c/0x174
[ 14.302350] [<90000000017c1918>] report_bug+0x1c0/0x22c
[ 14.302354] [<90000000017f66e8>] do_bp+0x280/0x344
[ 14.302359]
[ 14.302360] ---[ end trace 0000000000000000 ]---

Revise calculation of `RING_CTL_SIZE(size)' to use `SZ_4K' to fix the aforementioned issue.

Cc: [email protected]
Fixes: b79e8fd ("drm/xe: Remove dependency on intel_engine_regs.h")
Tested-by: Mingcong Bai <[email protected]>
Tested-by: Wenbin Fang <[email protected]>
Tested-by: Haien Liang <[email protected]>
Tested-by: Jianfeng Liu <[email protected]>
Tested-by: Shirong Liu <[email protected]>
Tested-by: Haofeng Wu <[email protected]>
Link: FanFansfan@22c55ab
Link: https://t.me/c/1109254909/768552
Co-developed-by: Shang Yatsen <[email protected]>
Signed-off-by: Shang Yatsen <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Mingcong Bai <[email protected]>
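
The effect of the page-size assumption can be illustrated with a small stand-alone calculation. The sketch below is hedged: the commit message does not show the macro body, so it assumes `RING_CTL_SIZE(size)` subtracts one unit from the ring size in bytes, and that the hardware reads the resulting field as "(number of 4KiB pages minus one) times 4KiB". `ring_ctl_size()` and `pages_seen_by_hw()` are hypothetical helper names.

```c
#include <stdint.h>

#define SZ_4K   0x1000u
#define SZ_16K  0x4000u

/*
 * Assumed shape of the size calculation: ring size in bytes minus one
 * "unit". The unit is PAGE_SIZE before the fix and SZ_4K after it.
 */
static uint32_t ring_ctl_size(uint32_t ring_bytes, uint32_t unit)
{
    return ring_bytes - unit;
}

/*
 * Assumed hardware view: the field encodes (4 KiB pages - 1) * 4 KiB,
 * so the ring length in 4 KiB pages is field / 4 KiB + 1.
 */
static uint32_t pages_seen_by_hw(uint32_t field)
{
    return field / SZ_4K + 1;
}
```

Under these assumptions, a 16KiB ring programmed with a 16KiB unit yields a field of 0, which the hardware would read as a single 4KiB page, while the 4KiB unit yields 0x3000, i.e. the full four pages.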
KexyBiscuit pushed a commit that referenced this pull request Jul 20, 2025

A crash in conntrack was reported while trying to unlink the conntrack entry from the hash bucket list:

    [exception RIP: __nf_ct_delete_from_lists+172]
    [..]
 #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack]
 #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack]
 #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack]
    [..]

The nf_conn struct is marked as allocated from slab but appears to be in a partially initialised state:

 ct hlist pointer is garbage; looks like the ct hash value (hence crash).
 ct->status is equal to IPS_CONFIRMED|IPS_DYING, which is expected
 ct->timeout is 30000 (=30s), which is unexpected.

Everything else looks like normal udp conntrack entry. If we ignore ct->status and pretend it's 0, the entry matches those that are newly allocated but not yet inserted into the hash:
  - ct hlist pointers are overloaded and store/cache the raw tuple hash
  - ct->timeout matches the relative time expected for a new udp flow rather than the absolute 'jiffies' value.

If it were not for the presence of IPS_CONFIRMED, __nf_conntrack_find_get() would have skipped the entry.

Theory is that we did hit following race:

cpu x                   cpu y                   cpu z
 found entry E          found entry E
 E is expired           <preemption>
 nf_ct_delete()
 return E to rcu slab
                                                init_conntrack
                                                E is re-inited,
                                                ct->status set to 0
                                                reply tuplehash hnnode.pprev
                                                stores hash value.

cpu y found E right before it was deleted on cpu x. E is now re-inited on cpu z. cpu y was preempted before checking for expiry and/or confirm bit.

                                                ->refcnt set to 1
                                                E now owned by skb
                                                ->timeout set to 30000

If cpu y were to resume now, it would observe E as expired but would skip E due to missing CONFIRMED bit.

                                                nf_conntrack_confirm gets called
                                                sets: ct->status |= CONFIRMED
                                                This is wrong: E is not yet added
                                                to hashtable.

cpu y resumes, it observes E as expired but CONFIRMED:

                        <resumes>
                        nf_ct_expired()
                         -> yes (ct->timeout is 30s)
                        confirmed bit set.

cpu y will try to delete E from the hashtable:

                        nf_ct_delete() -> set DYING bit
                        __nf_ct_delete_from_lists

Even this scenario doesn't guarantee a crash: cpu z still holds the table bucket lock(s) so y blocks:

                        wait for spinlock held by z

                                                CONFIRMED is set but there is no
                                                guarantee ct will be added to hash:
                                                "chaintoolong" or "clash resolution"
                                                logic both skip the insert step.
                                                reply hnnode.pprev still stores the
                                                hash value.

                                                unlocks spinlock
                                                return NF_DROP
                        <unblocks, then
                         crashes on hlist_nulls_del_rcu pprev>

In case CPU z does insert the entry into the hashtable, cpu y will unlink E again right away but no crash occurs.

Without 'cpu y' race, 'garbage' hlist is of no consequence: ct refcnt remains at 1, eventually skb will be free'd and E gets destroyed via: nf_conntrack_put -> nf_conntrack_destroy -> nf_ct_destroy.

To resolve this, move the IPS_CONFIRMED assignment after the table insertion but before the unlock.

Pablo points out that the confirm-bit-store could be reordered to happen before hlist add resp. the timeout fixup, so switch to set_bit and before_atomic memory barrier to prevent this.

It doesn't matter if other CPUs can observe a newly inserted entry right before the CONFIRMED bit was set: Such event cannot be distinguished from above "E is the old incarnation" case: the entry will be skipped.

Also change nf_ct_should_gc() to first check the confirmed bit.

The gc sequence is:
 1. Check if entry has expired, if not skip to next entry
 2. Obtain a reference to the expired entry.
 3. Call nf_ct_should_gc() to double-check step 1.

nf_ct_should_gc() is thus called only for entries that already failed an expiry check. After this patch, once the confirmed bit check passes ct->timeout has been altered to reflect the absolute 'best before' date instead of a relative time. Step 3 will therefore not remove the entry.

Without this change to nf_ct_should_gc() we could still get this sequence:

 1. Check if entry has expired.
 2. Obtain a reference.
 3. Call nf_ct_should_gc() to double-check step 1:
    4 - entry is still observed as expired
    5 - meanwhile, ct->timeout is corrected to absolute value on other CPU and confirm bit gets set
    6 - confirm bit is seen
    7 - valid entry is removed again

First do check 6), then 4) so the gc expiry check always picks up either confirmed bit unset (entry gets skipped) or expiry re-check failure for re-inited conntrack objects.

This change cannot be backported to releases before 5.19. Without commit 8a75a2c ("netfilter: conntrack: remove unconfirmed list") |= IPS_CONFIRMED line cannot be moved without further changes.

Cc: Razvan Cojocaru <[email protected]>
Link: https://lore.kernel.org/netfilter-devel/[email protected]/
Link: https://lore.kernel.org/netfilter-devel/[email protected]/
Fixes: 1397af5 ("netfilter: conntrack: remove the percpu dying list")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
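
The ordering argument above (fix up the timeout and insert first, publish the confirmed bit last, and have the gc path test the confirmed bit before the expiry) can be sketched in portable C11. This is a user-space analogy under stated assumptions, not the kernel code: `struct entry`, `confirm_entry()`, `entry_expired()` and `should_gc()` are hypothetical names, and a release/acquire pair stands in for the kernel's set_bit plus before_atomic barrier.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CONFIRMED_BIT 0x1u

struct entry {
    atomic_uint status;   /* CONFIRMED_BIT is set only after insertion */
    atomic_uint timeout;  /* relative until confirmed, absolute afterwards */
};

/* Writer side: timeout fixup (and insertion) first, confirmed bit last. */
static void confirm_entry(struct entry *e, uint32_t now)
{
    uint32_t rel = atomic_load_explicit(&e->timeout, memory_order_relaxed);

    atomic_store_explicit(&e->timeout, now + rel, memory_order_relaxed);
    /* ... the hash table insertion would happen here, under the lock ... */
    /* Release ordering keeps the stores above from moving past this one. */
    atomic_fetch_or_explicit(&e->status, CONFIRMED_BIT, memory_order_release);
}

/* Wrap-safe expiry test on an absolute timestamp. */
static bool entry_expired(struct entry *e, uint32_t now)
{
    uint32_t t = atomic_load_explicit(&e->timeout, memory_order_relaxed);

    return (int32_t)(t - now) <= 0;
}

/* Reader side: check the confirmed bit first, then re-check expiry. */
static bool should_gc(struct entry *e, uint32_t now)
{
    unsigned int st = atomic_load_explicit(&e->status, memory_order_acquire);

    if (!(st & CONFIRMED_BIT))
        return false;  /* not yet published: skip the entry */
    return entry_expired(e, now);
}
```

In this sketch an unconfirmed entry with a relative timeout looks expired to `entry_expired()` but is skipped by `should_gc()`, and once the confirmed bit is visible the timeout read afterwards is already the absolute value.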
MingcongBai added a commit that referenced this pull request Jul 21, 2025
MingcongBai added a commit that referenced this pull request Jul 21, 2025
KexyBiscuit
pushed a commit
that referenced
this pull request
Jul 22, 2025
Similar to the preceding patch for GuC (and with the same references), Intel GPUs expects command buffers to align to 4KiB boundaries. Current code uses `PAGE_SIZE' as an assumed alignment reference but 4KiB kernel page sizes is by no means a guarantee. On 16KiB-paged kernels, this causes driver failures during boot up: [ 14.018975] ------------[ cut here ]------------ [ 14.023562] xe 0000:09:00.0: [drm] GT0: Kernel-submitted job timed out [ 14.030084] WARNING: CPU: 3 PID: 564 at drivers/gpu/drm/xe/xe_guc_submit.c:1181 guc_exec_queue_timedout_job+0x1c0/0xacc [xe] [ 14.041300] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) nls_iso8859_1(E) snd_hda_intel(E) snd_intel_dspcfg(E) qrtr(E) nls_cp437(E) snd_hda_codec(E) spi_loongson_pci(E) rtc_efi(E) snd_hda_core(E) loongson3_cpufreq(E) spi_loongson_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) gpio_loongson_64bit(E) input_leds(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) d rm_gpuvm(E) drm_buddy(E) gpu_sched(E) [ 14.041369] drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) realtek(E) led_class(E) loongson(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E) [ 14.153910] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G E 6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7 [ 14.165325] 
Tainted: [E]=UNSIGNED_MODULE [ 14.169220] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 14.182970] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched] [ 14.189549] pc ffff8000024f3760 ra ffff8000024f3760 tp 900000012f150000 sp 900000012f153ca0 [ 14.197853] a0 0000000000000000 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000 [ 14.206156] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000000 [ 14.214458] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 0000000000000000 [ 14.222761] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 14.231064] t8 0000000000000000 u0 900000000195c0c8 s9 900000012e4dcf48 s0 90000001285f3640 [ 14.239368] s1 90000001004f8000 s2 ffff8000026ec000 s3 0000000000000000 s4 900000012e4dc028 [ 14.247672] s5 90000001009f5e00 s6 000000000000137e s7 0000000000000001 s8 900000012f153ce8 [ 14.255975] ra: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe] [ 14.263379] ERA: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe] [ 14.270777] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 14.276927] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 14.281258] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 14.286024] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 14.290790] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 14.296329] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 14.302299] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G E 6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7 [ 14.302302] Tainted: [E]=UNSIGNED_MODULE [ 14.302302] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 14.302304] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched] [ 14.302307] Stack : 900000012f153928 d84a6232d48f1ac7 900000000023eb34 900000012f150000 [ 14.302310] 
900000012f153900 0000000000000000 900000012f153908 9000000001c31c70 [ 14.302313] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 14.302315] 0000000000000000 d84a6232d48f1ac7 0000000000000000 0000000000000000 [ 14.302318] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 14.302320] 0000000000000000 0000000000000000 00000000072b4000 900000012e4dcf48 [ 14.302323] 9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004 [ 14.302325] 0000000000000004 0000000000000000 000000000000137e 0000000000000001 [ 14.302328] 900000012f153ce8 9000000001c31c70 9000000000244174 0000555581840b98 [ 14.302331] 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d [ 14.302333] ... [ 14.302335] Call Trace: [ 14.302336] [<9000000000244174>] show_stack+0x3c/0x16c [ 14.302341] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0 [ 14.302346] [<9000000000288208>] __warn+0x8c/0x174 [ 14.302350] [<90000000017c1918>] report_bug+0x1c0/0x22c [ 14.302354] [<90000000017f66e8>] do_bp+0x280/0x344 [ 14.302359] [ 14.302360] ---[ end trace 0000000000000000 ]--- Revise calculation of `RING_CTL_SIZE(size)' to use `SZ_4K' to fix the aforementioned issue. Cc: [email protected] Fixes: b79e8fd ("drm/xe: Remove dependency on intel_engine_regs.h") Tested-by: Mingcong Bai <[email protected]> Tested-by: Wenbin Fang <[email protected]> Tested-by: Haien Liang <[email protected]> Tested-by: Jianfeng Liu <[email protected]> Tested-by: Shirong Liu <[email protected]> Tested-by: Haofeng Wu <[email protected]> Link: FanFansfan@22c55ab Link: https://t.me/c/1109254909/768552 Co-developed-by: Shang Yatsen <[email protected]> Signed-off-by: Shang Yatsen <[email protected]> Signed-off-by: Mingcong Bai <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Mingcong Bai <[email protected]>
KexyBiscuit
pushed a commit
that referenced
this pull request
Jul 22, 2025
Similar to the preceding patch for GuC (and with the same references), Intel GPUs expects command buffers to align to 4KiB boundaries. Current code uses `PAGE_SIZE' as an assumed alignment reference but 4KiB kernel page sizes is by no means a guarantee. On 16KiB-paged kernels, this causes driver failures during boot up: [ 14.018975] ------------[ cut here ]------------ [ 14.023562] xe 0000:09:00.0: [drm] GT0: Kernel-submitted job timed out [ 14.030084] WARNING: CPU: 3 PID: 564 at drivers/gpu/drm/xe/xe_guc_submit.c:1181 guc_exec_queue_timedout_job+0x1c0/0xacc [xe] [ 14.041300] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) nls_iso8859_1(E) snd_hda_intel(E) snd_intel_dspcfg(E) qrtr(E) nls_cp437(E) snd_hda_codec(E) spi_loongson_pci(E) rtc_efi(E) snd_hda_core(E) loongson3_cpufreq(E) spi_loongson_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) gpio_loongson_64bit(E) input_leds(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) drm_gpuvm(E) drm_buddy(E) gpu_sched(E) [ 14.041369] drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) realtek(E) led_class(E) loongson(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E) [ 14.153910] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G E 6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7 [ 14.165325] Tainted: 
[E]=UNSIGNED_MODULE [ 14.169220] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 14.182970] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched] [ 14.189549] pc ffff8000024f3760 ra ffff8000024f3760 tp 900000012f150000 sp 900000012f153ca0 [ 14.197853] a0 0000000000000000 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000 [ 14.206156] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000000 [ 14.214458] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 0000000000000000 [ 14.222761] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 14.231064] t8 0000000000000000 u0 900000000195c0c8 s9 900000012e4dcf48 s0 90000001285f3640 [ 14.239368] s1 90000001004f8000 s2 ffff8000026ec000 s3 0000000000000000 s4 900000012e4dc028 [ 14.247672] s5 90000001009f5e00 s6 000000000000137e s7 0000000000000001 s8 900000012f153ce8 [ 14.255975] ra: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe] [ 14.263379] ERA: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe] [ 14.270777] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 14.276927] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 14.281258] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 14.286024] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 14.290790] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 14.296329] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 14.302299] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G E 6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7 [ 14.302302] Tainted: [E]=UNSIGNED_MODULE [ 14.302302] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 14.302304] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched] [ 14.302307] Stack : 900000012f153928 d84a6232d48f1ac7 900000000023eb34 900000012f150000 [ 14.302310] 900000012f153900 
0000000000000000 900000012f153908 9000000001c31c70 [ 14.302313] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 14.302315] 0000000000000000 d84a6232d48f1ac7 0000000000000000 0000000000000000 [ 14.302318] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 14.302320] 0000000000000000 0000000000000000 00000000072b4000 900000012e4dcf48 [ 14.302323] 9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004 [ 14.302325] 0000000000000004 0000000000000000 000000000000137e 0000000000000001 [ 14.302328] 900000012f153ce8 9000000001c31c70 9000000000244174 0000555581840b98 [ 14.302331] 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d [ 14.302333] ... [ 14.302335] Call Trace: [ 14.302336] [<9000000000244174>] show_stack+0x3c/0x16c [ 14.302341] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0 [ 14.302346] [<9000000000288208>] __warn+0x8c/0x174 [ 14.302350] [<90000000017c1918>] report_bug+0x1c0/0x22c [ 14.302354] [<90000000017f66e8>] do_bp+0x280/0x344 [ 14.302359] [ 14.302360] ---[ end trace 0000000000000000 ]--- Revise calculation of `RING_CTL_SIZE(size)' to use `SZ_4K' to fix the aforementioned issue. Cc: [email protected] Fixes: b79e8fd ("drm/xe: Remove dependency on intel_engine_regs.h") Tested-by: Mingcong Bai <[email protected]> Tested-by: Wenbin Fang <[email protected]> Tested-by: Haien Liang <[email protected]> Tested-by: Jianfeng Liu <[email protected]> Tested-by: Shirong Liu <[email protected]> Tested-by: Haofeng Wu <[email protected]> Link: FanFansfan@22c55ab Link: https://t.me/c/1109254909/768552 Co-developed-by: Shang Yatsen <[email protected]> Signed-off-by: Shang Yatsen <[email protected]> Signed-off-by: Mingcong Bai <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Kexy Biscuit <[email protected]>
KexyBiscuit
pushed a commit
that referenced
this pull request
Jul 23, 2025
Similar to the preceding patch for GuC (and with the same references), Intel GPUs expects command buffers to align to 4KiB boundaries. Current code uses `PAGE_SIZE' as an assumed alignment reference but 4KiB kernel page sizes is by no means a guarantee. On 16KiB-paged kernels, this causes driver failures during boot up: [ 14.018975] ------------[ cut here ]------------ [ 14.023562] xe 0000:09:00.0: [drm] GT0: Kernel-submitted job timed out [ 14.030084] WARNING: CPU: 3 PID: 564 at drivers/gpu/drm/xe/xe_guc_submit.c:1181 guc_exec_queue_timedout_job+0x1c0/0xacc [xe] [ 14.041300] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) nls_iso8859_1(E) snd_hda_intel(E) snd_intel_dspcfg(E) qrtr(E) nls_cp437(E) snd_hda_codec(E) spi_loongson_pci(E) rtc_efi(E) snd_hda_core(E) loongson3_cpufreq(E) spi_loongson_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) gpio_loongson_64bit(E) input_leds(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) drm_gpuvm(E) drm_buddy(E) gpu_sched(E) [ 14.041369] drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) realtek(E) led_class(E) loongson(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E) [ 14.153910] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G E 6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7 [ 14.165325] Tainted: 
[E]=UNSIGNED_MODULE [ 14.169220] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 14.182970] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched] [ 14.189549] pc ffff8000024f3760 ra ffff8000024f3760 tp 900000012f150000 sp 900000012f153ca0 [ 14.197853] a0 0000000000000000 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000 [ 14.206156] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000000 [ 14.214458] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 0000000000000000 [ 14.222761] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 14.231064] t8 0000000000000000 u0 900000000195c0c8 s9 900000012e4dcf48 s0 90000001285f3640 [ 14.239368] s1 90000001004f8000 s2 ffff8000026ec000 s3 0000000000000000 s4 900000012e4dc028 [ 14.247672] s5 90000001009f5e00 s6 000000000000137e s7 0000000000000001 s8 900000012f153ce8 [ 14.255975] ra: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe] [ 14.263379] ERA: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe] [ 14.270777] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 14.276927] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 14.281258] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 14.286024] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 14.290790] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 14.296329] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 14.302299] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G E 6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7 [ 14.302302] Tainted: [E]=UNSIGNED_MODULE [ 14.302302] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 14.302304] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched] [ 14.302307] Stack : 900000012f153928 d84a6232d48f1ac7 900000000023eb34 900000012f150000 [ 14.302310] 900000012f153900 
0000000000000000 900000012f153908 9000000001c31c70 [ 14.302313] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 14.302315] 0000000000000000 d84a6232d48f1ac7 0000000000000000 0000000000000000 [ 14.302318] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 14.302320] 0000000000000000 0000000000000000 00000000072b4000 900000012e4dcf48 [ 14.302323] 9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004 [ 14.302325] 0000000000000004 0000000000000000 000000000000137e 0000000000000001 [ 14.302328] 900000012f153ce8 9000000001c31c70 9000000000244174 0000555581840b98 [ 14.302331] 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d [ 14.302333] ... [ 14.302335] Call Trace: [ 14.302336] [<9000000000244174>] show_stack+0x3c/0x16c [ 14.302341] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0 [ 14.302346] [<9000000000288208>] __warn+0x8c/0x174 [ 14.302350] [<90000000017c1918>] report_bug+0x1c0/0x22c [ 14.302354] [<90000000017f66e8>] do_bp+0x280/0x344 [ 14.302359] [ 14.302360] ---[ end trace 0000000000000000 ]--- Revise calculation of `RING_CTL_SIZE(size)' to use `SZ_4K' to fix the aforementioned issue. Cc: [email protected] Fixes: b79e8fd ("drm/xe: Remove dependency on intel_engine_regs.h") Tested-by: Mingcong Bai <[email protected]> Tested-by: Wenbin Fang <[email protected]> Tested-by: Haien Liang <[email protected]> Tested-by: Jianfeng Liu <[email protected]> Tested-by: Shirong Liu <[email protected]> Tested-by: Haofeng Wu <[email protected]> Link: FanFansfan@22c55ab Link: https://t.me/c/1109254909/768552 Co-developed-by: Shang Yatsen <[email protected]> Signed-off-by: Shang Yatsen <[email protected]> Signed-off-by: Mingcong Bai <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Kexy Biscuit <[email protected]>
On some external USB hard drives, mounting can fail if "lshw" is executed during the process.
This occurs because data sent to the device's output endpoint in certain abnormal scenarios does not receive a response, leading to a mount timeout.
[ Description of "use_192_bytes_for_3f" in the kernel code: ]
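/*
 * Many disks only accept MODE SENSE transfer lengths of
 * 192 bytes (that's what Windows uses).
 */
sdev->use_192_bytes_for_3f = 1;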
The kernel's SCSI driver, when handling devices with this attribute, sends commands with a length of 192 bytes like this:
if (sdp->use_192_bytes_for_3f)
	res = sd_do_mode_sense(sdp, 0, 0x3F, buffer, 192, &data, NULL);
However, "lshw" disregards the "use_192_bytes_for_3f" attribute and issues its command with a transfer length of 0xff bytes via ioctl, which can cause some hard drives to hang and become unusable.
To resolve this issue, prevent commands with a transfer length of 0xff bytes from being queued via ioctl when the target device has the "use_192_bytes_for_3f" attribute set.
The hard drive device identified with the issue is Lenovo USB 17ef:4531. Tested on HONOR NBLK-WAX9X (C234) Notebook with AMD Ryzen 7 3700U.
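The filtering idea described above can be sketched in user-space C. This is only an illustration, not the code in this patch: `struct fake_sdev` and `should_block_cdb()` are hypothetical names, and the sketch assumes the offending request is a MODE SENSE(6) for page 0x3F with an allocation length of 0xff, which the description implies but does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODE_SENSE_6 0x1a /* SCSI MODE SENSE(6) opcode */
#define PAGE_ALL     0x3f /* page code meaning "return all pages" */

/* Hypothetical stand-in for the single struct scsi_device flag of interest. */
struct fake_sdev {
	bool use_192_bytes_for_3f;
};

/*
 * Return true when a user-supplied CDB should be refused: the device is
 * flagged use_192_bytes_for_3f and the command is a MODE SENSE(6) for
 * page 0x3F asking for 0xff bytes. Everything else is let through.
 */
static bool should_block_cdb(const struct fake_sdev *sdev,
			     const uint8_t *cdb, size_t len)
{
	assert(sdev != NULL && cdb != NULL);

	if (!sdev->use_192_bytes_for_3f || len < 6)
		return false;
	if (cdb[0] != MODE_SENSE_6)
		return false;
	/* Byte 2: bits 7-6 are the page control field, bits 5-0 the page code. */
	if ((cdb[2] & 0x3f) != PAGE_ALL)
		return false;
	/* Byte 4 of MODE SENSE(6) is the allocation length. */
	return cdb[4] == 0xff;
}
```

With this shape, a 192-byte request for the same page still passes, so the kernel's own probing path would be unaffected; only the 0xff-length variant is refused on flagged devices.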
[ Kernel logs: ]
2024-10-31 13:36:11 localhost kernel: [ 25.770091] usb 2-2: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
2024-10-31 13:36:11 localhost kernel: [ 25.798558] usb 2-2: New USB device found, idVendor=17ef, idProduct=4531, bcdDevice= 5.12
2024-10-31 13:36:11 localhost kernel: [ 25.798562] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
2024-10-31 13:36:11 localhost kernel: [ 25.798564] usb 2-2: Product: Lenovo Portable HDD
2024-10-31 13:36:11 localhost kernel: [ 25.798566] usb 2-2: Manufacturer: Lenovo
2024-10-31 13:36:11 localhost kernel: [ 25.798567] usb 2-2: SerialNumber: 000000001E4C
2024-10-31 13:36:11 localhost kernel: [ 25.820244] usb-storage 2-2:1.0: USB Mass Storage device detected
2024-10-31 13:36:11 localhost kernel: [ 25.820457] scsi host0: usb-storage 2-2:1.0
2024-10-31 13:36:11 localhost kernel: [ 25.820633] usbcore: registered new interface driver usb-storage
2024-10-31 13:36:11 localhost kernel: [ 25.825598] usbcore: registered new interface driver uas
2024-10-31 13:36:14 localhost kernel: [ 28.852179] scsi 0:0:0:0: Direct-Access Lenovo USB Hard Drive 0006 PQ: 0 ANSI: 2
2024-10-31 13:36:14 localhost kernel: [ 28.852961] sd 0:0:0:0: Attached scsi generic sg0 type 0
2024-10-31 13:36:14 localhost kernel: [ 28.891218] sd 0:0:0:0: [sda] 976773164 512-byte logical blocks: (500 GB/466 GiB)
2024-10-31 13:36:14 localhost kernel: [ 28.906892] sd 0:0:0:0: [sda] Write Protect is off
2024-10-31 13:36:14 localhost kernel: [ 28.906896] sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
2024-10-31 13:36:14 localhost kernel: [ 28.922606] sd 0:0:0:0: [sda] No Caching mode page found
2024-10-31 13:36:14 localhost kernel: [ 28.922612] sd 0:0:0:0: [sda] Assuming drive cache: write through
2024-10-31 13:36:14 localhost kernel: [ 29.007816] sda: sda1
2024-10-31 13:36:15 localhost kernel: [ 30.180380] sd 0:0:0:0: [sda] Attached SCSI disk
2024-10-31 13:36:16 localhost kernel: [ 30.722863] snd_hda_codec_realtek hdaudioC1D0: hda_codec_setup_stream: NID=0x3, stream=0x5, channel=0, format=0x4011
2024-10-31 13:36:16 localhost kernel: [ 30.734139] snd_hda_codec_realtek hdaudioC1D0: hda_codec_setup_stream: NID=0x2, stream=0x5, channel=0, format=0x4011
2024-10-31 13:36:17 localhost kernel: [ 31.396011] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)
2024-10-31 13:36:18 localhost kernel: [ 32.933537] snd_hda_codec_realtek hdaudioC1D0: hda_codec_cleanup_stream: NID=0x3
2024-10-31 13:36:18 localhost kernel: [ 32.933541] snd_hda_codec_realtek hdaudioC1D0: hda_codec_cleanup_stream: NID=0x2
2024-10-31 13:36:39 localhost kernel: [ 54.242220] usb 2-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
2024-10-31 13:36:50 localhost kernel: [ 64.408879] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)
2024-10-31 13:37:11 localhost kernel: [ 85.466479] usb 2-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
2024-10-31 13:37:11 localhost kernel: [ 85.490248] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
2024-10-31 13:37:11 localhost kernel: [ 85.490255] sd 0:0:0:0: [sda] tag#0 CDB: Read(10) 28 00 00 00 00 20 00 00 08 00
2024-10-31 13:37:11 localhost kernel: [ 85.490258] print_req_error: I/O error, dev sda, sector 32
2024-10-31 13:37:33 localhost kernel: [ 107.432186] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)
2024-10-31 13:37:41 localhost kernel: [ 116.194201] usb 2-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
2024-10-31 13:37:49 localhost kernel: [ 123.555484] dolphin[7271]: segfault at 10 ip 00007fcccc0d7f76 sp 00007ffe8004b860 error 4 in libKF5CoreAddons.so.5.102.0[7fcccc0a5000+83000]
2024-10-31 13:37:49 localhost kernel: [ 123.555502] Code: d6 90 66 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 e8 8e 94 01 00 ba 04 00 00 00 48 89 de 48 89 c7 e8 4e 8f 01 00 84 c0 75 2a <48> 8b 7d 10 48 85 ff 74 21 45 89 e1 48 89 da 48 89 ee 5b 41 b8 01
2024-10-31 13:38:11 localhost kernel: [ 146.229510] usb 2-2: USB disconnect, device number 2
2024-10-31 13:38:11 localhost kernel: [ 146.237993] scsi 0:0:0:0: rejecting I/O to dead device
2024-10-31 13:38:11 localhost kernel: [ 146.238003] print_req_error: I/O error, dev sda, sector 32
2024-10-31 13:38:11 localhost kernel: [ 146.238009] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238029] scsi 0:0:0:0: rejecting I/O to dead device
2024-10-31 13:38:11 localhost kernel: [ 146.238030] print_req_error: I/O error, dev sda, sector 36
2024-10-31 13:38:11 localhost kernel: [ 146.238032] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238045] scsi 0:0:0:0: rejecting I/O to dead device
2024-10-31 13:38:11 localhost kernel: [ 146.238047] print_req_error: I/O error, dev sda, sector 6291480
2024-10-31 13:38:11 localhost kernel: [ 146.238062] Buffer I/O error on dev sda1, logical block 786431, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238168] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238170] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238175] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238176] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238184] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238185] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238199] Buffer I/O error on dev sda, logical block 40, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238201] Buffer I/O error on dev sda, logical block 41, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238205] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238206] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238210] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238211] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238215] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238217] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238220] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238221] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238224] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238226] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:12 localhost kernel: [ 146.482007] snd_hda_codec_realtek hdaudioC1D0: hda_codec_setup_stream: NID=0x3, stream=0x5, channel=0, format=0x4011
2024-10-31 13:38:12 localhost kernel: [ 146.494064] snd_hda_codec_realtek hdaudioC1D0: hda_codec_setup_stream: NID=0x2, stream=0x5, channel=0, format=0x4011
2024-10-31 13:38:15 localhost kernel: [ 150.065848] snd_hda_codec_realtek hdaudioC1D0: hda_codec_cleanup_stream: NID=0x3
2024-10-31 13:38:15 localhost kernel: [ 150.065852] snd_hda_codec_realtek hdaudioC1D0: hda_codec_cleanup_stream: NID=0x2
2024-10-31 13:38:26 localhost kernel: [ 160.433037] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)
2024-10-31 13:39:29 localhost kernel: [ 223.444589] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)
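As a reading aid for the log, the timed-out Read(10) CDB "28 00 00 00 00 20 00 00 08 00" can be decoded by hand: bytes 2 to 5 hold the big-endian LBA and bytes 7 to 8 the transfer length in logical blocks. The helper names below are hypothetical, and the snippet is a standalone sketch rather than kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Big-endian 32-bit LBA from bytes 2..5 of a READ(10) CDB. */
static uint32_t read10_lba(const uint8_t cdb[10])
{
	assert(cdb != NULL);

	return ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
	       ((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
}

/* Big-endian 16-bit transfer length, in logical blocks, from bytes 7..8. */
static uint16_t read10_blocks(const uint8_t cdb[10])
{
	assert(cdb != NULL);

	return (uint16_t)(((unsigned int)cdb[7] << 8) | cdb[8]);
}
```

For the CDB in the log this gives LBA 32, which matches the "I/O error, dev sda, sector 32" line, and a length of 8 blocks, or 4096 bytes with the 512-byte logical blocks reported for sda.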
Link: https://linux-hardware.org/?id=usb:17ef-4531
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/4EB8ECD64F601331+e2f01a1f-8da5-4e7b-b909-d920a792756a@uniontech.com/
Reported-by: Xinwei Zhou [email protected]
Co-developed-by: Xu Rao [email protected]
Signed-off-by: Xu Rao [email protected]
Tested-by: Yujing Ming [email protected]
Signed-off-by: WangYuli [email protected]