Skip to content

DEEPIN: scsi: Bypass certain SCSI commands on disks with "use_192_bytes_for_3f" attribute #7

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Avenger-285714
Copy link

On some external USB hard drives, mounting can fail if "lshw" is executed during the process.

This occurs because data sent to the device's output endpoint in certain abnormal scenarios does not receive a response, leading to a mount timeout.

[ Description of "use_192_bytes_for_3f" in the kernel code: ]
/*

  • Many disks only accept MODE SENSE transfer lengths of
  • 192 bytes (that's what Windows uses). */ sdev->use_192_bytes_for_3f = 1;

The kernel's SCSI driver, when handling devices with this attribute, sends commands with a length of 192 bytes like this:
if (sdp->use_192_bytes_for_3f)
res = sd_do_mode_sense(sdp, 0, 0x3F, buffer, 192, &data, NULL);

However, "lshw" disregards the "use_192_bytes_for_3f" attribute and transmits data with a length of 0xff bytes via ioctl, which can cause some hard drives to hang and become unusable.

To resolve this issue, prevent commands with a length of 0xff bytes from being queued via ioctl when it detects the "use_192_bytes_for_3f" attribute on the device.

The hard drive device identified with the issue is Lenovo USB 17ef:4531. Tested on HONOR NBLK-WAX9X (C234) Notebook with AMD Ryzen 7 3700U.

[ Kernel logs: ]
2024-10-31 13:36:11 localhost kernel: [ 25.770091] usb 2-2: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
2024-10-31 13:36:11 localhost kernel: [ 25.798558] usb 2-2: New USB device found, idVendor=17ef, idProduct=4531, bcdDevice= 5.12
2024-10-31 13:36:11 localhost kernel: [ 25.798562] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
2024-10-31 13:36:11 localhost kernel: [ 25.798564] usb 2-2: Product: Lenovo Portable HDD
2024-10-31 13:36:11 localhost kernel: [ 25.798566] usb 2-2: Manufacturer: Lenovo
2024-10-31 13:36:11 localhost kernel: [ 25.798567] usb 2-2: SerialNumber: 000000001E4C
2024-10-31 13:36:11 localhost kernel: [ 25.820244] usb-storage 2-2:1.0: USB Mass Storage device detected
2024-10-31 13:36:11 localhost kernel: [ 25.820457] scsi host0: usb-storage 2-2:1.0
2024-10-31 13:36:11 localhost kernel: [ 25.820633] usbcore: registered new interface driver usb-storage
2024-10-31 13:36:11 localhost kernel: [ 25.825598] usbcore: registered new interface driver uas
2024-10-31 13:36:14 localhost kernel: [ 28.852179] scsi 0:0:0:0: Direct-Access Lenovo USB Hard Drive 0006 PQ: 0 ANSI: 2
2024-10-31 13:36:14 localhost kernel: [ 28.852961] sd 0:0:0:0: Attached scsi generic sg0 type 0
2024-10-31 13:36:14 localhost kernel: [ 28.891218] sd 0:0:0:0: [sda] 976773164 512-byte logical blocks: (500 GB/466 GiB)
2024-10-31 13:36:14 localhost kernel: [ 28.906892] sd 0:0:0:0: [sda] Write Protect is off
2024-10-31 13:36:14 localhost kernel: [ 28.906896] sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
2024-10-31 13:36:14 localhost kernel: [ 28.922606] sd 0:0:0:0: [sda] No Caching mode page found
2024-10-31 13:36:14 localhost kernel: [ 28.922612] sd 0:0:0:0: [sda] Assuming drive cache: write through
2024-10-31 13:36:14 localhost kernel: [ 29.007816] sda: sda1
2024-10-31 13:36:15 localhost kernel: [ 30.180380] sd 0:0:0:0: [sda] Attached SCSI disk
2024-10-31 13:36:16 localhost kernel: [ 30.722863] snd_hda_codec_realtek hdaudioC1D0: hda_codec_setup_stream: NID=0x3, stream=0x5, channel=0, format=0x4011
2024-10-31 13:36:16 localhost kernel: [ 30.734139] snd_hda_codec_realtek hdaudioC1D0: hda_codec_setup_stream: NID=0x2, stream=0x5, channel=0, format=0x4011
2024-10-31 13:36:17 localhost kernel: [ 31.396011] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)
2024-10-31 13:36:18 localhost kernel: [ 32.933537] snd_hda_codec_realtek hdaudioC1D0: hda_codec_cleanup_stream: NID=0x3
2024-10-31 13:36:18 localhost kernel: [ 32.933541] snd_hda_codec_realtek hdaudioC1D0: hda_codec_cleanup_stream: NID=0x2
2024-10-31 13:36:39 localhost kernel: [ 54.242220] usb 2-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
2024-10-31 13:36:50 localhost kernel: [ 64.408879] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)
2024-10-31 13:37:11 localhost kernel: [ 85.466479] usb 2-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
2024-10-31 13:37:11 localhost kernel: [ 85.490248] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
2024-10-31 13:37:11 localhost kernel: [ 85.490255] sd 0:0:0:0: [sda] tag#0 CDB: Read(10) 28 00 00 00 00 20 00 00 08 00
2024-10-31 13:37:11 localhost kernel: [ 85.490258] print_req_error: I/O error, dev sda, sector 32
2024-10-31 13:37:33 localhost kernel: [ 107.432186] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)
2024-10-31 13:37:41 localhost kernel: [ 116.194201] usb 2-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
2024-10-31 13:37:49 localhost kernel: [ 123.555484] dolphin[7271]: segfault at 10 ip 00007fcccc0d7f76 sp 00007ffe8004b860 error 4 in libKF5CoreAddons.so.5.102.0[7fcccc0a5000+83000]
2024-10-31 13:37:49 localhost kernel: [ 123.555502] Code: d6 90 66 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 e8 8e 94 01 00 ba 04 00 00 00 48 89 de 48 89 c7 e8 4e 8f 01 00 84 c0 75 2a <48> 8b 7d 10 48 85 ff 74 21 45 89 e1 48 89 da 48 89 ee 5b 41 b8 01
2024-10-31 13:38:11 localhost kernel: [ 146.229510] usb 2-2: USB disconnect, device number 2
2024-10-31 13:38:11 localhost kernel: [ 146.237993] scsi 0:0:0:0: rejecting I/O to dead device
2024-10-31 13:38:11 localhost kernel: [ 146.238003] print_req_error: I/O error, dev sda, sector 32
2024-10-31 13:38:11 localhost kernel: [ 146.238009] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238029] scsi 0:0:0:0: rejecting I/O to dead device
2024-10-31 13:38:11 localhost kernel: [ 146.238030] print_req_error: I/O error, dev sda, sector 36
2024-10-31 13:38:11 localhost kernel: [ 146.238032] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238045] scsi 0:0:0:0: rejecting I/O to dead device
2024-10-31 13:38:11 localhost kernel: [ 146.238047] print_req_error: I/O error, dev sda, sector 6291480
2024-10-31 13:38:11 localhost kernel: [ 146.238062] Buffer I/O error on dev sda1, logical block 786431, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238168] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238170] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238175] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238176] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238184] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238185] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238199] Buffer I/O error on dev sda, logical block 40, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238201] Buffer I/O error on dev sda, logical block 41, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238205] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238206] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238210] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238211] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238215] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238217] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238220] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238221] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238224] Buffer I/O error on dev sda, logical block 8, async page read
2024-10-31 13:38:11 localhost kernel: [ 146.238226] Buffer I/O error on dev sda, logical block 9, async page read
2024-10-31 13:38:12 localhost kernel: [ 146.482007] snd_hda_codec_realtek hdaudioC1D0: hda_codec_setup_stream: NID=0x3, stream=0x5, channel=0, format=0x4011
2024-10-31 13:38:12 localhost kernel: [ 146.494064] snd_hda_codec_realtek hdaudioC1D0: hda_codec_setup_stream: NID=0x2, stream=0x5, channel=0, format=0x4011
2024-10-31 13:38:15 localhost kernel: [ 150.065848] snd_hda_codec_realtek hdaudioC1D0: hda_codec_cleanup_stream: NID=0x3
2024-10-31 13:38:15 localhost kernel: [ 150.065852] snd_hda_codec_realtek hdaudioC1D0: hda_codec_cleanup_stream: NID=0x2
2024-10-31 13:38:26 localhost kernel: [ 160.433037] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)
2024-10-31 13:39:29 localhost kernel: [ 223.444589] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)

Link: https://linux-hardware.org/?id=usb:17ef-4531
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/4EB8ECD64F601331+e2f01a1f-8da5-4e7b-b909-d920a792756a@uniontech.com/
Reported-by: Xinwei Zhou [email protected]
Co-developed-by: Xu Rao [email protected]
Signed-off-by: Xu Rao [email protected]
Tested-by: Yujing Ming [email protected]
Signed-off-by: WangYuli [email protected]

…es_for_3f" attribute

On some external USB hard drives, mounting can fail if "lshw" is
executed during the process.

This occurs because data sent to the device's output endpoint in
certain abnormal scenarios does not receive a response, leading to
a mount timeout.

[ Description of "use_192_bytes_for_3f" in the kernel code: ]
  /*
   * Many disks only accept MODE SENSE transfer lengths of
   * 192 bytes (that's what Windows uses).
   */
   sdev->use_192_bytes_for_3f = 1;

The kernel's SCSI driver, when handling devices with this attribute,
sends commands with a length of 192 bytes like this:
  if (sdp->use_192_bytes_for_3f)
  	res = sd_do_mode_sense(sdp, 0, 0x3F, buffer, 192, &data, NULL);

However, "lshw" disregards the "use_192_bytes_for_3f" attribute and
transmits data with a length of 0xff bytes via ioctl, which can cause
some hard drives to hang and become unusable.

To resolve this issue, prevent commands with a length of 0xff bytes
from being queued via ioctl when it detects the "use_192_bytes_for_3f"
attribute on the device.

The hard drive device identified with the issue is Lenovo USB 17ef:4531.
Tested on HONOR NBLK-WAX9X (C234) Notebook with AMD Ryzen 7 3700U.

[ Kernel logs: ]
  2024-10-31 13:36:11 localhost kernel: [   25.770091] usb 2-2: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
  2024-10-31 13:36:11 localhost kernel: [   25.798558] usb 2-2: New USB device found, idVendor=17ef, idProduct=4531, bcdDevice= 5.12
  2024-10-31 13:36:11 localhost kernel: [   25.798562] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
  2024-10-31 13:36:11 localhost kernel: [   25.798564] usb 2-2: Product: Lenovo Portable HDD
  2024-10-31 13:36:11 localhost kernel: [   25.798566] usb 2-2: Manufacturer: Lenovo
  2024-10-31 13:36:11 localhost kernel: [   25.798567] usb 2-2: SerialNumber: 000000001E4C
  2024-10-31 13:36:11 localhost kernel: [   25.820244] usb-storage 2-2:1.0: USB Mass Storage device detected
  2024-10-31 13:36:11 localhost kernel: [   25.820457] scsi host0: usb-storage 2-2:1.0
  2024-10-31 13:36:11 localhost kernel: [   25.820633] usbcore: registered new interface driver usb-storage
  2024-10-31 13:36:11 localhost kernel: [   25.825598] usbcore: registered new interface driver uas
  2024-10-31 13:36:14 localhost kernel: [   28.852179] scsi 0:0:0:0: Direct-Access     Lenovo   USB Hard Drive   0006 PQ: 0 ANSI: 2
  2024-10-31 13:36:14 localhost kernel: [   28.852961] sd 0:0:0:0: Attached scsi generic sg0 type 0
  2024-10-31 13:36:14 localhost kernel: [   28.891218] sd 0:0:0:0: [sda] 976773164 512-byte logical blocks: (500 GB/466 GiB)
  2024-10-31 13:36:14 localhost kernel: [   28.906892] sd 0:0:0:0: [sda] Write Protect is off
  2024-10-31 13:36:14 localhost kernel: [   28.906896] sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
  2024-10-31 13:36:14 localhost kernel: [   28.922606] sd 0:0:0:0: [sda] No Caching mode page found
  2024-10-31 13:36:14 localhost kernel: [   28.922612] sd 0:0:0:0: [sda] Assuming drive cache: write through
  2024-10-31 13:36:14 localhost kernel: [   29.007816]  sda: sda1
  2024-10-31 13:36:15 localhost kernel: [   30.180380] sd 0:0:0:0: [sda] Attached SCSI disk
  2024-10-31 13:36:16 localhost kernel: [   30.722863] snd_hda_codec_realtek hdaudioC1D0: hda_codec_setup_stream: NID=0x3, stream=0x5, channel=0, format=0x4011
  2024-10-31 13:36:16 localhost kernel: [   30.734139] snd_hda_codec_realtek hdaudioC1D0: hda_codec_setup_stream: NID=0x2, stream=0x5, channel=0, format=0x4011
  2024-10-31 13:36:17 localhost kernel: [   31.396011] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)
  2024-10-31 13:36:18 localhost kernel: [   32.933537] snd_hda_codec_realtek hdaudioC1D0: hda_codec_cleanup_stream: NID=0x3
  2024-10-31 13:36:18 localhost kernel: [   32.933541] snd_hda_codec_realtek hdaudioC1D0: hda_codec_cleanup_stream: NID=0x2
  2024-10-31 13:36:39 localhost kernel: [   54.242220] usb 2-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
  2024-10-31 13:36:50 localhost kernel: [   64.408879] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)
  2024-10-31 13:37:11 localhost kernel: [   85.466479] usb 2-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
  2024-10-31 13:37:11 localhost kernel: [   85.490248] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
  2024-10-31 13:37:11 localhost kernel: [   85.490255] sd 0:0:0:0: [sda] tag#0 CDB: Read(10) 28 00 00 00 00 20 00 00 08 00
  2024-10-31 13:37:11 localhost kernel: [   85.490258] print_req_error: I/O error, dev sda, sector 32
  2024-10-31 13:37:33 localhost kernel: [  107.432186] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)
  2024-10-31 13:37:41 localhost kernel: [  116.194201] usb 2-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
  2024-10-31 13:37:49 localhost kernel: [  123.555484] dolphin[7271]: segfault at 10 ip 00007fcccc0d7f76 sp 00007ffe8004b860 error 4 in libKF5CoreAddons.so.5.102.0[7fcccc0a5000+83000]
  2024-10-31 13:37:49 localhost kernel: [  123.555502] Code: d6 90 66 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 e8 8e 94 01 00 ba 04 00 00 00 48 89 de 48 89 c7 e8 4e 8f 01 00 84 c0 75 2a <48> 8b 7d 10 48 85 ff 74 21 45 89 e1 48 89 da 48 89 ee 5b 41 b8 01
  2024-10-31 13:38:11 localhost kernel: [  146.229510] usb 2-2: USB disconnect, device number 2
  2024-10-31 13:38:11 localhost kernel: [  146.237993] scsi 0:0:0:0: rejecting I/O to dead device
  2024-10-31 13:38:11 localhost kernel: [  146.238003] print_req_error: I/O error, dev sda, sector 32
  2024-10-31 13:38:11 localhost kernel: [  146.238009] Buffer I/O error on dev sda, logical block 8, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238029] scsi 0:0:0:0: rejecting I/O to dead device
  2024-10-31 13:38:11 localhost kernel: [  146.238030] print_req_error: I/O error, dev sda, sector 36
  2024-10-31 13:38:11 localhost kernel: [  146.238032] Buffer I/O error on dev sda, logical block 9, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238045] scsi 0:0:0:0: rejecting I/O to dead device
  2024-10-31 13:38:11 localhost kernel: [  146.238047] print_req_error: I/O error, dev sda, sector 6291480
  2024-10-31 13:38:11 localhost kernel: [  146.238062] Buffer I/O error on dev sda1, logical block 786431, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238168] Buffer I/O error on dev sda, logical block 8, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238170] Buffer I/O error on dev sda, logical block 9, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238175] Buffer I/O error on dev sda, logical block 8, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238176] Buffer I/O error on dev sda, logical block 9, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238184] Buffer I/O error on dev sda, logical block 8, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238185] Buffer I/O error on dev sda, logical block 9, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238199] Buffer I/O error on dev sda, logical block 40, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238201] Buffer I/O error on dev sda, logical block 41, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238205] Buffer I/O error on dev sda, logical block 8, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238206] Buffer I/O error on dev sda, logical block 9, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238210] Buffer I/O error on dev sda, logical block 8, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238211] Buffer I/O error on dev sda, logical block 9, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238215] Buffer I/O error on dev sda, logical block 8, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238217] Buffer I/O error on dev sda, logical block 9, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238220] Buffer I/O error on dev sda, logical block 8, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238221] Buffer I/O error on dev sda, logical block 9, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238224] Buffer I/O error on dev sda, logical block 8, async page read
  2024-10-31 13:38:11 localhost kernel: [  146.238226] Buffer I/O error on dev sda, logical block 9, async page read
  2024-10-31 13:38:12 localhost kernel: [  146.482007] snd_hda_codec_realtek hdaudioC1D0: hda_codec_setup_stream: NID=0x3, stream=0x5, channel=0, format=0x4011
  2024-10-31 13:38:12 localhost kernel: [  146.494064] snd_hda_codec_realtek hdaudioC1D0: hda_codec_setup_stream: NID=0x2, stream=0x5, channel=0, format=0x4011
  2024-10-31 13:38:15 localhost kernel: [  150.065848] snd_hda_codec_realtek hdaudioC1D0: hda_codec_cleanup_stream: NID=0x3
  2024-10-31 13:38:15 localhost kernel: [  150.065852] snd_hda_codec_realtek hdaudioC1D0: hda_codec_cleanup_stream: NID=0x2
  2024-10-31 13:38:26 localhost kernel: [  160.433037] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)
  2024-10-31 13:39:29 localhost kernel: [  223.444589] start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)

Link: https://linux-hardware.org/?id=usb:17ef-4531
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/4EB8ECD64F601331+e2f01a1f-8da5-4e7b-b909-d920a792756a@uniontech.com/
Reported-by: Xinwei Zhou <[email protected]>
Co-developed-by: Xu Rao <[email protected]>
Signed-off-by: Xu Rao <[email protected]>
Tested-by: Yujing Ming <[email protected]>
Signed-off-by: WangYuli <[email protected]>
xry111 pushed a commit that referenced this pull request Jul 14, 2025
Similar to the preceding patch for GuC (and with the same references),
Intel GPUs expects command buffers to align to 4KiB boundaries.

Current code uses `PAGE_SIZE' as an assumed alignment reference but 4KiB
kernel page sizes is by no means a guarantee. On 16KiB-paged kernels, this
causes driver failures during boot up:

[   14.018975] ------------[ cut here ]------------
[   14.023562] xe 0000:09:00.0: [drm] GT0: Kernel-submitted job timed out
[   14.030084] WARNING: CPU: 3 PID: 564 at drivers/gpu/drm/xe/xe_guc_submit.c:1181 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.041300] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) nls_iso8859_1(E) snd_hda_intel(E) snd_intel_dspcfg(E) qrtr(E) nls_cp437(E) snd_hda_codec(E) spi_loongson_pci(E) rtc_efi(E) snd_hda_core(E) loongson3_cpufreq(E) spi_loongson_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) gpio_loongson_64bit(E) input_leds(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) d
 rm_gpuvm(E) drm_buddy(E) gpu_sched(E)
[   14.041369]  drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) realtek(E) led_class(E) loongson(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E)
[   14.153910] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G            E      6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[   14.165325] Tainted: [E]=UNSIGNED_MODULE
[   14.169220] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[   14.182970] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   14.189549] pc ffff8000024f3760 ra ffff8000024f3760 tp 900000012f150000 sp 900000012f153ca0
[   14.197853] a0 0000000000000000 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000
[   14.206156] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000000
[   14.214458] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 0000000000000000
[   14.222761] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000
[   14.231064] t8 0000000000000000 u0 900000000195c0c8 s9 900000012e4dcf48 s0 90000001285f3640
[   14.239368] s1 90000001004f8000 s2 ffff8000026ec000 s3 0000000000000000 s4 900000012e4dc028
[   14.247672] s5 90000001009f5e00 s6 000000000000137e s7 0000000000000001 s8 900000012f153ce8
[   14.255975]    ra: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.263379]   ERA: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.270777]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[   14.276927]  PRMD: 00000004 (PPLV0 +PIE -PWE)
[   14.281258]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[   14.286024]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[   14.290790] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[   14.296329]  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[   14.302299] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G            E      6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[   14.302302] Tainted: [E]=UNSIGNED_MODULE
[   14.302302] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[   14.302304] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   14.302307] Stack : 900000012f153928 d84a6232d48f1ac7 900000000023eb34 900000012f150000
[   14.302310]         900000012f153900 0000000000000000 900000012f153908 9000000001c31c70
[   14.302313]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   14.302315]         0000000000000000 d84a6232d48f1ac7 0000000000000000 0000000000000000
[   14.302318]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   14.302320]         0000000000000000 0000000000000000 00000000072b4000 900000012e4dcf48
[   14.302323]         9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004
[   14.302325]         0000000000000004 0000000000000000 000000000000137e 0000000000000001
[   14.302328]         900000012f153ce8 9000000001c31c70 9000000000244174 0000555581840b98
[   14.302331]         00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d
[   14.302333]         ...
[   14.302335] Call Trace:
[   14.302336] [<9000000000244174>] show_stack+0x3c/0x16c
[   14.302341] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0
[   14.302346] [<9000000000288208>] __warn+0x8c/0x174
[   14.302350] [<90000000017c1918>] report_bug+0x1c0/0x22c
[   14.302354] [<90000000017f66e8>] do_bp+0x280/0x344
[   14.302359]
[   14.302360] ---[ end trace 0000000000000000 ]---

Revise calculation of `RING_CTL_SIZE(size)' to use `SZ_4K' to fix the
aforementioned issue.

Cc: [email protected]
Fixes: b79e8fd ("drm/xe: Remove dependency on intel_engine_regs.h")
Tested-by: Mingcong Bai <[email protected]>
Tested-by: Wenbin Fang <[email protected]>
Tested-by: Haien Liang <[email protected]>
Tested-by: Jianfeng Liu <[email protected]>
Tested-by: Shirong Liu <[email protected]>
Tested-by: Haofeng Wu <[email protected]>
Link: FanFansfan@22c55ab
Link: https://t.me/c/1109254909/768552
Co-developed-by: Shang Yatsen <[email protected]>
Signed-off-by: Shang Yatsen <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Mingcong Bai <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jul 20, 2025
A crash in conntrack was reported while trying to unlink the conntrack
entry from the hash bucket list:
    [exception RIP: __nf_ct_delete_from_lists+172]
    [..]
 #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack]
 torvalds#8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack]
 torvalds#9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack]
    [..]

The nf_conn struct is marked as allocated from slab but appears to be in
a partially initialised state:

 ct hlist pointer is garbage; looks like the ct hash value
 (hence crash).
 ct->status is equal to IPS_CONFIRMED|IPS_DYING, which is expected
 ct->timeout is 30000 (=30s), which is unexpected.

Everything else looks like normal udp conntrack entry.  If we ignore
ct->status and pretend its 0, the entry matches those that are newly
allocated but not yet inserted into the hash:
  - ct hlist pointers are overloaded and store/cache the raw tuple hash
  - ct->timeout matches the relative time expected for a new udp flow
    rather than the absolute 'jiffies' value.

If it were not for the presence of IPS_CONFIRMED,
__nf_conntrack_find_get() would have skipped the entry.

Theory is that we did hit following race:

cpu x 			cpu y			cpu z
 found entry E		found entry E
 E is expired		<preemption>
 nf_ct_delete()
 return E to rcu slab
					init_conntrack
					E is re-inited,
					ct->status set to 0
					reply tuplehash hnnode.pprev
					stores hash value.

cpu y found E right before it was deleted on cpu x.
E is now re-inited on cpu z.  cpu y was preempted before
checking for expiry and/or confirm bit.

					->refcnt set to 1
					E now owned by skb
					->timeout set to 30000

If cpu y were to resume now, it would observe E as
expired but would skip E due to missing CONFIRMED bit.

					nf_conntrack_confirm gets called
					sets: ct->status |= CONFIRMED
					This is wrong: E is not yet added
					to hashtable.

cpu y resumes, it observes E as expired but CONFIRMED:
			<resumes>
			nf_ct_expired()
			 -> yes (ct->timeout is 30s)
			confirmed bit set.

cpu y will try to delete E from the hashtable:
			nf_ct_delete() -> set DYING bit
			__nf_ct_delete_from_lists

Even this scenario doesn't guarantee a crash:
cpu z still holds the table bucket lock(s) so y blocks:

			wait for spinlock held by z

					CONFIRMED is set but there is no
					guarantee ct will be added to hash:
					"chaintoolong" or "clash resolution"
					logic both skip the insert step.
					reply hnnode.pprev still stores the
					hash value.

					unlocks spinlock
					return NF_DROP
			<unblocks, then
			 crashes on hlist_nulls_del_rcu pprev>

In case CPU z does insert the entry into the hashtable, cpu y will unlink
E again right away but no crash occurs.

Without 'cpu y' race, 'garbage' hlist is of no consequence:
ct refcnt remains at 1, eventually skb will be free'd and E gets
destroyed via: nf_conntrack_put -> nf_conntrack_destroy -> nf_ct_destroy.

To resolve this, move the IPS_CONFIRMED assignment after the table
insertion but before the unlock.

Pablo points out that the confirm-bit-store could be reordered to happen
before hlist add resp. the timeout fixup, so switch to set_bit and
before_atomic memory barrier to prevent this.

It doesn't matter if other CPUs can observe a newly inserted entry right
before the CONFIRMED bit was set:

Such event cannot be distinguished from above "E is the old incarnation"
case: the entry will be skipped.

Also change nf_ct_should_gc() to first check the confirmed bit.

The gc sequence is:
 1. Check if entry has expired, if not skip to next entry
 2. Obtain a reference to the expired entry.
 3. Call nf_ct_should_gc() to double-check step 1.

nf_ct_should_gc() is thus called only for entries that already failed an
expiry check. After this patch, once the confirmed bit check passes
ct->timeout has been altered to reflect the absolute 'best before' date
instead of a relative time.  Step 3 will therefore not remove the entry.

Without this change to nf_ct_should_gc() we could still get this sequence:

 1. Check if entry has expired.
 2. Obtain a reference.
 3. Call nf_ct_should_gc() to double-check step 1:
    4 - entry is still observed as expired
    5 - meanwhile, ct->timeout is corrected to absolute value on other CPU
      and confirm bit gets set
    6 - confirm bit is seen
    7 - valid entry is removed again

First do check 6), then 4) so the gc expiry check always picks up either
confirmed bit unset (entry gets skipped) or expiry re-check failure for
re-inited conntrack objects.

This change cannot be backported to releases before 5.19. Without
commit 8a75a2c ("netfilter: conntrack: remove unconfirmed list")
|= IPS_CONFIRMED line cannot be moved without further changes.

Cc: Razvan Cojocaru <[email protected]>
Link: https://lore.kernel.org/netfilter-devel/[email protected]/
Link: https://lore.kernel.org/netfilter-devel/[email protected]/
Fixes: 1397af5 ("netfilter: conntrack: remove the percpu dying list")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
MingcongBai added a commit that referenced this pull request Jul 21, 2025
Similar to the preceding patch for GuC (and with the same references),
Intel GPUs expects command buffers to align to 4KiB boundaries.

Current code uses `PAGE_SIZE' as an assumed alignment reference but 4KiB
kernel page sizes is by no means a guarantee. On 16KiB-paged kernels, this
causes driver failures during boot up:

[   14.018975] ------------[ cut here ]------------
[   14.023562] xe 0000:09:00.0: [drm] GT0: Kernel-submitted job timed out
[   14.030084] WARNING: CPU: 3 PID: 564 at drivers/gpu/drm/xe/xe_guc_submit.c:1181 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.041300] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) nls_iso8859_1(E) snd_hda_intel(E) snd_intel_dspcfg(E) qrtr(E) nls_cp437(E) snd_hda_codec(E) spi_loongson_pci(E) rtc_efi(E) snd_hda_core(E) loongson3_cpufreq(E) spi_loongson_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) gpio_loongson_64bit(E) input_leds(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) d
 rm_gpuvm(E) drm_buddy(E) gpu_sched(E)
[   14.041369]  drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) realtek(E) led_class(E) loongson(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E)
[   14.153910] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G            E      6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[   14.165325] Tainted: [E]=UNSIGNED_MODULE
[   14.169220] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[   14.182970] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   14.189549] pc ffff8000024f3760 ra ffff8000024f3760 tp 900000012f150000 sp 900000012f153ca0
[   14.197853] a0 0000000000000000 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000
[   14.206156] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000000
[   14.214458] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 0000000000000000
[   14.222761] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000
[   14.231064] t8 0000000000000000 u0 900000000195c0c8 s9 900000012e4dcf48 s0 90000001285f3640
[   14.239368] s1 90000001004f8000 s2 ffff8000026ec000 s3 0000000000000000 s4 900000012e4dc028
[   14.247672] s5 90000001009f5e00 s6 000000000000137e s7 0000000000000001 s8 900000012f153ce8
[   14.255975]    ra: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.263379]   ERA: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.270777]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[   14.276927]  PRMD: 00000004 (PPLV0 +PIE -PWE)
[   14.281258]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[   14.286024]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[   14.290790] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[   14.296329]  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[   14.302299] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G            E      6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[   14.302302] Tainted: [E]=UNSIGNED_MODULE
[   14.302302] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[   14.302304] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   14.302307] Stack : 900000012f153928 d84a6232d48f1ac7 900000000023eb34 900000012f150000
[   14.302310]         900000012f153900 0000000000000000 900000012f153908 9000000001c31c70
[   14.302313]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   14.302315]         0000000000000000 d84a6232d48f1ac7 0000000000000000 0000000000000000
[   14.302318]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   14.302320]         0000000000000000 0000000000000000 00000000072b4000 900000012e4dcf48
[   14.302323]         9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004
[   14.302325]         0000000000000004 0000000000000000 000000000000137e 0000000000000001
[   14.302328]         900000012f153ce8 9000000001c31c70 9000000000244174 0000555581840b98
[   14.302331]         00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d
[   14.302333]         ...
[   14.302335] Call Trace:
[   14.302336] [<9000000000244174>] show_stack+0x3c/0x16c
[   14.302341] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0
[   14.302346] [<9000000000288208>] __warn+0x8c/0x174
[   14.302350] [<90000000017c1918>] report_bug+0x1c0/0x22c
[   14.302354] [<90000000017f66e8>] do_bp+0x280/0x344
[   14.302359]
[   14.302360] ---[ end trace 0000000000000000 ]---

Revise calculation of `RING_CTL_SIZE(size)' to use `SZ_4K' to fix the
aforementioned issue.

Cc: [email protected]
Fixes: b79e8fd ("drm/xe: Remove dependency on intel_engine_regs.h")
Tested-by: Mingcong Bai <[email protected]>
Tested-by: Wenbin Fang <[email protected]>
Tested-by: Haien Liang <[email protected]>
Tested-by: Jianfeng Liu <[email protected]>
Tested-by: Shirong Liu <[email protected]>
Tested-by: Haofeng Wu <[email protected]>
Link: FanFansfan@22c55ab
Link: https://t.me/c/1109254909/768552
Co-developed-by: Shang Yatsen <[email protected]>
Signed-off-by: Shang Yatsen <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Mingcong Bai <[email protected]>
MingcongBai added a commit that referenced this pull request Jul 21, 2025
Similar to the preceding patch for GuC (and with the same references),
Intel GPUs expects command buffers to align to 4KiB boundaries.

Current code uses `PAGE_SIZE' as an assumed alignment reference but 4KiB
kernel page sizes is by no means a guarantee. On 16KiB-paged kernels, this
causes driver failures during boot up:

[   14.018975] ------------[ cut here ]------------
[   14.023562] xe 0000:09:00.0: [drm] GT0: Kernel-submitted job timed out
[   14.030084] WARNING: CPU: 3 PID: 564 at drivers/gpu/drm/xe/xe_guc_submit.c:1181 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.041300] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) nls_iso8859_1(E) snd_hda_intel(E) snd_intel_dspcfg(E) qrtr(E) nls_cp437(E) snd_hda_codec(E) spi_loongson_pci(E) rtc_efi(E) snd_hda_core(E) loongson3_cpufreq(E) spi_loongson_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) gpio_loongson_64bit(E) input_leds(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) d
 rm_gpuvm(E) drm_buddy(E) gpu_sched(E)
[   14.041369]  drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) realtek(E) led_class(E) loongson(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E)
[   14.153910] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G            E      6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[   14.165325] Tainted: [E]=UNSIGNED_MODULE
[   14.169220] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[   14.182970] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   14.189549] pc ffff8000024f3760 ra ffff8000024f3760 tp 900000012f150000 sp 900000012f153ca0
[   14.197853] a0 0000000000000000 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000
[   14.206156] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000000
[   14.214458] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 0000000000000000
[   14.222761] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000
[   14.231064] t8 0000000000000000 u0 900000000195c0c8 s9 900000012e4dcf48 s0 90000001285f3640
[   14.239368] s1 90000001004f8000 s2 ffff8000026ec000 s3 0000000000000000 s4 900000012e4dc028
[   14.247672] s5 90000001009f5e00 s6 000000000000137e s7 0000000000000001 s8 900000012f153ce8
[   14.255975]    ra: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.263379]   ERA: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.270777]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[   14.276927]  PRMD: 00000004 (PPLV0 +PIE -PWE)
[   14.281258]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[   14.286024]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[   14.290790] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[   14.296329]  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[   14.302299] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G            E      6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[   14.302302] Tainted: [E]=UNSIGNED_MODULE
[   14.302302] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[   14.302304] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   14.302307] Stack : 900000012f153928 d84a6232d48f1ac7 900000000023eb34 900000012f150000
[   14.302310]         900000012f153900 0000000000000000 900000012f153908 9000000001c31c70
[   14.302313]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   14.302315]         0000000000000000 d84a6232d48f1ac7 0000000000000000 0000000000000000
[   14.302318]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   14.302320]         0000000000000000 0000000000000000 00000000072b4000 900000012e4dcf48
[   14.302323]         9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004
[   14.302325]         0000000000000004 0000000000000000 000000000000137e 0000000000000001
[   14.302328]         900000012f153ce8 9000000001c31c70 9000000000244174 0000555581840b98
[   14.302331]         00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d
[   14.302333]         ...
[   14.302335] Call Trace:
[   14.302336] [<9000000000244174>] show_stack+0x3c/0x16c
[   14.302341] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0
[   14.302346] [<9000000000288208>] __warn+0x8c/0x174
[   14.302350] [<90000000017c1918>] report_bug+0x1c0/0x22c
[   14.302354] [<90000000017f66e8>] do_bp+0x280/0x344
[   14.302359]
[   14.302360] ---[ end trace 0000000000000000 ]---

Revise calculation of `RING_CTL_SIZE(size)' to use `SZ_4K' to fix the
aforementioned issue.

Cc: [email protected]
Fixes: b79e8fd ("drm/xe: Remove dependency on intel_engine_regs.h")
Tested-by: Mingcong Bai <[email protected]>
Tested-by: Wenbin Fang <[email protected]>
Tested-by: Haien Liang <[email protected]>
Tested-by: Jianfeng Liu <[email protected]>
Tested-by: Shirong Liu <[email protected]>
Tested-by: Haofeng Wu <[email protected]>
Link: FanFansfan@22c55ab
Link: https://t.me/c/1109254909/768552
Co-developed-by: Shang Yatsen <[email protected]>
Signed-off-by: Shang Yatsen <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Mingcong Bai <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jul 22, 2025
Similar to the preceding patch for GuC (and with the same references),
Intel GPUs expects command buffers to align to 4KiB boundaries.

Current code uses `PAGE_SIZE' as an assumed alignment reference but 4KiB
kernel page sizes is by no means a guarantee. On 16KiB-paged kernels, this
causes driver failures during boot up:

[   14.018975] ------------[ cut here ]------------
[   14.023562] xe 0000:09:00.0: [drm] GT0: Kernel-submitted job timed out
[   14.030084] WARNING: CPU: 3 PID: 564 at drivers/gpu/drm/xe/xe_guc_submit.c:1181 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.041300] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) nls_iso8859_1(E) snd_hda_intel(E) snd_intel_dspcfg(E) qrtr(E) nls_cp437(E) snd_hda_codec(E) spi_loongson_pci(E) rtc_efi(E) snd_hda_core(E) loongson3_cpufreq(E) spi_loongson_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) gpio_loongson_64bit(E) input_leds(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) d
 rm_gpuvm(E) drm_buddy(E) gpu_sched(E)
[   14.041369]  drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) realtek(E) led_class(E) loongson(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E)
[   14.153910] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G            E      6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[   14.165325] Tainted: [E]=UNSIGNED_MODULE
[   14.169220] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[   14.182970] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   14.189549] pc ffff8000024f3760 ra ffff8000024f3760 tp 900000012f150000 sp 900000012f153ca0
[   14.197853] a0 0000000000000000 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000
[   14.206156] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000000
[   14.214458] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 0000000000000000
[   14.222761] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000
[   14.231064] t8 0000000000000000 u0 900000000195c0c8 s9 900000012e4dcf48 s0 90000001285f3640
[   14.239368] s1 90000001004f8000 s2 ffff8000026ec000 s3 0000000000000000 s4 900000012e4dc028
[   14.247672] s5 90000001009f5e00 s6 000000000000137e s7 0000000000000001 s8 900000012f153ce8
[   14.255975]    ra: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.263379]   ERA: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.270777]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[   14.276927]  PRMD: 00000004 (PPLV0 +PIE -PWE)
[   14.281258]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[   14.286024]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[   14.290790] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[   14.296329]  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[   14.302299] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G            E      6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[   14.302302] Tainted: [E]=UNSIGNED_MODULE
[   14.302302] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[   14.302304] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   14.302307] Stack : 900000012f153928 d84a6232d48f1ac7 900000000023eb34 900000012f150000
[   14.302310]         900000012f153900 0000000000000000 900000012f153908 9000000001c31c70
[   14.302313]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   14.302315]         0000000000000000 d84a6232d48f1ac7 0000000000000000 0000000000000000
[   14.302318]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   14.302320]         0000000000000000 0000000000000000 00000000072b4000 900000012e4dcf48
[   14.302323]         9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004
[   14.302325]         0000000000000004 0000000000000000 000000000000137e 0000000000000001
[   14.302328]         900000012f153ce8 9000000001c31c70 9000000000244174 0000555581840b98
[   14.302331]         00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d
[   14.302333]         ...
[   14.302335] Call Trace:
[   14.302336] [<9000000000244174>] show_stack+0x3c/0x16c
[   14.302341] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0
[   14.302346] [<9000000000288208>] __warn+0x8c/0x174
[   14.302350] [<90000000017c1918>] report_bug+0x1c0/0x22c
[   14.302354] [<90000000017f66e8>] do_bp+0x280/0x344
[   14.302359]
[   14.302360] ---[ end trace 0000000000000000 ]---

Revise calculation of `RING_CTL_SIZE(size)' to use `SZ_4K' to fix the
aforementioned issue.

Cc: [email protected]
Fixes: b79e8fd ("drm/xe: Remove dependency on intel_engine_regs.h")
Tested-by: Mingcong Bai <[email protected]>
Tested-by: Wenbin Fang <[email protected]>
Tested-by: Haien Liang <[email protected]>
Tested-by: Jianfeng Liu <[email protected]>
Tested-by: Shirong Liu <[email protected]>
Tested-by: Haofeng Wu <[email protected]>
Link: FanFansfan@22c55ab
Link: https://t.me/c/1109254909/768552
Co-developed-by: Shang Yatsen <[email protected]>
Signed-off-by: Shang Yatsen <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Mingcong Bai <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jul 22, 2025
Similar to the preceding patch for GuC (and with the same references),
Intel GPUs expects command buffers to align to 4KiB boundaries.

Current code uses `PAGE_SIZE' as an assumed alignment reference but 4KiB
kernel page sizes is by no means a guarantee. On 16KiB-paged kernels, this
causes driver failures during boot up:

[   14.018975] ------------[ cut here ]------------
[   14.023562] xe 0000:09:00.0: [drm] GT0: Kernel-submitted job timed out
[   14.030084] WARNING: CPU: 3 PID: 564 at drivers/gpu/drm/xe/xe_guc_submit.c:1181 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.041300] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) nls_iso8859_1(E) snd_hda_intel(E) snd_intel_dspcfg(E) qrtr(E) nls_cp437(E) snd_hda_codec(E) spi_loongson_pci(E) rtc_efi(E) snd_hda_core(E) loongson3_cpufreq(E) spi_loongson_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) gpio_loongson_64bit(E) input_leds(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) drm_gpuvm(E) drm_buddy(E) gpu_sched(E)
[   14.041369]  drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) realtek(E) led_class(E) loongson(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E)
[   14.153910] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G            E      6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[   14.165325] Tainted: [E]=UNSIGNED_MODULE
[   14.169220] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[   14.182970] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   14.189549] pc ffff8000024f3760 ra ffff8000024f3760 tp 900000012f150000 sp 900000012f153ca0
[   14.197853] a0 0000000000000000 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000
[   14.206156] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000000
[   14.214458] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 0000000000000000
[   14.222761] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000
[   14.231064] t8 0000000000000000 u0 900000000195c0c8 s9 900000012e4dcf48 s0 90000001285f3640
[   14.239368] s1 90000001004f8000 s2 ffff8000026ec000 s3 0000000000000000 s4 900000012e4dc028
[   14.247672] s5 90000001009f5e00 s6 000000000000137e s7 0000000000000001 s8 900000012f153ce8
[   14.255975]    ra: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.263379]   ERA: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.270777]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[   14.276927]  PRMD: 00000004 (PPLV0 +PIE -PWE)
[   14.281258]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[   14.286024]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[   14.290790] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[   14.296329]  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[   14.302299] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G            E      6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[   14.302302] Tainted: [E]=UNSIGNED_MODULE
[   14.302302] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[   14.302304] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   14.302307] Stack : 900000012f153928 d84a6232d48f1ac7 900000000023eb34 900000012f150000
[   14.302310]         900000012f153900 0000000000000000 900000012f153908 9000000001c31c70
[   14.302313]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   14.302315]         0000000000000000 d84a6232d48f1ac7 0000000000000000 0000000000000000
[   14.302318]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   14.302320]         0000000000000000 0000000000000000 00000000072b4000 900000012e4dcf48
[   14.302323]         9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004
[   14.302325]         0000000000000004 0000000000000000 000000000000137e 0000000000000001
[   14.302328]         900000012f153ce8 9000000001c31c70 9000000000244174 0000555581840b98
[   14.302331]         00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d
[   14.302333]         ...
[   14.302335] Call Trace:
[   14.302336] [<9000000000244174>] show_stack+0x3c/0x16c
[   14.302341] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0
[   14.302346] [<9000000000288208>] __warn+0x8c/0x174
[   14.302350] [<90000000017c1918>] report_bug+0x1c0/0x22c
[   14.302354] [<90000000017f66e8>] do_bp+0x280/0x344
[   14.302359]
[   14.302360] ---[ end trace 0000000000000000 ]---

Revise calculation of `RING_CTL_SIZE(size)' to use `SZ_4K' to fix the
aforementioned issue.

Cc: [email protected]
Fixes: b79e8fd ("drm/xe: Remove dependency on intel_engine_regs.h")
Tested-by: Mingcong Bai <[email protected]>
Tested-by: Wenbin Fang <[email protected]>
Tested-by: Haien Liang <[email protected]>
Tested-by: Jianfeng Liu <[email protected]>
Tested-by: Shirong Liu <[email protected]>
Tested-by: Haofeng Wu <[email protected]>
Link: FanFansfan@22c55ab
Link: https://t.me/c/1109254909/768552
Co-developed-by: Shang Yatsen <[email protected]>
Signed-off-by: Shang Yatsen <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Kexy Biscuit <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jul 23, 2025
Similar to the preceding patch for GuC (and with the same references),
Intel GPUs expects command buffers to align to 4KiB boundaries.

Current code uses `PAGE_SIZE' as an assumed alignment reference but 4KiB
kernel page sizes is by no means a guarantee. On 16KiB-paged kernels, this
causes driver failures during boot up:

[   14.018975] ------------[ cut here ]------------
[   14.023562] xe 0000:09:00.0: [drm] GT0: Kernel-submitted job timed out
[   14.030084] WARNING: CPU: 3 PID: 564 at drivers/gpu/drm/xe/xe_guc_submit.c:1181 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.041300] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) nls_iso8859_1(E) snd_hda_intel(E) snd_intel_dspcfg(E) qrtr(E) nls_cp437(E) snd_hda_codec(E) spi_loongson_pci(E) rtc_efi(E) snd_hda_core(E) loongson3_cpufreq(E) spi_loongson_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) gpio_loongson_64bit(E) input_leds(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) drm_gpuvm(E) drm_buddy(E) gpu_sched(E)
[   14.041369]  drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) realtek(E) led_class(E) loongson(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E)
[   14.153910] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G            E      6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[   14.165325] Tainted: [E]=UNSIGNED_MODULE
[   14.169220] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[   14.182970] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   14.189549] pc ffff8000024f3760 ra ffff8000024f3760 tp 900000012f150000 sp 900000012f153ca0
[   14.197853] a0 0000000000000000 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000
[   14.206156] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000000
[   14.214458] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 0000000000000000
[   14.222761] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000
[   14.231064] t8 0000000000000000 u0 900000000195c0c8 s9 900000012e4dcf48 s0 90000001285f3640
[   14.239368] s1 90000001004f8000 s2 ffff8000026ec000 s3 0000000000000000 s4 900000012e4dc028
[   14.247672] s5 90000001009f5e00 s6 000000000000137e s7 0000000000000001 s8 900000012f153ce8
[   14.255975]    ra: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.263379]   ERA: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.270777]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[   14.276927]  PRMD: 00000004 (PPLV0 +PIE -PWE)
[   14.281258]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[   14.286024]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[   14.290790] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[   14.296329]  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[   14.302299] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G            E      6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[   14.302302] Tainted: [E]=UNSIGNED_MODULE
[   14.302302] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[   14.302304] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   14.302307] Stack : 900000012f153928 d84a6232d48f1ac7 900000000023eb34 900000012f150000
[   14.302310]         900000012f153900 0000000000000000 900000012f153908 9000000001c31c70
[   14.302313]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   14.302315]         0000000000000000 d84a6232d48f1ac7 0000000000000000 0000000000000000
[   14.302318]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   14.302320]         0000000000000000 0000000000000000 00000000072b4000 900000012e4dcf48
[   14.302323]         9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004
[   14.302325]         0000000000000004 0000000000000000 000000000000137e 0000000000000001
[   14.302328]         900000012f153ce8 9000000001c31c70 9000000000244174 0000555581840b98
[   14.302331]         00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d
[   14.302333]         ...
[   14.302335] Call Trace:
[   14.302336] [<9000000000244174>] show_stack+0x3c/0x16c
[   14.302341] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0
[   14.302346] [<9000000000288208>] __warn+0x8c/0x174
[   14.302350] [<90000000017c1918>] report_bug+0x1c0/0x22c
[   14.302354] [<90000000017f66e8>] do_bp+0x280/0x344
[   14.302359]
[   14.302360] ---[ end trace 0000000000000000 ]---

Revise calculation of `RING_CTL_SIZE(size)' to use `SZ_4K' to fix the
aforementioned issue.

Cc: [email protected]
Fixes: b79e8fd ("drm/xe: Remove dependency on intel_engine_regs.h")
Tested-by: Mingcong Bai <[email protected]>
Tested-by: Wenbin Fang <[email protected]>
Tested-by: Haien Liang <[email protected]>
Tested-by: Jianfeng Liu <[email protected]>
Tested-by: Shirong Liu <[email protected]>
Tested-by: Haofeng Wu <[email protected]>
Link: FanFansfan@22c55ab
Link: https://t.me/c/1109254909/768552
Co-developed-by: Shang Yatsen <[email protected]>
Signed-off-by: Shang Yatsen <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Kexy Biscuit <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jul 23, 2025
Similar to the preceding patch for GuC (and with the same references),
Intel GPUs expects command buffers to align to 4KiB boundaries.

Current code uses `PAGE_SIZE' as an assumed alignment reference but 4KiB
kernel page sizes is by no means a guarantee. On 16KiB-paged kernels, this
causes driver failures during boot up:

[   14.018975] ------------[ cut here ]------------
[   14.023562] xe 0000:09:00.0: [drm] GT0: Kernel-submitted job timed out
[   14.030084] WARNING: CPU: 3 PID: 564 at drivers/gpu/drm/xe/xe_guc_submit.c:1181 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.041300] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) nls_iso8859_1(E) snd_hda_intel(E) snd_intel_dspcfg(E) qrtr(E) nls_cp437(E) snd_hda_codec(E) spi_loongson_pci(E) rtc_efi(E) snd_hda_core(E) loongson3_cpufreq(E) spi_loongson_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) gpio_loongson_64bit(E) input_leds(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) drm_gpuvm(E) drm_buddy(E) gpu_sched(E)
[   14.041369]  drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) realtek(E) led_class(E) loongson(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E)
[   14.153910] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G            E      6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[   14.165325] Tainted: [E]=UNSIGNED_MODULE
[   14.169220] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[   14.182970] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   14.189549] pc ffff8000024f3760 ra ffff8000024f3760 tp 900000012f150000 sp 900000012f153ca0
[   14.197853] a0 0000000000000000 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000
[   14.206156] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000000
[   14.214458] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 0000000000000000
[   14.222761] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000
[   14.231064] t8 0000000000000000 u0 900000000195c0c8 s9 900000012e4dcf48 s0 90000001285f3640
[   14.239368] s1 90000001004f8000 s2 ffff8000026ec000 s3 0000000000000000 s4 900000012e4dc028
[   14.247672] s5 90000001009f5e00 s6 000000000000137e s7 0000000000000001 s8 900000012f153ce8
[   14.255975]    ra: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.263379]   ERA: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.270777]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[   14.276927]  PRMD: 00000004 (PPLV0 +PIE -PWE)
[   14.281258]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[   14.286024]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[   14.290790] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[   14.296329]  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[   14.302299] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G            E      6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[   14.302302] Tainted: [E]=UNSIGNED_MODULE
[   14.302302] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[   14.302304] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   14.302307] Stack : 900000012f153928 d84a6232d48f1ac7 900000000023eb34 900000012f150000
[   14.302310]         900000012f153900 0000000000000000 900000012f153908 9000000001c31c70
[   14.302313]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   14.302315]         0000000000000000 d84a6232d48f1ac7 0000000000000000 0000000000000000
[   14.302318]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   14.302320]         0000000000000000 0000000000000000 00000000072b4000 900000012e4dcf48
[   14.302323]         9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004
[   14.302325]         0000000000000004 0000000000000000 000000000000137e 0000000000000001
[   14.302328]         900000012f153ce8 9000000001c31c70 9000000000244174 0000555581840b98
[   14.302331]         00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d
[   14.302333]         ...
[   14.302335] Call Trace:
[   14.302336] [<9000000000244174>] show_stack+0x3c/0x16c
[   14.302341] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0
[   14.302346] [<9000000000288208>] __warn+0x8c/0x174
[   14.302350] [<90000000017c1918>] report_bug+0x1c0/0x22c
[   14.302354] [<90000000017f66e8>] do_bp+0x280/0x344
[   14.302359]
[   14.302360] ---[ end trace 0000000000000000 ]---

Revise calculation of `RING_CTL_SIZE(size)' to use `SZ_4K' to fix the
aforementioned issue.

Cc: [email protected]
Fixes: b79e8fd ("drm/xe: Remove dependency on intel_engine_regs.h")
Tested-by: Mingcong Bai <[email protected]>
Tested-by: Wenbin Fang <[email protected]>
Tested-by: Haien Liang <[email protected]>
Tested-by: Jianfeng Liu <[email protected]>
Tested-by: Shirong Liu <[email protected]>
Tested-by: Haofeng Wu <[email protected]>
Link: FanFansfan@22c55ab
Link: https://t.me/c/1109254909/768552
Co-developed-by: Shang Yatsen <[email protected]>
Signed-off-by: Shang Yatsen <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Kexy Biscuit <[email protected]>
KexyBiscuit pushed a commit that referenced this pull request Jul 23, 2025
Similar to the preceding patch for GuC (and with the same references),
Intel GPUs expects command buffers to align to 4KiB boundaries.

Current code uses `PAGE_SIZE' as an assumed alignment reference but 4KiB
kernel page sizes is by no means a guarantee. On 16KiB-paged kernels, this
causes driver failures during boot up:

[   14.018975] ------------[ cut here ]------------
[   14.023562] xe 0000:09:00.0: [drm] GT0: Kernel-submitted job timed out
[   14.030084] WARNING: CPU: 3 PID: 564 at drivers/gpu/drm/xe/xe_guc_submit.c:1181 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.041300] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) nls_iso8859_1(E) snd_hda_intel(E) snd_intel_dspcfg(E) qrtr(E) nls_cp437(E) snd_hda_codec(E) spi_loongson_pci(E) rtc_efi(E) snd_hda_core(E) loongson3_cpufreq(E) spi_loongson_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) gpio_loongson_64bit(E) input_leds(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) drm_gpuvm(E) drm_buddy(E) gpu_sched(E)
[   14.041369]  drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) realtek(E) led_class(E) loongson(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E)
[   14.153910] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G            E      6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[   14.165325] Tainted: [E]=UNSIGNED_MODULE
[   14.169220] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[   14.182970] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   14.189549] pc ffff8000024f3760 ra ffff8000024f3760 tp 900000012f150000 sp 900000012f153ca0
[   14.197853] a0 0000000000000000 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000
[   14.206156] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000000
[   14.214458] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 0000000000000000
[   14.222761] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000
[   14.231064] t8 0000000000000000 u0 900000000195c0c8 s9 900000012e4dcf48 s0 90000001285f3640
[   14.239368] s1 90000001004f8000 s2 ffff8000026ec000 s3 0000000000000000 s4 900000012e4dc028
[   14.247672] s5 90000001009f5e00 s6 000000000000137e s7 0000000000000001 s8 900000012f153ce8
[   14.255975]    ra: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.263379]   ERA: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe]
[   14.270777]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[   14.276927]  PRMD: 00000004 (PPLV0 +PIE -PWE)
[   14.281258]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[   14.286024]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[   14.290790] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[   14.296329]  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[   14.302299] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G            E      6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7
[   14.302302] Tainted: [E]=UNSIGNED_MODULE
[   14.302302] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[   14.302304] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   14.302307] Stack : 900000012f153928 d84a6232d48f1ac7 900000000023eb34 900000012f150000
[   14.302310]         900000012f153900 0000000000000000 900000012f153908 9000000001c31c70
[   14.302313]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   14.302315]         0000000000000000 d84a6232d48f1ac7 0000000000000000 0000000000000000
[   14.302318]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   14.302320]         0000000000000000 0000000000000000 00000000072b4000 900000012e4dcf48
[   14.302323]         9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004
[   14.302325]         0000000000000004 0000000000000000 000000000000137e 0000000000000001
[   14.302328]         900000012f153ce8 9000000001c31c70 9000000000244174 0000555581840b98
[   14.302331]         00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d
[   14.302333]         ...
[   14.302335] Call Trace:
[   14.302336] [<9000000000244174>] show_stack+0x3c/0x16c
[   14.302341] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0
[   14.302346] [<9000000000288208>] __warn+0x8c/0x174
[   14.302350] [<90000000017c1918>] report_bug+0x1c0/0x22c
[   14.302354] [<90000000017f66e8>] do_bp+0x280/0x344
[   14.302359]
[   14.302360] ---[ end trace 0000000000000000 ]---

Revise calculation of `RING_CTL_SIZE(size)' to use `SZ_4K' to fix the
aforementioned issue.

Cc: [email protected]
Fixes: b79e8fd ("drm/xe: Remove dependency on intel_engine_regs.h")
Tested-by: Mingcong Bai <[email protected]>
Tested-by: Wenbin Fang <[email protected]>
Tested-by: Haien Liang <[email protected]>
Tested-by: Jianfeng Liu <[email protected]>
Tested-by: Shirong Liu <[email protected]>
Tested-by: Haofeng Wu <[email protected]>
Link: FanFansfan@22c55ab
Link: https://t.me/c/1109254909/768552
Co-developed-by: Shang Yatsen <[email protected]>
Signed-off-by: Shang Yatsen <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Kexy Biscuit <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant