Skip to content

Commit 5ad7bbf

Browse files
guilhermepiccolialexdeucher
authored andcommitted
drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini routine - such function is expected to be called only after the respective init function - drm_sched_init() - was executed successfully. Happens that we faced a driver probe failure in the Steam Deck recently, and the function drm_sched_fini() was called even without its counter-part had been previously called, causing the following oops: amdgpu: probe of 0000:04:00.0 failed with error -110 BUG: kernel NULL pointer dereference, address: 0000000000000090 PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli lkl#338 Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022 RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched] [...] Call Trace: <TASK> amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu] amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu] amdgpu_driver_release_kms+0x16/0x30 [amdgpu] devm_drm_dev_init_release+0x49/0x70 [...] To prevent that, check if the drm_sched was properly initialized for a given ring before calling its fini counter-part. Notice ideally we'd use sched.ready for that; such field is set as the latest thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such field - in the above oops for example, it was a GFX ring causing the crash, and the sched.ready field was set to true in the ring init routine, regardless of the state of the DRM scheduler. Hence, we ended-up using sched.ops as per Christian's suggestion [0], and also removed the no_scheduler check [1]. [0] https://lore.kernel.org/amd-gfx/[email protected]/ [1] https://lore.kernel.org/amd-gfx/[email protected]/ Fixes: 067f44c ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)") Suggested-by: Christian König <[email protected]> Cc: Guchun Chen <[email protected]> Cc: Luben Tuikov <[email protected]> Cc: Mario Limonciello <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Signed-off-by: Guilherme G. Piccoli <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
1 parent e53448e commit 5ad7bbf

File tree

1 file changed

+7
-1
lines changed

1 file changed

+7
-1
lines changed

drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -618,7 +618,13 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
618618
if (!ring || !ring->fence_drv.initialized)
619619
continue;
620620

621-
if (!ring->no_scheduler)
621+
/*
622+
* Notice we check for sched.ops since there's some
623+
* override on the meaning of sched.ready by amdgpu.
624+
* The natural check would be sched.ready, which is
625+
* set as drm_sched_init() finishes...
626+
*/
627+
if (ring->sched.ops)
622628
drm_sched_fini(&ring->sched);
623629

624630
for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)

0 commit comments

Comments
 (0)