Enabling mb_rate_control kills whole machine (Skylake GT2) #172
Description
Build ffmpeg git master with @mypopydev's patch to add the mb_rate_control option: https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2017-May/211334.html.
The input file doesn't seem to matter much; for consistency I am using the Big Buck Bunny 1080p file here.
Take steps to avoid data loss (remount all data mounts read-only, sync).
Run:
./ffmpeg_g -y -threads 1 -hwaccel vaapi -hwaccel_output_format vaapi -i bbb_1080_264.mp4 -an -c:v h264_vaapi -b 1M -mb_rate_control 1 /tmp/out.h264
After a number of frames that varies between runs (but is never more than a few hundred), the machine becomes completely unresponsive.
On some runs I get a GPU hang log on the console (transcribed) before it locks up, but not consistently:
[drm] GPU HANG ecode 9:0:0x8fd0ffff, in ffmpeg_g [2669], reason: Hang on render ring, action: reset
[drm] {the usual GPU hang bug warning}
[drm] drm/i915: Resetting chip after gpu hang
[drm:i915_reset [i915]] *ERROR* Failed to reset chip: -110
Power-cycle to recover the machine.
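The preparation and run steps above can be collected into one script so that the data-loss precautions are never skipped. This is a hedged sketch and not part of the original report. The `run` and `reproduce` helpers and the `DRY_RUN`, `DATA_MOUNTS`, `INPUT` and `OUTPUT` variables are hypothetical. `/home` stands in for whatever data mounts your machine actually has, and `./ffmpeg_g` is assumed to be in the current directory. By default the script only prints the commands. Set `DRY_RUN=0` and run it as root to execute them.

```shell
#!/bin/sh
# Hypothetical helper wrapping the reproduction steps from this report.
# Assumptions: ./ffmpeg_g is in the current directory, /home is the data mount
# to protect, and DRY_RUN defaults to 1 so nothing runs unless DRY_RUN=0.
set -eu

INPUT=${INPUT:-bbb_1080_264.mp4}
OUTPUT=${OUTPUT:-/tmp/out.h264}
DATA_MOUNTS=${DATA_MOUNTS:-/home}   # space-separated list of mounts to remount read-only

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        echo "$*"
    fi
}

reproduce() {
    # Flush dirty pages, then make data mounts read-only so a hard lockup cannot corrupt them.
    run sync
    for m in $DATA_MOUNTS; do
        run mount -o remount,ro "$m"
    done
    # The reproducer command from the report, unchanged.
    run ./ffmpeg_g -y -threads 1 -hwaccel vaapi -hwaccel_output_format vaapi \
        -i "$INPUT" -an -c:v h264_vaapi -b 1M -mb_rate_control 1 "$OUTPUT"
}

reproduce
```

Make sure `OUTPUT` stays on a filesystem that is not remounted read-only, otherwise ffmpeg fails before it reaches the encoder.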
Setup:
- Skylake 6300 (GT2, 23 EUs)
- Debian stock kernel ("4.9.0-3-amd64 #1 SMP Debian 4.9.25-1 (2017-05-02)")
- i965 driver from git (1b0c312)
There are probably at least two separate issues here: one in the VAAPI driver (because enabling mb_rate_control hangs the GPU) and one in the kernel (because the GPU reset fails and the machine does not recover). I've only filed this here because this is where the reproducer lives, but please forward it if appropriate.
Possibly relevant: the same ffmpeg command with the mb_rate_control option works fine on a Skylake 6260U (GT3, 48 EUs). Could there be something in the proprietary shader binaries that only works on the larger GPU and breaks horribly on the smaller one?