Feature request: static/dynamic power on/off of GPGPU EU slices to benefit encoding/transcoding workload performance #152
Description
Please provide an API to enable static and/or dynamic GPGPU slice shutdown (powering EU slices on and off). The performance of a number of media workloads suffers from the cost of managing additional EU slices that these workloads do not actually need (this is related to the length of the wavefront). The lower the resolution, the smaller the wavefront and, hence, the fewer EUs are required. Roughly:
- <=480p: 1 slice is enough
- <=720p: 2 slices are enough
- Certain use cases (such as 4K resolutions) may require 3 slices
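The thresholds above can be expressed as a small helper that picks a slice count from the frame height. This is only a sketch: `pick_slices` is a hypothetical name, it keys on frame height alone, and mapping everything taller than 720 lines to 3 slices is an assumption (the list only says that cases such as 4K *may* need 3).

```bash
#!/bin/bash
# Hypothetical helper: map frame height (in lines) to a suggested EU slice count,
# following the rough thresholds listed in this issue.
# Assumption: anything taller than 720 lines gets 3 slices.
pick_slices() {
    # Reject empty, non-numeric, zero, or zero-padded input.
    case $1 in
        ''|*[!0-9]*|0*)
            echo "pick_slices: height must be a positive integer" >&2
            return 1 ;;
    esac
    if [ "$1" -le 480 ]; then
        echo 1
    elif [ "$1" -le 720 ]; then
        echo 2
    else
        echo 3
    fi
}
```

For example, `pick_slices 144` prints `1`, `pick_slices 720` prints `2` and `pick_slices 2160` prints `3`. The code sticks to POSIX constructs, so it also works under `sh`.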
The number of slices is one of the key reasons why SKL performance is lower than BDW performance on the same workloads, with configurations that are otherwise aligned.
You can check the influence of the number of slices using the following drm-tip i915 KMD patches:
https://patchwork.kernel.org/patch/9670509/ [RFC,2/2] drm/i915/bdw: permit make_rpcs execution on BDW to enable slice shutdown
https://patchwork.kernel.org/patch/9670507/ [RFC,1/2] drm/i915/skl: add slice shutdown debugfs interface
These patches allow slices to be powered on/off with a simple `echo 1 > /sys/kernel/debug/dri/0/i915_slice_enabled`, where `1` is the number of slices to be used in the next created context.
For example, running avcenc to encode N streams in parallel:
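```bash
#!/bin/bash
# Start N avcenc sessions on the same XxY stream in parallel, then wait for all.
# X and Y are placeholders for the frame width and height.
N=3  # number of parallel sessions
for i in $(seq 1 "$N"); do
    avcenc X Y stream_XxY.yuv /dev/null --qp=24 --mode=2 &
done
wait
```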
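To compare slice counts with this setup, the batch can be wrapped in a loop that writes each slice count to the debugfs knob from the patches above and times a full batch. This is a minimal sketch, assuming the RFC patches are applied and the script runs as root. `sweep_slices`, `SLICE_KNOB`, `SESSIONS` and `ENCODE_CMD` are hypothetical names introduced here, and the knob path can be overridden for dry runs. Because the knob only affects the next created context, it is written before the sessions of each batch are started.

```bash
#!/bin/bash
# Hypothetical sweep: for each slice count, write it to the slice knob, start
# SESSIONS copies of ENCODE_CMD in parallel, wait, and print elapsed seconds.
# Assumes the RFC i915_slice_enabled debugfs patches are applied (needs root).
SLICE_KNOB=${SLICE_KNOB:-/sys/kernel/debug/dri/0/i915_slice_enabled}
SESSIONS=${SESSIONS:-45}
# Placeholder command: replace X, Y and the input file with real values.
ENCODE_CMD=${ENCODE_CMD:-"avcenc X Y stream_XxY.yuv /dev/null --qp=24 --mode=2"}

sweep_slices() {
    for slices in "$@"; do
        # Only contexts created after this write use the new slice count.
        echo "$slices" > "$SLICE_KNOB" || return 1
        start=$(date +%s)
        i=0
        while [ "$i" -lt "$SESSIONS" ]; do
            # Unquoted on purpose so the command string splits into arguments.
            $ENCODE_CMD &
            i=$((i + 1))
        done
        wait
        end=$(date +%s)
        echo "slices=$slices sessions=$SESSIONS elapsed=$((end - start))s"
    done
}
```

A run such as `sweep_slices 3 2 1` prints one line per slice count. Timing uses whole seconds, so batches should be long enough for the difference to be visible.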
The following can be observed:
- 45 sessions of a 176x144 stream: x1.15 performance difference between SKL with 3 slices and with 1 slice (1 slice wins)
- 15 sessions of a 720x480 stream: x1.06 performance difference between SKL with 3 slices and with 2 slices (2 slices win)
These results may vary significantly depending on the input stream (because of motion estimation complexity). I would estimate the performance difference between the default and the optimal number of slices at 15-20% for resolutions close to 176x144 and 5-10% for resolutions close to 720x480.
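As a worked example of the arithmetic behind these figures: if performance is taken as the inverse of the wall-clock time needed for the same batch (an assumption, since the metric is not stated here), an x1.15 difference means the slower setting takes 1.15 times as long, i.e. roughly a 15% gain for the faster one. A small hypothetical helper, `speedup`, computes this from two measured times:

```bash
#!/bin/bash
# Hypothetical helper: given elapsed times for the same batch under two slice
# settings (default first, alternative second), print the speedup of the
# alternative and the corresponding percentage gain.
# Assumes "performance" means inverse wall-clock time for an identical workload.
speedup() {
    awk -v base="$1" -v alt="$2" 'BEGIN {
        if (base <= 0 || alt <= 0) {
            print "speedup: times must be positive" > "/dev/stderr"
            exit 1
        }
        r = base / alt
        printf "x%.2f (%.0f%%)\n", r, (r - 1) * 100
    }'
}
```

For instance, `speedup 115 100` prints `x1.15 (15%)` and `speedup 106 100` prints `x1.06 (6%)`, matching the two measurements listed above.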