Runtime Power Management
The dGPU power is controlled by the kernel. To achieve the best results for your desired use-case, you may however need to configure a couple of things. This page describes how to configure basic runtime power-management for the dGPU.
For use with X11, the `nvidia-settings` application provides some options.
For some of the configuration below, you will need to access the dGPU in sysfs.
To find the correct sysfs path, you will need to know your bus, device, and function numbers.
You can get those via `lspci -PPD | grep -i nvidia`.
This should yield something like

```
00:1d.4/02:00.0 3D controller: NVIDIA Corporation TU117M [GeForce GTX 1650 Mobile / Max-Q] (rev a1)
```

or, more generically,

```
<domain>:<bridge>/<bus>:<device>.<function> <description>
```

where the bus, device, and function numbers are the `02:00.0` part.
For this example, this yields the sysfs path

```
/sys/bus/pci/devices/0000:00:1d.4/0000:02:00.0/
```

We will refer to this as `/sys/bus/pci/devices/<dgpu>/` later on.
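The steps above can be sketched as a small shell helper. This is only an illustration: the function name `dgpu_sysfs_path` is hypothetical, and it assumes PCI domain `0000` (the common case on single-domain systems) and the `lspci -PPD` output format shown above.

```shell
#!/usr/bin/env sh
# Hypothetical helper: derive the dGPU sysfs path from a `lspci -PPD` line.
# Assumes PCI domain 0000, as on most single-domain systems.
dgpu_sysfs_path() {
    bridge="${1%%/*}"     # bridge part, e.g. "00:1d.4"
    rest="${1#*/}"
    bdf="${rest%% *}"     # bus:device.function, e.g. "02:00.0"
    printf '/sys/bus/pci/devices/0000:%s/0000:%s/\n' "$bridge" "$bdf"
}

# Using the example line from above:
dgpu_sysfs_path "00:1d.4/02:00.0 3D controller: NVIDIA Corporation TU117M"
```

On a real system you would feed it `$(lspci -PPD | grep -i nvidia)` instead of the fixed string.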
The kernel can automatically turn the dGPU off when unused. This is referred to as runtime-PM or runtime-suspend and needs to be enabled explicitly.
First, let's look at why you should care about runtime PM.
Runtime PM can significantly reduce power consumption.
This holds not only for the dGPU but also for other devices, so consider setting up something like powertop or custom udev rules for those as well.
With respect to the dGPU, the following difference in power consumption can be observed without runtime PM:
| dGPU state              | Power draw (full device) |
|-------------------------|--------------------------|
| dGPU in D3cold (off)    | 5 W                      |
| dGPU on without driver  | 7 W                      |
| dGPU on with driver     | 10+ W                    |
In the worst case, the dGPU may double your power draw (and also get annoyingly warm) without you even using it.
Note that this testing is somewhat informal, and you can get better baseline results with further tuning of other devices.
The "dGPU on with driver" value represents the `nvidia` driver loaded without power-management options enabled. By setting appropriate power-management options for the driver (discussed below), you can reach values equivalent to "dGPU in D3cold".
For the device to actually enter runtime suspend, a couple of conditions need to be fulfilled:
- runtime suspend must be enabled for the device,
- the device must not be used, and
- if a driver is bound to the device, it must support runtime suspend, too.
Thus, enabling runtime PM is not the only factor. Luckily, the `nvidia` driver has fairly decent runtime-PM support if properly configured.
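Whether these conditions are currently met can be inspected through sysfs. A minimal sketch, using the Surface Book 3 path from above (adjust it for your device):

```shell
#!/usr/bin/env sh
# Sketch: inspect the device's runtime-PM configuration and current status.
# The path below is the Surface Book 3 example; adjust for your system.
dgpu=/sys/bus/pci/devices/0000:00:1d.4/0000:02:00.0
if [ -d "$dgpu" ]; then
    cat "$dgpu/power/control"         # "auto" = runtime PM enabled, "on" = disabled
    cat "$dgpu/power/runtime_status"  # "suspended" once the device has gone to sleep
fi
```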
You can enable runtime suspend by writing `auto` to the device's `power/control` file, i.e.

```shell
echo auto | sudo tee /sys/bus/pci/devices/<dgpu>/power/control
```

where `<dgpu>` represents the bus, device, and function numbers of your dGPU (see above on how to find these).
Note that this has to be repeated each boot.
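Instead of hard-coding one path, the same setting can be applied to every NVIDIA PCI function by matching on the vendor ID `0x10de`. A sketch under stated assumptions: the function name `enable_nvidia_runtime_pm` and its optional root argument (useful for testing) are illustrative, not part of any tool.

```shell
#!/usr/bin/env sh
# Sketch: enable runtime PM for every NVIDIA PCI function (vendor 0x10de).
# Pass an alternative root directory as $1 for testing; run as root otherwise.
enable_nvidia_runtime_pm() {
    pci_root="${1:-/sys/bus/pci/devices}"
    for dev in "$pci_root"/*; do
        [ -r "$dev/vendor" ] || continue
        read -r vendor < "$dev/vendor"
        [ "$vendor" = "0x10de" ] || continue
        if [ -w "$dev/power/control" ]; then
            echo auto > "$dev/power/control"
        fi
    done
}

# As root on a real system:
# enable_nvidia_runtime_pm
```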
To automate this, you can rely on udev or powertop. For powertop, see the respective documentation. For udev, create a file `/etc/udev/rules.d/80-nvidia-pm.rules` with the following content and reboot:
```
# Enable runtime PM for NVIDIA VGA/3D controller devices on driver bind
ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="auto"
ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="auto"

# Disable runtime PM for NVIDIA VGA/3D controller devices on driver unbind
ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="on"
ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="on"

# Enable runtime PM for NVIDIA VGA/3D controller devices on adding the device
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="auto"
```
If you do not wish to use the dGPU, it is easiest to blacklist all of its drivers by creating `/etc/modprobe.d/dgpu.conf` with the following:

```
blacklist i2c_nvidia_gpu
blacklist nouveau
blacklist nvidia
blacklist nvidia-drm
blacklist nvidia-modeset
blacklist nvidia_uvm
alias i2c_nvidia_gpu off
alias nouveau off
alias nvidia off
alias nvidia-drm off
alias nvidia-modeset off
alias nvidia_uvm off
```
If you want to use the dGPU, create `/etc/modprobe.d/dgpu.conf` with the following:

```
# Required for auto D3cold
options nvidia_drm modeset=0
options nvidia NVreg_DynamicPowerManagement=0x02
```

Some form of this config may already exist if your system uses `system76-power`.
Note, however, that you should not attempt to detach the clipboard while the dGPU is in use. Use the command `nvidia-smi` to list any processes using the dGPU.
Additional options for each driver must be appended to a single `options` line. For example, to slightly speed up resuming from suspend on S0ix-capable devices:

```
options nvidia NVreg_DynamicPowerManagement=0x02 NVreg_EnableS0ixPowerManagement=0x01
```

A full list of options can be found by running `cat /proc/driver/nvidia/params | sort`.
Your desktop environment (or other software) may grab the dGPU and never release it. You can prevent Wayland from hanging onto the `nvidia` driver by adding the following environment variables to `/etc/environment`:

```
__EGL_VENDOR_LIBRARY_FILENAMES="/usr/share/glvnd/egl_vendor.d/50_mesa.json"
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/intel_icd.x86_64.json:/usr/share/vulkan/icd.d/intel_icd.i686.json
DXVK_FILTER_DEVICE_NAME="Intel"
VKD3D_FILTER_DEVICE_NAME="Intel"
__GLX_VENDOR_LIBRARY_NAME="mesa"
VDPAU_DRIVER=va_gl
CUDA_VISIBLE_DEVICES=null
```
Finally, make sure the systemd hooks relating to suspend/hibernate/resume are enabled; these allow the system to enter sleep mode even while the `nvidia` driver is in use. Also ensure the NVIDIA persistence daemon is disabled; its purpose is to prevent devices from entering D3cold.

```shell
systemctl enable nvidia-suspend.service nvidia-resume.service nvidia-hibernate.service nvidia-suspend-then-hibernate.service
systemctl disable --now nvidia-persistenced.service
```
To ensure that the dGPU is turned off when it's not in use, you can query the `power_state` attribute of the device with `surfacectl status` if installed, or directly by running `cat /sys/bus/pci/devices/<dgpu>/power_state`.

This will return the current power state of the device, which can be

- `D0` if in use,
- `D1`, `D2`, or `D3hot` if in a low-power state, or
- `D3cold` if fully turned off.

Ideally, you want it to be in `D3cold` when it's not in use. Again, you can run `nvidia-smi` to list any processes using the dGPU.
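The mapping above can be expressed as a small shell helper; `describe_dgpu_state` is a hypothetical name used only for this sketch.

```shell
#!/usr/bin/env sh
# Sketch: interpret a power_state value (helper name is hypothetical).
describe_dgpu_state() {
    case "$1" in
        D0)          echo "in use" ;;
        D1|D2|D3hot) echo "low-power" ;;
        D3cold)      echo "off" ;;
        *)           echo "unknown" ;;
    esac
}

# On a real system, feed it the sysfs attribute:
# describe_dgpu_state "$(cat /sys/bus/pci/devices/<dgpu>/power_state)"
describe_dgpu_state D3cold   # prints "off"
```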
There are a number of programs that help manage when the dGPU gets used, but NVIDIA explicitly discourages all of them except for `switcheroo-control`. Once installed, make sure the service is enabled: `systemctl enable --now switcheroo-control.service`.
If you use GNOME, you can now right-click programs and select "Run with dedicated GPU". Programs can be configured to always prefer the dGPU via their desktop entries, in which case the right-click menu will show "Run with internal GPU" instead. Other desktop environments besides GNOME may support this as well.
Steam, for example, ships with a desktop file that prefers the dGPU. The relevant lines can be added to any desktop file under the `[Desktop Entry]` section, like so:

```
[Desktop Entry]
PrefersNonDefaultGPU=true
X-KDE-RunOnDiscreteGpu=true
```

switcheroo-control also has a CLI: `switcherooctl launch <program>`.
To use CUDA, unset `CUDA_VISIBLE_DEVICES` in your shell.
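This can be done for the whole session or for a single invocation; a minimal sketch (the program name is a placeholder):

```shell
#!/usr/bin/env sh
# Clear the variable for the current shell session:
unset CUDA_VISIBLE_DEVICES

# Or clear it for a single invocation only (replace the echo with your
# CUDA program; "my_cuda_app" would be a placeholder):
env -u CUDA_VISIBLE_DEVICES sh -c 'echo "CUDA_VISIBLE_DEVICES is ${CUDA_VISIBLE_DEVICES:-unset}"'
```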
Edit `/etc/surface-dtx/detach.sh` and add a condition that checks `/sys/bus/pci/devices/<dgpu>/power_state` for `D3cold`. The following example is specific to the Surface Book 3; be sure to validate the sysfs path for your device.
```shell
#!/usr/bin/env sh
# surface-dtx detachment handler

# Abort if dGPU is in use
read -r dgpu_state < /sys/bus/pci/devices/0000:00:1d.4/0000:02:00.0/power_state
if [ "$dgpu_state" != "D3cold" ]; then
    exit 1
fi

# unmount all USB devices
for usb_dev in /dev/disk/by-id/usb-*
do
    dev=$(readlink -f "$usb_dev")
    mount -l | grep -q "^$dev\s" && umount "$dev"
done

# signal commence
exit $EXIT_DETACH_COMMENCE

# The exit signal determines the continuation of the detachment procedure. A
# value of EXIT_DETACH_COMMENCE (0/success) causes the detachment procedure
# to open the latch, while a value of EXIT_DETACH_ABORT (1, or any other
# non-zero value) will cause the detachment procedure to be aborted. On an
# abort caused by this script, the detach_abort handler will _not_ be
# executed. It is therefore the responsibility of this handler executable
# to ensure the device state is properly reset to the state before its
# execution, if required.
```
Be sure the service is enabled by running `systemctl enable --now surface-dtx-daemon.service`.
NVIDIA's Linux documentation can be found at https://download.nvidia.com/XFree86/Linux-x86_64/. Navigate to the README for your driver version. At the time of writing, the latest README is https://download.nvidia.com/XFree86/Linux-x86_64/575.64/README/.