[MAJOR] KDE Plasma Wayland & X11 poor performance & frame drops when opening apps #538

Closed
2 tasks done
kodatarule opened this issue Jul 26, 2023 · 51 comments
Labels
bug Something isn't working NV-Triaged An NVBug has been created for dev to investigate

Comments

@kodatarule

kodatarule commented Jul 26, 2023

NVIDIA Open GPU Kernel Modules Version

535.86.05

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

EndeavourOS Linux

Kernel Release

6.4.6-zen

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

RTX 3090

Describe the bug

When opening apps, screen recording, or generally doing anything that demands more from the GPU, the desktop starts dropping frames, hitching, and lagging. This doesn't occur on the proprietary driver.

To Reproduce

Log into a KDE Plasma Wayland session and open any app (Dolphin, a browser, etc.).

Bug Incidence

Always

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

No response

@kodatarule kodatarule added the bug Something isn't working label Jul 26, 2023
@kodatarule
Author

Just wanted to make a slight update: this also occurs on X11.

@kodatarule kodatarule changed the title KDE Plasma Wayland poor performance & frame drops when opening apps KDE Plasma Wayland & X11 poor performance & frame drops when opening apps Sep 13, 2023
@kodatarule
Author

With the news of driver 560 defaulting to the open kernel modules, I decided to give this a second try, and this issue is still present on both X11 and Wayland.

Operating System: EndeavourOS
KDE Plasma Version: 6.0.4
KDE Frameworks Version: 6.1.0
Qt Version: 6.7.0
Kernel Version: 6.8.9-zen1-1-zen (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 5800X3D 8-Core Processor
Memory: 31,2 GiB of RAM
Graphics Processor: NVIDIA GeForce RTX 3090/PCIe/SSE2

@kodatarule kodatarule changed the title KDE Plasma Wayland & X11 poor performance & frame drops when opening apps [MAJOR] KDE Plasma Wayland & X11 poor performance & frame drops when opening apps May 11, 2024
@kodatarule
Author

Just an update for anyone who ends up here:

EDIT: The solution was to add NVreg_EnableGpuFirmware=0 to the nvidia module parameters at boot (disabling the GSP firmware), and all issues were fixed!
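A minimal sketch of one persistent way to set this, assuming the proprietary kernel modules (the open modules require the GSP firmware, so this parameter does not apply to them). It writes a modprobe.d option line instead of editing the bootloader. The file name `nvidia-gsp.conf` and the function name `write_gsp_off_conf` are hypothetical choices, and if your initramfs bundles the nvidia modules it has to be regenerated afterwards (e.g. `mkinitcpio -P` on Arch/EndeavourOS).

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a modprobe.d file that disables the GSP
# firmware for the proprietary nvidia module. The default path and file
# name are assumptions; pass another path to write elsewhere.
write_gsp_off_conf() {
    local conf="${1:-/etc/modprobe.d/nvidia-gsp.conf}"
    # "options <module> <param>=<value>" is the standard modprobe.d syntax.
    printf 'options nvidia NVreg_EnableGpuFirmware=0\n' > "$conf"
}

# Usage (as root), then regenerate the initramfs if needed and reboot:
#   write_gsp_off_conf
# Afterwards, check the EnableGpuFirmware entry in /proc/driver/nvidia/params.
```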

@edisionnano

> Just to update for people that would come here:
>
> EDIT: The solution was to add NVreg_EnableGpuFirmware=0 to the kernel load and all issues were fixed!

This won't work with the open modules, only the closed ones, since the open modules require GSP. Right?

@kodatarule
Author

I believe that is the case; this seems to only work with the proprietary modules.

@aritger
Collaborator

aritger commented May 22, 2024

It would be worth retesting this case with the new 555.42.02 driver:
https://www.nvidia.com/Download/driverResults.aspx/224751/en-us/

We made several improvements to graphics performance that will help both the proprietary kernel modules with NVreg_EnableGpuFirmware=1, and the open kernel modules.

@mtijanic
Collaborator

Tracked internally as bug 4662986.

@mtijanic mtijanic added the NV-Triaged An NVBug has been created for dev to investigate label May 22, 2024
@mtijanic
Collaborator

Hi @kodatarule, can I trouble you for two experiments? With the 555.42 driver (proprietary*), but without NVreg_EnableGpuFirmware=0 (or with it set to 1), please try:

(1) Disable MangoHUD and any other background profiling apps you might have, and see if it gets any better.
(2) Wait until you see the issue, and then run nvidia-bug-report.sh as soon as you can.

This way, the bug report captures a snapshot soon after the bad state, and we know where to look, timescale-wise.

Thanks in advance!

* Open is also fine, but then please run with NVreg_RmMsg=":" and also run dmesg -w > dmesg.txt on the side and attach that file too.

@kodatarule
Author

Hi, I tried the proprietary 555.42 beta with the GPU firmware enabled and generated a log.
I also tried with MangoHud both on and off globally, which made no difference at all.
nvidia-bug-report.log.gz

@aritger
Collaborator

aritger commented May 25, 2024

To help better isolate this, looking more carefully at your xorg.conf:

Option         "nvidiaXineramaInfoOrder" "DFP-3"
Option         "metamodes" "DP-2: 2560x1440_165 +0+0 {ForceFullCompositionPipeline=On}, DP-0: 2560x1440_165 +2560+0 {ForceFullCompositionPipeline=On}"
Option         "UseNvKmsCompositionPipeline" "false"

Do you see the same performance problems:
(a) if you remove the UseNvKmsCompositionPipeline option
(b) if you remove the {ForceFullCompositionPipeline=On} parts
(c) if you use slightly lower refresh rates? (I assume 2560x1440_165 is running at 165 Hz)

@kodatarule
Author

Hello,
Removing the UseNvKmsCompositionPipeline option causes even bigger stutters.
Turning ForceCompositionPipeline/ForceFullCompositionPipeline on or off made no difference; as a side note, these options make no difference on Wayland, which is affected either way.
Changing the refresh rate had no impact either. It feels like the clocks behave strangely with the GSP firmware (proprietary driver 555, and all open kernel module versions before it).
I'm not sure what could be causing this problem, but I've also noticed many reports on the NVIDIA forums of GSP triggering this same behavior on other people's systems.

@mtijanic
Collaborator

mtijanic commented Jun 6, 2024

Update: We've found two possible causes of stutter. Or rather, we found two issues that definitely cause stutter on some configurations, but we still don't have a good idea of how widespread either of them is.

I have published patches that eliminate one and log the other here: #658

I'd love it if folks that are experiencing these issues would give it a try and report back. Getting a good idea of the impact would help us prioritize getting these in. Many thanks in advance!

@ptr1337

ptr1337 commented Jun 7, 2024

@mtijanic
Thanks for the patchset! I have patched it into nvidia-open-dkms and pushed it to the CachyOS testing repository. Users have been notified to test it.
Unfortunately, I cannot reproduce this on 40xx GPUs.

@ptr1337

ptr1337 commented Jun 8, 2024

Actually, sometimes when taking a screenshot with Spectacle, I see small FPS drops with the patched nvidia-open-dkms module.
This was not present with the closed modules, but I'm not sure if it's fully related.

nvidia-bug-report.log.gz

@Virkkunen

> Update: We've found two possible causes of stutter. Or rather, we found two issues that definitely cause stutter on some configurations, but we still don't have a good idea of how widespread either of them is.
>
> I have published patches that eliminate one and log the other here: #658
>
> I'd love it if folks that are experiencing these issues would give it a try and report back. Getting a good idea of the impact would help us prioritize getting these in. Many thanks in advance!

What would be the proper process of building and installing this patchset? I'm facing these issues on the open-beta-dkms and I'd like to help troubleshoot with my logs

@mtijanic
Collaborator

> What would be the proper process of building and installing this patchset? I'm facing these issues on the open-beta-dkms and I'd like to help troubleshoot with my logs

First, make sure you have regular 555.52.04 driver installed in whatever way you do it normally (distro package, .run file, etc).
Then, clone my branch with:

 git clone --single-branch --branch 555-testing-patches https://github.com/mtijanic/open-gpu-kernel-modules.git 555-testing

Then, build it:

 cd 555-testing && make -j16

If successful, it will produce a file kernel-open/nvidia.ko (and many others not relevant here). Check if it exists. Now, you just need to switch to using this instead of your installed nvidia.ko. To find out where it is, you can run

$ modinfo nvidia | grep filename
filename:       /lib/modules/5.15.0-105-generic/kernel/drivers/video/nvidia.ko

Easiest would be to just backup the original file, and replace it with the newly built one:

cd /lib/modules/5.15.0-105-generic/kernel/drivers/video/
sudo mv nvidia.ko nvidia.ko.backup
sudo cp /path/to/555-testing/kernel-open/nvidia.ko .

Or use symlinks.

You'll need to reload the driver for the change to take effect. A system reboot would do it, but also killing X / your DE and then rmmod would work too. For example:

sudo service lightdm stop # or gdm, etc
sudo rmmod nvidia_uvm nvidia_vgpu_vfio nvidia_drm nvidia_modeset nvidia
sudo service lightdm start

To revert, just restore the original backed up file.
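The backup, replace and revert steps above can be wrapped in two small shell functions. This is a hedged sketch, not part of the patch branch: the names `swap_in_module` and `revert_module` are hypothetical, it relies on `modinfo -n` to locate the installed module, and it assumes an uncompressed `nvidia.ko` as in the example output. DKMS or distro packages that install `nvidia.ko.zst`/`.xz` need a different approach, so the sketch refuses those paths instead of guessing.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the manual backup/replace/revert steps.
# Assumes the installed module is an uncompressed nvidia.ko; .ko.zst/.ko.xz
# installs (common with DKMS packages) are rejected rather than guessed at.

# swap_in_module NEW_KO [INSTALLED_KO]
swap_in_module() {
    local new_ko="$1"
    # Default to the path modinfo reports for the currently installed module.
    local installed="${2:-$(modinfo -n nvidia)}"
    [ -f "$new_ko" ] || { echo "build output not found: $new_ko" >&2; return 1; }
    case "$installed" in
        *.ko) ;;
        *) echo "not an uncompressed .ko: $installed" >&2; return 1 ;;
    esac
    # Keep a single pristine backup; a second run must not overwrite it
    # with the already-patched module.
    if [ ! -e "$installed.backup" ]; then
        mv "$installed" "$installed.backup" || return 1
    fi
    cp "$new_ko" "$installed"
}

# revert_module [INSTALLED_KO]: restore the original module from the backup.
revert_module() {
    local installed="${1:-$(modinfo -n nvidia)}"
    [ -e "$installed.backup" ] || { echo "no backup for $installed" >&2; return 1; }
    mv "$installed.backup" "$installed"
}

# Usage (as root, from the 555-testing checkout), then reload the driver:
#   swap_in_module kernel-open/nvidia.ko
#   revert_module    # to go back to the original module
```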

@ptr1337

ptr1337 commented Jun 11, 2024

@Virkkunen
If you are on archlinux, you can also use following PKGBUILD:
https://github.com/CachyOS/CachyOS-PKGBUILDS/blob/master/nvidia/nvidia-open-dkms/PKGBUILD

@mtijanic
I've tested this for about a week now and still get occasional stutters, mainly when taking screenshots or minimizing windows.

@Virkkunen

Using @ptr1337's PKGBUILD (on EndeavourOS), I was able to install this patch. So far it seems that the stutter when opening, closing and minimising apps, and when screen recording (with Spectacle), is gone.

However, when moving the cursor I notice some stutter. Moving it quickly in a circle makes it more apparent, with visible gaps in the circle, as if it's skipping some positions. I tried to record a slow-motion video of this, but it's quite a finicky thing to capture in a recording.

20240611_193302.mp4

nvidia-bug-report.log.gz

@xpander69

xpander69 commented Jun 17, 2024

OK, I built the open modules with the patches and so far it seems the stutter issues have been fixed!
I reported this problem on the NVIDIA forums for the closed modules before. This is my first time using the open ones.
RTX 3080, 555.52.04, 6.9.3-cachyos kernel, Arch Linux, MATE Desktop, X11

edit: OK, there's still some very minor input-related (mouse) stutter, like a few periodic frametime spikes, which doesn't happen with the closed modules and GSP disabled.

Overall it seems to be a huge improvement, but not yet ideal.

@kodatarule
Copy link
Author

After testing out the open modules with the patches, the situation has improved somewhat, but the hitches when opening apps or moving the cursor are still present.
Attached is a bug report.

nvidia-bug-report.log.gz

@ptr1337

ptr1337 commented Jun 27, 2024

@mtijanic I have just updated to the stable 555.58 driver (closed modules) and enabled the GSP firmware, but these stutters are still present.

I noticed your PR got merged.
Here's a video; it's mainly visible when taking a screenshot with Spectacle.

https://github.com/NVIDIA/open-gpu-kernel-modules/assets/70081076/7e33f71c-4b6c-4def-b020-85644d96646b
nvidia-bug-report.log.gz

@mtijanic
Collaborator

mtijanic commented Jul 1, 2024

Follow-up on this:

> Update: We've found two possible causes of stutter. Or rather, we found two issues that definitely cause stutter on some configurations, but we still don't have a good idea of how widespread either of them is.

In 555.58.02 (but not 555.58 from last week) we fixed the bigger of the two causes. Particularly those using kwin should give this a try and report back. 555.58.02 does not include 674c009 which fixes a different, less frequent cause. You can still apply this commit manually if using the Open modules, and it will be included in 560.xx.

Please test and report back! ❤️

@ptr1337
Copy link

ptr1337 commented Jul 1, 2024

@mtijanic The desktop generally runs fine. The only problem I'm still seeing (with 674c009 and also without it) is that Spectacle is sometimes "laggy" and just jumps, as you can see above.

I made a fresh video and ran nvidia-bug-report.sh; see below.

nvidia.mp4

nvidia-bug-report.log.gz

Edit:

I will test further with the closed source driver + GSP enabled.

@mtijanic
Collaborator

mtijanic commented Jul 1, 2024

> I will test further with the closed source driver + GSP enabled.

Please! Closed source and GSP ON vs OFF will give us the best info to triage further.

Thanks a ton, for all the reports you've sent in so far! We might not get a chance to meaningfully reply to them all, but we do really appreciate it.

@ptr1337

ptr1337 commented Jul 1, 2024

> > I will test further with the closed source driver + GSP enabled.
>
> Please! Closed source and GSP ON vs OFF will give us the best info to triage further.
>
> Thanks a ton, for all the reports you've sent in so far! We might not get a chance to meaningfully reply to them all, but we do really appreciate it.

Retested with the closed-source driver with GSP on and off.
The issue also appears with the closed-source driver when the GSP firmware is enabled.

Here is a comparison:

GSP ON:

nvidia.mp4

nvidia-bug-report.log.gz

GSP Off:

nvidia-gsp-off.mp4

nvidia-bug-report.log.gz

Edit:
It definitely improved compared to running without the patches, but I still see these hiccups, mainly with Spectacle.

@kodatarule
Author

With 555.58.02 it has definitely improved a lot, however I still notice a few hiccups here and there.

nvidia-bug-report.log.gz

@urbenlegend

I just tested with 555.58.02 with GSP off and on and I am still seeing weird judders and hitches simply dragging KDE's Dolphin file manager around on the desktop whenever GSP is enabled. When it is off, the window motion is very smooth.

The issue seems to come and go. With GSP, the first few window moves will be smooth, but continuously moving the window around will cause hitching. Without GSP, it is smooth the entire time.

@omnigenous

Where exactly do I add NVreg_EnableGpuFirmware=0 to the kernel parameters on Arch Linux?

@MishaProductions

MishaProductions commented Jul 9, 2024

Add nvidia.NVreg_EnableGpuFirmware=0 to the variable GRUB_CMDLINE_LINUX_DEFAULT in the /etc/default/grub file, and run grub-mkconfig -o /boot/grub/grub.cfg

@urbenlegend

@omnigenous In addition to what MishaProductions said, make sure to prepend the module name to that option, so like nvidia.NVreg_EnableGpuFirmware=0
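For anyone who prefers to script the GRUB change described in the two replies above, a minimal sketch follows. It assumes GRUB, GNU sed and a double-quoted `GRUB_CMDLINE_LINUX_DEFAULT` line; the function name `add_kernel_param` is hypothetical, and systems using systemd-boot or another bootloader keep their kernel command line elsewhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append PARAM to GRUB_CMDLINE_LINUX_DEFAULT in FILE.
# Assumes GNU sed (-i -E) and a double-quoted value, as in a typical
# /etc/default/grub. Does nothing if PARAM is already on that line.
add_kernel_param() {
    local file="$1" param="$2"
    # -F matches PARAM literally; -w requires whole-word boundaries, so
    # "...Firmware=0" is not mistaken for an existing "...Firmware=01".
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qwF -- "$param"; then
        return 0
    fi
    # Insert PARAM just before the closing quote of the value.
    sed -i -E "s|^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"|\1 ${param}\"|" "$file"
}

# Usage (as root):
#   add_kernel_param /etc/default/grub nvidia.NVreg_EnableGpuFirmware=0
#   grub-mkconfig -o /boot/grub/grub.cfg
```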

@zoobporsor

zoobporsor commented Jul 16, 2024

After I installed the NVIDIA driver on Arch, KDE was super laggy and choppy. When I added nvidia.NVreg_EnableGpuFirmware=0 to my kernel parameters via GRUB, it fixed it; now it is smooth. I use an RTX 2080 Ti.

@clapbr

clapbr commented Jul 23, 2024

The 560 beta is out and claims to improve this; it's still bad on my 3090, though.

@SeongGino

Oh, so this is where my problem was?

560 beta, 3060ti, Linux 6.9, Plasma 6.1.3 on Arch.
On the Wayland session it starts out smooth, but just a few seconds after startup it got bad enough that Plasmashell would freeze at seemingly regular intervals. Disabling the GSP via a kernel parameter made Plasma's Wayland session perfectly smooth.

@mtijanic
Collaborator

Hey @SeongGino can you check if you have coolercontrol program running? It's a known cause of this stutter since the way it queries the data is by starting and killing an nvidia-smi process all the time. This startup (and especially shutdown) talks to GSP and can stall out some other things.

I believe they've fixed this and switched to NVML, but there's still no release that picked up that patch. See https://gitlab.com/coolercontrol/coolercontrol/-/issues/288

In the meantime, we'll look into ways to make this shutdown less impactful so we don't depend on patching all third party tools.
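A hedged sketch of the two polling patterns being contrasted here. Spawning `nvidia-smi` once per sample pays the process startup and shutdown cost every time; a single long-lived process using `--loop` pays it once. That the long-lived variant avoids the stall is an assumption drawn from the comment above, not something verified in this thread, and the helper names `format_util` and `poll_gpu_util` are hypothetical.

```shell
#!/usr/bin/env bash
# Pattern to avoid (as described above): a new nvidia-smi process is
# started and torn down for every single sample.
#   while sleep 1; do nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits; done

# format_util: turn CSV lines like "0, 37" into "GPU0 37%".
format_util() {
    while IFS=', ' read -r idx util; do
        printf 'GPU%s %s%%\n' "$idx" "$util"
    done
}

# poll_gpu_util [INTERVAL_SECONDS]: ONE nvidia-smi process that keeps
# re-sampling every INTERVAL seconds (--loop) until interrupted.
poll_gpu_util() {
    nvidia-smi --query-gpu=index,utilization.gpu \
        --format=csv,noheader,nounits --loop="${1:-1}" | format_util
}
```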

@urbenlegend

NVIDIA 560 with GSP on seems to have lessened the frame drops, but it is still not on par with GSP off. If I drag a window around (say, KDE's Dolphin) for an extended period of time, e.g. more than 5 seconds, it starts stuttering again. It is much less pronounced than before, but still not the perfectly smooth motion you get with GSP off.

@SeongGino

SeongGino commented Jul 25, 2024

> Hey @SeongGino can you check if you have coolercontrol program running? It's a known cause of this stutter since the way it queries the data is by starting and killing an nvidia-smi process all the time. This startup (and especially shutdown) talks to GSP and can stall out some other things.
>
> I believe they've fixed this and switched to NVML, but there's still no release that picked up that patch. See https://gitlab.com/coolercontrol/coolercontrol/-/issues/288
>
> In the meantime, we'll look into ways to make this shutdown less impactful so we don't depend on patching all third party tools.

I've never heard of or used this coolercontrol in my life.

But, I do have Plasma System Monitor applets running, one of them set to track GPU Usage stats. Enabling the GSP and removing the GPU monitor widget did seem to resolve the stutter for me in the Wayland session.

As far as I can tell, anyway: dragging a window around, like Dolphin, no longer seems to exhibit the same hitching.

@mtijanic
Collaborator

Thanks @SeongGino! Do you know what exact applet this is? Please keep in mind I'm not at all familiar with KDE and its family of tools, so dumb it down for me :)

I found https://github.com/lestofante/ksysguard-gpu, which already has an issue open for this.

@SeongGino

SeongGino commented Jul 25, 2024

@mtijanic It's not an external component like the one you've linked; it looks like it's part of the stock Plasma desktop widgets, or, if it is an extra, it most likely comes with KSysGuard.

2024_07-25 130818
2024_07-25 130319-CPU Usage Settings

@mtijanic
Collaborator

Thanks! If I'm reading it correctly, the relevant code is at https://invent.kde.org/plasma/libksysguard/-/blob/master/processcore/plugins/nvidia/nvidia.cpp?ref_type=heads and it indeed spawns an nvidia-smi process every time to get the data. This should ideally move to using libnvidia-ml.so, but this is the first time I'm seeing this code so I can't say how easy that is to do.

@SeongGino

I see! Well, I posted an issue on KDE's bugtracker linking back to this issue, so hopefully there will be some response.

@Kimiblock

560 reduces the stutter, but it is nowhere near Proprietary + GSP Off.

If I scroll in Firefox after the desktop sits idle for some time, it'll lag for seconds.

@Kimiblock

Also, GNOME suffers from this issue a lot, especially when opening the Overview. It stutters almost every time even if triple buffering is enabled.

@Kimiblock

If I put something demanding running on the GPU, the performance level jumps to P0 and GNOME is smooth again. Maybe pinning the perf level can bypass this.

@bugQ

bugQ commented Aug 7, 2024

> I see! Well, I posted an issue on KDE's bugtracker linking back to this issue, so hopefully there will be some response.

looks like you got your response, @mtijanic:

> The problem with using the suggested library is that the headers are in a proprietary SDK that cannot be freely distributed, which means that it would make the NVidia GPU integration practically unbuildable on most machines. Even if we were to include the header in ksystemstats (which its license doesn't actually allow, but I see some projects do) we'd still be stuck since the library itself is bundled in the driver and that is generally also not installed on build machines.
>
> So ultimately, running nvidia-smi is pretty much the only way we can support this without introducing a nasty build system issue. And frankly, it seems to me that it's an upstream issue anyway? Running nvidia-smi shouldn't have such an impact in the first place?
>
> Arjen Hiemstra, 2024-08-06 11:03:41 UTC

@mtijanic
Copy link
Collaborator

NVIDIA bug 4804613 filed to track stutter with ksysguard (and nvidia-smi pmon in general)

Kimiblock added a commit to Kimiblock/moeOS.config that referenced this issue Aug 18, 2024
@Doaxan

Doaxan commented Sep 11, 2024

Switched from AMD to NVIDIA for the sake of al pieces. Animations are very slow; it's visible even on the live ISO images of any distribution, CachyOS or EndeavourOS. It's very annoying on a 3090, and I hope the problem will be solved soon.

@MrEAlderson

MrEAlderson commented Sep 13, 2024

Same issue after the latest Fedora dnf upgrade on a fresh installation, with v560 stable and a 3070. It's unbearable, please fix.

@Eplankton

> Same issue with latest Fedora dnf upgrade on a fresh installation, v560 stable and 3070. It's unbearable, please fix

Same with the 550 release on KDE Plasma 5.27 (Kubuntu 24.04.1): the frame rate drops as low as 30 fps when opening any application.

@SeongGino

I'm starting to think there's enough "me too"'s in this thread that the point has been made, and it probably won't let the issue get fixed faster. :|

@mtijanic
Collaborator

Thanks! We've fixed several different stutter issues since this was opened, but it is really hard to keep track of what is still pending on which configuration. I think when 565 release comes I will close this issue and we can open new ones for anything that still manifests with that driver version and sufficiently recent 3rd party userspace (since some fixes were not in the driver itself).

@mtijanic
Copy link
Collaborator

Per above, I'm going to close this issue now since it is already tracking multiple things that were fixed at various points. If further stutter issues are present in 565.xx, please open a new issue.

Please also note that #693 is still open and that tracks the stutter/choppiness when resuming from idle. This is a different issue from stutter seen in games or when moving windows around and similar.
