kms/surface: Call cleanup_texture_cache for each device at end of draw #1442

Merged
2 commits merged from cleanup-cache_noble into master on Jun 2, 2025

Conversation

@ids1024 (Member) commented May 28, 2025

Fixes an issue where a dual-GPU system would keep allocating dGPU PBOs in cpu_copy() every frame, but cleanup() was never called, since the surface thread for the built-in output rendered on the primary/integrated GPU, targeting that same GPU.

Even if a different monitor were compositing on the dGPU, that wouldn't help, since that thread has its own GpuManager with its own renderer and cache.

Running cleanup at the end (or start) of each frame seems like a good idea. I'm not sure whether it would be better to avoid the additional calls, or whether they're harmless/desirable.
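
A minimal sketch of the shape of the change, for context only: the trait and names below are stand-ins (assumptions), not the actual cosmic-comp/Smithay API. The point is simply that every device the surface thread knows about gets its cleanup hook called once per frame, not just the device the frame happened to render on.

```rust
// Illustrative only; `DeviceRenderer` and `cleanup_texture_cache` are
// assumptions standing in for the real cosmic-comp/Smithay types.
trait DeviceRenderer {
    fn cleanup_texture_cache(&mut self) -> Result<(), String>;
}

/// Called at the end (or start) of each draw on the surface thread.
fn cleanup_all_devices(devices: &mut [Box<dyn DeviceRenderer>]) {
    for dev in devices.iter_mut() {
        // A device that only ever acts as a copy source (e.g. the dGPU feeding
        // cpu_copy()) never reaches its cleanup path otherwise, so its PBO
        // allocations grow every frame.
        if let Err(err) = dev.cleanup_texture_cache() {
            eprintln!("texture cache cleanup failed: {err}");
        }
    }
}
```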

@ids1024 requested a review from Drakulix on May 28, 2025 18:34
@ids1024 (Member, Author) commented May 28, 2025

Hm, I'm seeing some issues that seem related to this, with Intel surfaces on the Nvidia output suddenly turning black?

I don't think it should be invalid to call cleanup_texture_cache() at any arbitrary point, so this suggests problems somewhere else...

@ids1024 (Member, Author) commented May 29, 2025

Looks like most of the issues here are from the glDeleteSync in smithay? With that commented out, I see fewer issues.

As noted in https://registry.khronos.org/OpenGL-Refpages/es3.0/html/glDeleteSync.xhtml, it should be "flagged for deletion and will be deleted when it is no longer associated with any fence command and is no longer blocking any glWaitSync or glClientWaitSync command", so I don't think this should cause any observable behavior change...
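
To make the quoted rule concrete, here is a sketch of the pattern it guarantees, written against the generic `gl` crate's raw bindings purely for illustration (Smithay uses its own GLES ffi, so the call style and names here are assumptions): a wait that is already queued when glDeleteSync runs is unaffected, because the object is only flagged for deletion.

```rust
// Illustration of the deferred-deletion rule quoted above, using the `gl`
// crate's raw bindings; not Smithay's own GLES ffi.
const TIMEOUT_IGNORED: u64 = u64::MAX; // GL_TIMEOUT_IGNORED

unsafe fn wait_then_delete(sync: gl::types::GLsync) {
    unsafe {
        // Queue a server-side wait on the fence in the consuming context ...
        gl::WaitSync(sync, 0, TIMEOUT_IGNORED);
        // ... then delete the fence right away. Per the spec it is only
        // *flagged* for deletion and survives until it no longer blocks the
        // pending wait, so this should not change observable behavior.
        gl::DeleteSync(sync);
    }
}
```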

@ids1024 (Member, Author) commented May 29, 2025

I guess it's not too much of a surprise if our code for synchronizing reads/writes between contexts still isn't quite right.

It seems the gl.FenceSync call creating the read_sync is helping to synchronize something even when nothing waits on it, and that changes if the fence is deleted with gl.DeleteSync.

@Drakulix (Member)

> I don't think it should be invalid to call cleanup_texture_cache() at any arbitrary point, so this suggests problems somewhere else...

No, absolutely not; the API is intended for exactly this use case. So there has to be something wrong with our (admittedly very new) GL synchronization code.

@ids1024 (Member, Author) commented May 30, 2025

Looks like it may help to change wait_for_syncpoint() to not use ClientWaitSync and just always call WaitSync (and not call sync.take()?).

Though the synchronization issues aren't consistent, so that isn't necessarily the correct fix. It might make sense if glClientWaitSync isn't sufficient for synchronizing VRAM access between different contexts, but I don't see any clear documentation one way or the other.

In general, reading through the synchronization code, it looks like things should be synchronized properly. I still don't see obvious issues.
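
For reference, the two waiting modes being weighed here, again sketched against the `gl` crate's raw bindings rather than Smithay's actual wait_for_syncpoint() (so the structure and function names are assumptions, not the real code):

```rust
// Sketch of the two options; not the actual Smithay wait_for_syncpoint().
const TIMEOUT_IGNORED: u64 = u64::MAX; // GL_TIMEOUT_IGNORED

/// Server-side wait: the current context's GPU queue stalls until the fence
/// signals; the calling thread never blocks.
unsafe fn wait_gpu(sync: gl::types::GLsync) {
    unsafe { gl::WaitSync(sync, 0, TIMEOUT_IGNORED) };
}

/// Client-side wait: blocks this thread until the fence signals or the
/// timeout expires. Needed when the CPU itself is about to touch the data,
/// e.g. a multi-GPU copy path that goes through system memory.
unsafe fn wait_cpu(sync: gl::types::GLsync, timeout_ns: u64) -> bool {
    let status =
        unsafe { gl::ClientWaitSync(sync, gl::SYNC_FLUSH_COMMANDS_BIT, timeout_ns) };
    status == gl::ALREADY_SIGNALED || status == gl::CONDITION_SATISFIED
}
```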

@Drakulix (Member)

> Looks like it may help to change wait_for_syncpoint() to not use ClientWaitSync and just always call WaitSync (and not call sync.take()?).

That might break some multi-GPU use cases, where I'm pretty sure we can end up needing to synchronize on the CPU.

> Though the synchronization issues aren't consistent, so that isn't necessarily the correct fix. It might make sense if glClientWaitSync isn't sufficient for synchronizing VRAM access between different contexts, but I don't see any clear documentation one way or the other.

In general it should be sufficient. Expecting otherwise is like expecting glFinish not to implicitly do a glFlush.

@ids1024 (Member, Author) commented May 30, 2025

https://registry.khronos.org/OpenGL/specs/es/3.0/es_spec_3.0.pdf

> D.3 Propagating Changes to Objects
> [...]
> Rule 4: If the contents of an object T are changed in a context other than the current context, T must be attached or re-attached to at least one binding point in the current context, or at least one attachment point of a currently bound container object C, in order to guarantee that the new contents of T are visible in the current context.

That does indeed make it sound like waiting for the work in one context to finish with glClientWaitSync and then binding the texture should be sufficient to propagate changes from one context to another.

So yeah, the synchronization does all look right.
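
Put as code, that Rule 4 reading looks roughly like the following (illustrative only, `gl` crate bindings again, made-up function names; the real Smithay code is structured differently):

```rust
// Illustration of the Rule 4 pattern discussed above; not the Smithay code.

/// In the context that wrote the texture: fence after the writes and flush so
/// the fence can actually signal.
unsafe fn publish_writes() -> gl::types::GLsync {
    unsafe {
        let sync = gl::FenceSync(gl::SYNC_GPU_COMMANDS_COMPLETE, 0);
        gl::Flush();
        sync
    }
}

/// In the context that reads the texture: wait for the writer's fence, then
/// (re-)bind the texture so its new contents are guaranteed to be visible.
unsafe fn consume_texture(sync: gl::types::GLsync, tex: gl::types::GLuint) {
    unsafe {
        // u64::MAX ns is effectively "block until signalled".
        let _ = gl::ClientWaitSync(sync, gl::SYNC_FLUSH_COMMANDS_BIT, u64::MAX);
        gl::BindTexture(gl::TEXTURE_2D, tex);
        // ... draw with `tex` ...
    }
}
```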

@ids1024 (Member, Author) commented Jun 2, 2025

As far as I can tell, Smithay/smithay#1748 fixes the issue I've been seeing here. If people have been seeing similar behavior in other circumstances, it may fix that too.

@ids1024 force-pushed the cleanup-cache_noble branch from c8b927d to 31ff4df on June 2, 2025 16:38
@ids1024 (Member, Author) commented Jun 2, 2025

Added a commit to update smithay with that fix.

@Drakulix merged commit 4fa3113 into master on Jun 2, 2025 (6 checks passed)
@Drakulix deleted the cleanup-cache_noble branch on June 2, 2025 16:58
@Drakulix mentioned this pull request on Jun 3, 2025