Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I fixed every single bug in wm #1334

Merged
merged 22 commits into from
Sep 7, 2024
Merged

I fixed every single bug in wm #1334

merged 22 commits into from
Sep 7, 2024

Conversation

yshui
Copy link
Owner

@yshui yshui commented Sep 7, 2024

.... or at least the ones AFL++/libFuzzer can find. Some of these are very tricky, and my brain hurts.

This all came from @Monsterovich reporting an assertion failure in wm. And I thought, the wm code is nicely isolated now, maybe I can fuzz it, see if there are more bugs... and boy are there more 😮‍💨

Unfortunately some of these changes are big, so I think this warrants an rc4.

Copy link

codecov bot commented Sep 7, 2024

Codecov Report

Attention: Patch coverage is 88.44622% with 29 lines in your changes missing coverage. Please review.

Project coverage is 53.77%. Comparing base (c6fc045) to head (570381f).
Report is 23 commits behind head on next.

Files with missing lines Patch % Lines
src/wm/wm.c 84.25% 17 Missing ⚠️
src/x.c 94.84% 5 Missing ⚠️
src/picom.c 63.63% 4 Missing ⚠️
src/event.c 83.33% 3 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             next    #1334      +/-   ##
==========================================
- Coverage   53.90%   53.77%   -0.13%     
==========================================
  Files          70       70              
  Lines       15165    15195      +30     
==========================================
- Hits         8174     8171       -3     
- Misses       6991     7024      +33     
Files with missing lines Coverage Δ
src/atom.c 50.00% <ø> (ø)
src/region.h 56.25% <100.00%> (+1.04%) ⬆️
src/wm/tree.c 88.94% <100.00%> (-1.50%) ⬇️
src/wm/win.c 75.12% <100.00%> (-0.23%) ⬇️
src/wm/win.h 93.05% <ø> (ø)
src/wm/wm.h 100.00% <ø> (ø)
src/wm/wm_internal.h 100.00% <ø> (ø)
src/x.h 89.18% <100.00%> (+6.83%) ⬆️
src/event.c 66.82% <83.33%> (+1.82%) ⬆️
src/picom.c 61.18% <63.63%> (-3.23%) ⬇️
... and 2 more

... and 4 files with indirect coverage changes

@yshui yshui force-pushed the wm-fixes branch 3 times, most recently from 8858de1 to b384889 Compare September 7, 2024 08:48
To make it possible to stub them out.

These are the only requests wm.c makes to the X server, we will be able
to test the wm code on its own if we can provide mocked version of these
functions.

Signed-off-by: Yuxuan Shui <[email protected]>
If a new window reuses the ID of a previously destroyed window, we would
get confused and will not send a query tree request for this new window,
thus misses all of its children. This is because we saved destroyed
window as orphans, and we assume all orphans are either imported or in
the process of being imported, which is not true for destroyed windows.

With this commit we no longer save destroyed windows. We did that
to gracefully handle a window being destroyed with a query tree request
still in-flight. Now we handle it the more obvious way - we mark any
in-flight query trees for omission when a window is destroyed.

(Now, event if we don't save explicitly destroyed windows anymore, some
orphans might get destroyed and have their ID reused without us knowing.
This is because we don't get DestroyNotify for orphans. To catch this
case, we pessimistically re-query orphans that don't have children.)

Even if the window ID is reused after by a newly created window and the
query tree actually succeeds with information from the new window, we
can't use that result anyway, because at the point when the query tree
is answered, we wouldn't have set up the event mask for the new window
yet.

Signed-off-by: Yuxuan Shui <[email protected]>
Before, if we got a reparent event for a previously unknown window, we
will ignore that event because we expect this window will later be found
in a query tree reply. This is not always true.

If a previously unknown window is reparented to a window that is already
fully imported, because we won't send query tree for a fully imported
window, this new will just never be imported.

We will now explicitly initiate the import process in this case. This
also applies to new window reusing the ID of a previously destroyed
window in our orphan tree.

Signed-off-by: Yuxuan Shui <[email protected]>
`node` can legitimately be NULL inside wm_handle_get_wm_state_reply.

Signed-off-by: Yuxuan Shui <[email protected]>
Consider this scenario:

1. Window created with ID A, which generates a CreateNotify for A.
2. Window with ID A is destroyed.
3. Another window is created, reusing ID A.
4. We receive the CreateNotify for A.

At step 4, we will be querying a different window than what we think we
were querying. It might have a different parent from the one in the
CreateNotify, and we should also ignore the destroy event in (2), as it
wasn't for the window we are querying.

To fix that, when we initiate the import process for a window, we no
longer attach it to its declared parent, instead we orphan it. We also
ignore any destroy event for it while the query tree is in-flight. We
let the query tree reply event decide: 1. what's the window's parent, 2.
whether the window still exists.

Signed-off-by: Yuxuan Shui <[email protected]>
Turns out inferring a window's life cycle based on DestroyNotify sent by
its parent is incredibly hard. Especially if the window is at any point
reparented to a window we haven't imported yet, because we won't be
receiving any DestroyNotify for it. And we also can't reliably know if
such reparenting has happened.

So we now set the StructureNotifyMask, which will give us DestroyNotify
sent from the window itself. With this the tracking window life cycle
from the point when the event mask is set becomes easy, and we always
know if a window is destroyed or not.

Setting StructureNotifyMask also means we will get copies of
Circulate/Configure/Map/Unmap/ReparentNotify from the window itself, we
filter this so existing code doesn't have to change. Only for DestroyNotify
we favor the one sent from the window itself.

Also, setting event masks on window MUST be done synchronously. We use
to just send out a ChangeWindowAttributes request and ignore its result.
But it is actually possible, though extremely unlikely, for the
ChangeWindowAttributes to fail but the QueryTree later to succeed, if
the window is destroyed and another created (during the time between we
send these two requests) with the same ID. In that case we would thought
we have set up the event mask, but actually didn't. Which would be
detrimental.

Signed-off-by: Yuxuan Shui <[email protected]>
Well, we tried to use query tree replies to attach windows to correct
parents. But that's not good enough. Because the parent can also
actively acquire children via reparent events. The result is windows
fighting each other for control of their parent pointer.

And this is also bad because in we don't know the stacking order of a
window from its query tree reply.

The conclusion is, a window must be fully responsible for updating its
own children list. Because of ABA problem caused by window ID reuse,
it's possible for a window to be attached to a wrong parent temporarily.
This also means receiving a DestroyNotify from a window X about one of its
children with ID Y, doesn't mean the window with ID Y is actually
destroyed, if Y is currently attached to the wrong parent. From that
DestroyNotify, the only thing we can deduce is that X lost a child with
ID Y. Which means events must be handled in two steps:

If we get a DestroyNotify from window X about its child Y, we disconnect
Y from X. If we get a DestroyNotify from window Y for itself, only then
can we really destroy Y.

Signed-off-by: Yuxuan Shui <[email protected]>
We could be getting reparent events for windows we already found out to
be non-existent by trying to set its event mask.

Signed-off-by: Yuxuan Shui <[email protected]>
It is possible for a window in the process of being queried to have an
incomplete children list, in which case wm_disconnect might see
child/parent mismatches.

Signed-off-by: Yuxuan Shui <[email protected]>
Orphan windows were once imported, so should have their event masks set,
which means we should be getting DestroyNotify when they are destroyed.
That should be enough to guarantee that orphans will be destroyed
eventually, we don't need to reap them actively.

Signed-off-by: Yuxuan Shui <[email protected]>
The first thing we do when we get a new toplevel, is sending a
GetWindowAttributes request, and based on its reply, we may allocate a
managed window struct for it. Currently we find which tree node to
attach this managed window struct by its X window ID only, but we should
use the full treeid.

When a new toplevel is added, we queue a creation tree change for it
which contains a pointer to the tree node. If the tree node is destroyed
before the creation tree change is dequeued, we must "cancel" that
creation tree change, because otherwise the pointer in it would be
dangling. This also means there will be no kill tree change for it
either. As a consequence, we have no way to tell anybody about its
zombie node if we create one. So we just don't. If we don't create a
zombie node for a destroyed node, there will be nowhere to put the
managed window struct attached to it if there is one. Which means there
cannot be a one.

Summing it all up - a toplevel cannot have a managed window struct, if
its creation tree change was never dequeued. You can imagine the
creation tree change is handing out a "key" to the tree node, and this
tree node can only have managed window attached to it via this key.

However, if we are getting the tree node via its X window ID only, we
bypass this "key" mechanism, essentially we will find tree node with the
same X ID even if we don't have the key for it yet. This happens if the
window was destroyed, then another window took its ID while we were
waiting for the GetWindowAttributes reply, we shouldn't start managing
it.

Signed-off-by: Yuxuan Shui <[email protected]>
If a node is removed from toplevel then comes back, it will generate a
new creation tree change. Because of the assumption explained in the
previous commit, we can only attach managed window struct to it using
the "key" from the creation tree change. But the node's key hasn't
changed! Since it was toplevel, it will have a previous, older creation
tree change which also handed out this key, which can be used to attach
managed window to it, thus breaks the assumption.

This means every time a node becomes a toplevel, it must have a
different key. We bump its gen number when it's removed from toplevel to
ensure that.

Signed-off-by: Yuxuan Shui <[email protected]>
Signed-off-by: Yuxuan Shui <[email protected]>
We care about whether setting the event masks succeeded, we can't
deduced that from the query tree request because of race conditions and
window ID reuse. So we have to check if the request for setting the
event masks itself succeeded or not. But `xcb_request_check` is out of
order w.r.t. to other X events/replies. So essentially we are "looking
into the future" with it, sensing that a window no longer exists before
the usual events and replies we received from X would have told us so.
This breaks assumptions we based many of our assertions on.

For example, in many places when we can't find a window in our tree, we
assume we must have an incomplete tree (i.e. based on the events/replies,
there are windows on X's side we haven't queried). But it's actually
possible that we looked into the future and saw it no longer exists.

We have updated the async request mechanism so it can support requests
that don't expect replies. This commit uses that to make setting event
masks async, so it no longer uses `xcb_request_check` and is no longer
out of order.

Also updated how query tree requests are handled to make it work
together with this change.

Signed-off-by: Yuxuan Shui <[email protected]>
If a toplevel loses toplevel status, its "client window" will no longer
be updated. If the client window is then destroyed, it becomes a
dangling pointer.

Signed-off-by: Yuxuan Shui <[email protected]>
Previously we would just ignore these updates. But `wm_tree_find_client`
does treat a toplevel with WM_STATE set as its own client window. So to
keep things consistent, client window refresh is still needed in this
case.

Signed-off-by: Yuxuan Shui <[email protected]>
This is already accounted for by wm_tree_find_toplevel_for, but in
wm_tree_set_wm_state, there is a bit of copy-and-pasted code that
didn't.

Signed-off-by: Yuxuan Shui <[email protected]>
Since we know have split reparenting into 2 steps, the first of which
will create an orphaned window, we now can have orphaned windows when
there are no query trees in-flight.

Update wm_is_consistent condition to account for that.

Signed-off-by: Yuxuan Shui <[email protected]>
Pass tree node instead of window ID.

Signed-off-by: Yuxuan Shui <[email protected]>
See comments for proof why it's not reachable.

Signed-off-by: Yuxuan Shui <[email protected]>
@yshui yshui merged commit e3581fc into next Sep 7, 2024
22 checks passed
@yshui yshui deleted the wm-fixes branch September 7, 2024 22:56
@yshui yshui restored the wm-fixes branch September 7, 2024 22:56
@yshui yshui deleted the wm-fixes branch September 7, 2024 23:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant