`test_processbus.py::TestBusMethod::test_wait` flakily fails with `START_ERROR` because of `EINVAL`

### Huh?

The test in question is unstable and it's mostly visible in Windows runs but I think I also saw similar Ubuntu failures.

Example test failure: https://github.com/cherrypy/magicbus/runs/2114870045#step:12:106

The log appears to be:
```python-traceback
Bus state: ENTER
Bus state: IDLE
Bus state: START
Bus state: START_ERROR
Exiting due to error in start listener:
Traceback (most recent call last):
  File "D:\a\magicbus\magicbus\magicbus\base.py", line 205, in _transition
    os.write(write_fd, b'1')
OSError: [Errno 22] Invalid argument
Bus state: STOP
```

### Pointers

`magicbus.base.Bus.wait()` creates a pipe for getting the transition notifications more efficiently and adds it to the instance attribute `_state_transition_pipes`: https://github.com/cherrypy/magicbus/blob/57d151e/magicbus/base.py#L315-L334
This only happens if the runtime is `select`-compatible.
It then enters a loop waiting for either the `select.select()` to get a notification about any transition change or just sleeps for a while otherwise.

`magicbus.base.Bus.transition()` (through `magicbus.base.Bus._transition()`) writes a byte into every pipe in `_state_transition_pipes` attempting to notify the waiters before their timeouts expire: https://github.com/cherrypy/magicbus/blob/57d151e/magicbus/base.py#L202-L205

Performing the transitions in separate threads may create a race condition when `list(self._state_transition_pipes)` got a copy of the pipes, then during the iteration, some pipe file descriptors get closed and removed by `wait()` asynchronously between that copy and the loop iteration attempting to write into the previously copied descriptors.
This probably means that the hack of avoiding the "Set changed size during iteration" is not clever enough and we need to suppress the error that happens there.
One thing that is surprising to me is that I cannot reproduce this behavior under GNU/Linux on my machine — it always returns other errors (like "Broken pipe" if the reader end of the pipe has been closed before writing or "Bad file descriptor" if the write fd has been closed). Does Windows produce different errors?

Another thing to consider is that when the fd is closed, a slightly different race may still be possible — when a new pipe has been opened and the kernel allocated exactly the same file descriptor that the old discarded pipe had but is still in the variable. It's not that bad if we just hit another pipe created by `wait()` but may be dangerous if that file descriptor is pointing at something else completely. We should probably introduce some locking mechanism to prevent this.

https://github.com/cherrypy/magicbus/commit/5d4cfe2 should help us see the future exceptions since they weren't logged properly before.

Refs I encountered when I was trying to explore how `write()` could cause `EINVAL`:
* https://github.com/torvalds/linux/commit/9883035ae7edef3ec62ad215611cb8e17d6a1a5d
* https://man7.org/linux/man-pages/man2/pipe.2.html
* https://stackoverflow.com/a/28020096/595220
* https://bugs.python.org/issue19612
* https://hg.python.org/cpython/rev/039ac3f01c4e
* https://unix.stackexchange.com/questions/11946/how-big-is-the-pipe-buffer
* https://man7.org/linux/man-pages/man3/errno.3.html
* https://stackoverflow.com/a/6031694/595220
* https://man7.org/linux/man-pages/man2/write.2.html
* https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/write?view=msvc-160
* https://github.com/python/cpython/blob/148bc05/Modules/posixmodule.c#L10004-L10084
* https://docs.microsoft.com/en-us/windows/win32/api/namedpipeapi/nf-namedpipeapi-createpipe
* https://stackoverflow.com/questions/23688492/oserror-errno-22-invalid-argument-in-subprocess
* https://hg.python.org/cpython/file/62cf4e77f785/Lib/subprocess.py#l1199
* https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/pipe?view=msvc-160
* https://github.com/python/cpython/blob/148bc05/Modules/clinic/_winapi.c.h#L346-L361
* https://github.com/gevent/gevent/issues/148
* [`116229c` (#160)](https://github.com/gevent/gevent/pull/160/commits/116229cfbc887f08dacb6a0eeae08013e83a608b)
* https://bugs.python.org/issue15896

### Weird stuff

I was trying to reproduce this specific type of OSError on my Gentoo machine under IPython by using `os.pipe()` and trying to break the file handles but never succeeded to trick it into returning `EINVAL` (`OSError: [Errno 22] Invalid argument`). I haven't tried adding threads, though.

@aminusfu do you have any insight into this?

Related archeology:
* `os.pipe()` was first introduced by @aminusfu in #1
* That change wasn't inspected deeply during the review: https://github.com/cherrypy/magicbus/pull/1/files#r597829271
* The follow-up refactoring may also have a number of  https://github.com/cherrypy/magicbus/pull/4/files#r597832583

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

`test_processbus.py::TestBusMethod::test_wait` flakily fails with `START_ERROR` because of `EINVAL` #16

Huh?

Pointers

Weird stuff

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

test_processbus.py::TestBusMethod::test_wait flakily fails with START_ERROR because of EINVAL #16

Description

Huh?

Pointers

Weird stuff

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`test_processbus.py::TestBusMethod::test_wait` flakily fails with `START_ERROR` because of `EINVAL` #16