Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.stop() sometimes hangs indefinitely #2625

Open
marcus-pousette opened this issue Jul 18, 2024 · 8 comments
Open

.stop() sometimes hangs indefinitely #2625

marcus-pousette opened this issue Jul 18, 2024 · 8 comments
Labels
need/triage Needs initial labeling and prioritization

Comments

@marcus-pousette
Copy link
Contributor

marcus-pousette commented Jul 18, 2024

  • Version:
    1.8.1

  • Subsystem:
    @libp2p/websocket , connection manager, components

Severity:

High

Description:

Steps to reproduce the error:

I have a test scenario with a few nodes that I connect to each other and send some data

Sometimes when terminating a node it endlessly waits for the Websocket transport to shutdown. This is because there are connections that does close (?) and in the it-ws library there are no calls to server.closeAllConnections() before close.

I have managed to make the shutdown work better if modify

private async _invokeStartableMethod (methodName: 'beforeStart' | 'start' | 'afterStart' | 'beforeStop' | 'stop' | 'afterStop'): Promise<void> {
to terminate the components in the reverse order in sequentially... so I suspect there is some kind of race condition going on or leak of some sort

Also no issues if I use Tcp transport instead.

This is not a good description on how to reproduce the issue, because it is a bit random when it occurs. But wanted to create this issue if others have the same problem and will update the description if I have a isolated a good test

@marcus-pousette marcus-pousette added the need/triage Needs initial labeling and prioritization label Jul 18, 2024
@marcus-pousette
Copy link
Contributor Author

marcus-pousette commented Jul 18, 2024

When the stop timeouts, I have one connection left with 0 streams,
remoteAddr: '/ip6/::ffff:7f00:1/tcp/56327/p2p/12D3KooWApN6mEB5666wTpqpzkgcKXC7kgMxmm7CAWRiD3WAg4Yv'
direction: 'inbound'
encryption: '/noise'
multiplexer: '/yamux/1.0.0'
transient: false

@marcus-pousette
Copy link
Contributor Author

marcus-pousette commented Jul 18, 2024

Closing the connectionManager component before all other components seems also to fix the issue

async _invokeStartableMethod(methodName) {
        const timeout = new Promise((resolve, reject) => {
            setTimeout(() => {
                reject('timeout')
            }, 10000);
        });
        const connectionManager = this.components["connectionManager"]
        if (isStartable(connectionManager)) {
            await connectionManager[methodName]?.();
        }
        let promise = Promise.all(Object.values(this.components)
            .filter(obj => isStartable(obj))
            .map(async (startable) => {
                await startable[methodName]?.();
            }));
        const race = Promise.race([promise, timeout]);
        try {
            await race;
        }
        catch (e) {
            throw e;
        }
        finally {
            clearTimeout(timeout);
        }
    }

@achingbrain
Copy link
Member

in the it-ws library there are no calls to server.closeAllConnections() before close.

I may be missing something but where is the closeAllConnections method defined? The it-ws module stores a reference to a http/https net.Server which has no such method.

no issues if I use Tcp transport

Looking at the code, the TCP listener aborts each connection on close whereas the WebSocket transport tries to close them gracefully.

Do you still see the issue if the transport aborts the connections instead of closing them on close?

@marcus-pousette
Copy link
Contributor Author

http.Server and https.Server has that method since NodeJs 18.2.0

/// it-ws/server.ts

 async close (): Promise<void> {
    await new Promise<void>((resolve, reject) => {
      this.server.closeAllConnections()
      this.server.close((err) => {
        if (err != null) {
          reject(err); return
        }

        resolve()
      })
    })
  }
  

@marcus-pousette
Copy link
Contributor Author

marcus-pousette commented Jul 18, 2024

Actually the closeAllConnections trick does work since the connection is a tcp one (?)

The close connection manager first trick seems to always work.

Also adding like a 5s delay before stopping also seems to work.

It feels like there is some kind of connection creation in process that is not caught correctly on closing

@achingbrain
Copy link
Member

http.Server and https.Server has that method since NodeJs 18.2.0

Aha, that's interesting, nice find.

The docs do say though:

This does not destroy sockets upgraded to a different protocol, such as WebSocket or HTTP/2

...so I wonder if there's something else going on here?

@marcus-pousette
Copy link
Contributor Author

Ye the whole thing about aborting Websocket connections before closing does not seem to have an impact. Only the two things I mentioned in my last comment did mitigate the problem for me

@marcus-pousette
Copy link
Contributor Author

Looking at the code, the TCP listener aborts each connection on close whereas the WebSocket transport tries to close them gracefully.

That would explain something!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need/triage Needs initial labeling and prioritization
Projects
None yet
Development

No branches or pull requests

2 participants