Spurious disconnect loop when a channel is stuck #3695

Open
@yellowred

Description

We have a case where a local LDK node disconnects a remote peer (LND) on an RAA timeout in order to restore channel operation and send an alert upstream. In our case the disconnect does not achieve its main goal of restoring the channel, and the node repeats the disconnect, reconnect, re-establish cycle indefinitely.

The root cause was a failure in the local node's remote signer that caused one channel to get stuck. The remote signer came back online almost immediately and continued to provide signatures for CS/RAA messages, but LDK was unable to recover from the failed state of the stuck channel and did not request any new signatures. Because the node kept disconnecting, the balance kept fluctuating, which made other services down the stack unreasonably busy.
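To make the loop easier to reason about, here is a toy model of it. This is only a sketch and is not LDK's implementation: `ToyChannel`, `simulate`, and the tick threshold of 2 are all hypothetical names and values chosen for illustration. It assumes that a channel waiting on the signer sends nothing on re-establish, and that the peer is disconnected after a fixed number of timer ticks without progress.

```rust
// Toy model of the disconnect loop. NOT LDK code; every name is hypothetical.

/// Timer ticks allowed without progress before disconnecting (assumed value).
const MAX_TICKS_WITHOUT_PROGRESS: u32 = 2;

#[derive(Debug, Default)]
struct ToyChannel {
    /// True while a failed signature request has not been retried.
    waiting_on_signer: bool,
    /// Timer ticks since the last reconnect without any progress.
    ticks_without_progress: u32,
}

impl ToyChannel {
    /// Models reconnect + re-establish. Returns true if the pending
    /// commitment update could be sent, i.e. the channel made progress.
    fn reestablish(&mut self) -> bool {
        self.ticks_without_progress = 0;
        !self.waiting_on_signer
    }

    /// Models the periodic timer. Returns true if the peer gets disconnected.
    fn timer_tick(&mut self) -> bool {
        if !self.waiting_on_signer {
            return false; // channel is progressing normally
        }
        self.ticks_without_progress += 1;
        self.ticks_without_progress >= MAX_TICKS_WITHOUT_PROGRESS
    }
}

/// Runs `cycles` reconnect attempts and counts how many end in a disconnect.
fn simulate(chan: &mut ToyChannel, cycles: u32) -> u32 {
    let mut disconnects = 0;
    for _ in 0..cycles {
        if chan.reestablish() {
            continue; // progress was made, no disconnect this cycle
        }
        for _ in 0..MAX_TICKS_WITHOUT_PROGRESS {
            if chan.timer_tick() {
                disconnects += 1;
                break;
            }
        }
    }
    disconnects
}
```

In this model, as long as nothing clears `waiting_on_signer`, every reconnect ends in another disconnect, regardless of whether the signer itself has recovered.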

LDK logs (newest first):

(peer_id = 039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec46501aaa, channel_id = a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb): Disconnecting peer 039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec46501aaa due to not making any progress on channel a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb

... a minute before

(peer_id = 039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec46501aaa, channel_id = a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb): Handling channel resumption for channel a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb with no RAA, no commitment update, 0 pending forwards, 0 pending update_add_htlcs, not broadcasting funding, without channel ready, without announcement, without tx_signatures

(peer_id = 039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec46501aaa, channel_id = a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb): Generating channel update for channel a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb

(peer_id = 039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec46501aaa, channel_id = a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb): Attempting to generate channel update for channel a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb

(peer_id = 039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec46501aaa, channel_id = a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb): Reconnected channel a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb with lost outbound RAA and lost remote commitment tx, but unable to send due to resend order, waiting on signer for commitment update

Error on the remote node (LND):

ChannelLink(c4918671944d25f41b8cc7d4181d6c7b6011dda819daecbfc81ac14a37235bbb:1): received warning message from peer: chan_id=a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869aaa, err=Disconnecting due to timeout awaiting response
ChannelPoint(c4918671944d25f41b8cc7d4181d6c7b6011dda819daecbfc81ac14a37235bbb:1): pending remote commitment: (*lnwallet.commitment)(0x400bb1e480)({

Re-establishing the channel manually works, so maybe we should be more proactive in querying the remote signer, instead of just waiting on the signer for a commitment update.
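A rough sketch of what "more proactive" could look like: remember which channels are blocked on the signer and retry them on every timer tick, rather than waiting for an external notification. Everything here is hypothetical (`SignerRetry`, `BlockedChannels`, `FlakySigner`, and the string channel ids are made up for illustration). In a real node the retry hook might map onto something like LDK's `ChannelManager::signer_unblocked`, but that mapping is an assumption and is not exercised by this code.

```rust
use std::collections::BTreeSet;

/// Hypothetical hook for re-requesting signatures from a remote signer.
trait SignerRetry {
    /// Re-requests the pending signature for `channel`.
    /// Returns true once the signature is available.
    fn retry_pending(&mut self, channel: &str) -> bool;
}

/// Tracks channels that are blocked waiting on the signer.
struct BlockedChannels {
    blocked: BTreeSet<String>,
}

impl BlockedChannels {
    fn new() -> Self {
        BlockedChannels { blocked: BTreeSet::new() }
    }

    /// Records that `channel` could not get a signature.
    fn mark_blocked(&mut self, channel: &str) {
        self.blocked.insert(channel.to_string());
    }

    /// Retries every blocked channel and returns the ids that became unblocked.
    /// Channels whose retry fails stay in the set for the next tick.
    fn on_timer_tick<S: SignerRetry>(&mut self, signer: &mut S) -> Vec<String> {
        let mut unblocked = Vec::new();
        self.blocked.retain(|id| {
            if signer.retry_pending(id) {
                unblocked.push(id.clone());
                false // drop from the blocked set
            } else {
                true // keep and retry on the next tick
            }
        });
        unblocked
    }
}

/// Demo signer that fails a fixed number of times and then recovers.
struct FlakySigner {
    failures_left: u32,
}

impl SignerRetry for FlakySigner {
    fn retry_pending(&mut self, _channel: &str) -> bool {
        if self.failures_left > 0 {
            self.failures_left -= 1;
            false
        } else {
            true
        }
    }
}
```

With a `FlakySigner { failures_left: 1 }`, the first tick leaves the channel blocked and the second tick unblocks it, so the channel would recover without going through another disconnect. Whether retrying on a timer is acceptable for a given signer (rate limits, cost per request) is a separate design question.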
