Monitor and Alert when routers not correctly filling transfers #4982

Open
2 tasks
preethamr opened this issue Oct 7, 2023 · 74 comments
Assignees
Labels
Bug Issue type: Bug

Comments

@preethamr
Collaborator

Problem

When a properly set-up, active router stops bidding, we don't find out until an operator (us or the router operator) checks.

Impact

Router operators miss out on potential revenue, and transfers may not get boosted.

Proposed Solution

Monitor every active router and alert when it stops sending successful bids for longer than a threshold amount of time.

Acceptance Criteria

From an end-user perspective, the criteria below must be met to consider this done:

  • Every active router is monitored
  • Alert sent to a Discord alert channel
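
A minimal sketch of the threshold check proposed above, assuming the monitor already has a last-successful-bid timestamp per router. The `RouterActivity` shape and both function names are hypothetical and do not come from the Connext codebase. The returned object uses the `content` field that Discord webhooks accept, so it could be POSTed as JSON to the alert channel's webhook URL.

```typescript
// Hypothetical shape: one record per active router being monitored.
interface RouterActivity {
  router: string;              // router address
  lastSuccessfulBidAt: number; // unix seconds; 0 if the router has never bid
}

// Returns the routers whose last successful bid is older than the threshold.
function findStaleRouters(
  routers: RouterActivity[],
  nowSec: number,
  thresholdSec: number,
): RouterActivity[] {
  return routers.filter((r) => nowSec - r.lastSuccessfulBidAt > thresholdSec);
}

// Builds a Discord webhook payload, or undefined when there is nothing to report.
function buildDiscordAlert(
  stale: RouterActivity[],
  nowSec: number,
): { content: string } | undefined {
  if (stale.length === 0) return undefined;
  const lines = stale.map((r) => {
    const minutes = Math.floor((nowSec - r.lastSuccessfulBidAt) / 60);
    return `- ${r.router}: no successful bid for ${minutes} min`;
  });
  return { content: ["Routers not bidding:", ...lines].join("\n") };
}
```

A router with no eligible transfers would also look stale under this check, so the threshold probably needs tuning per route.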
@preethamr preethamr added the Enhance 🧘 Issue type: Enhancement Request label Oct 7, 2023
@alexwhte alexwhte added the core label Oct 23, 2023
@alexwhte alexwhte removed the core label Nov 15, 2023
@MinistroDolar

@alexwhte alexwhte added Bug Issue type: Bug and removed Enhance 🧘 Issue type: Enhancement Request labels Jan 29, 2024
@alexwhte alexwhte changed the title from "Monitor and Alert when a activate router stops bidding" to "Monitor and Alert when routers not correctly filling transfers" Jan 29, 2024
@alexwhte
Contributor

@preethamr
Collaborator Author

RCA:
https://www.notion.so/Gnosis-Asman-Router-Issues-b9a864625a394f06870d206fbf8aa80e
Initial root cause for 3:

  • The router's poll loop around retryXcalls likely freezes on the connection at mq.publish:
    await Promise.all(
      transfersToPublish.map(async (transfer) => {
        // new request context with the transfer id
        const { requestContext: _requestContext, methodContext: _methodContext } = createLoggingContext(
          "pollSubgraph",
          undefined,
          transfer!.transferId,
        );
        try {
          // Suspected freeze point (per the RCA above): if this await never settles,
          // Promise.all never resolves and the poll loop stops advancing.
          await mqClient.publish<OriginTransfer>(MQ_EXCHANGE, {
            body: transfer as OriginTransfer,
            type: XCALL_MESSAGE_TYPE,
            routingKey: XCALL_QUEUE,
          });
          await cache.transfers.setBidStatus(transfer?.transferId as string);
          logger.debug("Published transfer to mq", _requestContext, _methodContext, { transfer });
        } catch (err: unknown) {
          logger.error("Error publishing to mq", _requestContext, _methodContext, jsonifyError(err as Error));
        }
      }),
    );

Overall Cause:

Our routers don’t run RabbitMQ in an HA environment.
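
One possible mitigation for the suspected freeze, independent of making RabbitMQ highly available, is to bound how long the poll loop waits on each publish. The sketch below is an assumption, not the fix that was shipped: `withTimeout` and `TimeoutError` are hypothetical helpers, and the timeout does not cancel the underlying publish, it only stops the caller from waiting on it.

```typescript
// Hypothetical error type so callers can tell a timeout from a publish failure.
class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TimeoutError";
  }
}

// Rejects with TimeoutError if `work` has not settled within `ms` milliseconds.
// The timer is cleared as soon as `work` settles, so it never keeps the process alive.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new TimeoutError(`${label} timed out after ${ms} ms`)),
      ms,
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

In the snippet above, the publish call could be wrapped as `await withTimeout(mqClient.publish(...), 5000, "mq.publish")`, where the 5000 ms value is illustrative. The existing catch block would then log the timeout and let Promise.all resolve.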

@Tommy-R8
Collaborator

@wanglonghong
Collaborator

wanglonghong commented Mar 25, 2024

The sequencer endpoint to monitor router status is live:

Request: 
https://sequencer.mainnet.connext.ninja/router-status/{ROUTER_ADDRESS}

Response:
{
  "lastActiveTimestamp": 0,
  "lastBidTimestamp": {...}
}

We need further discussion to decide the best place to integrate it.
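
As a starting point for that discussion, here is a rough sketch of how a monitor could consume the endpoint. It relies on several assumptions: `lastActiveTimestamp` is treated as unix seconds, the contents of `lastBidTimestamp` are left untyped because they are elided above, and `checkRouter`, `isInactive`, and the injected `fetchJson` function are hypothetical names.

```typescript
// Response shape as posted above; lastBidTimestamp is elided there, so it stays untyped.
interface RouterStatus {
  lastActiveTimestamp: number;
  lastBidTimestamp: unknown;
}

const SEQUENCER_URL = "https://sequencer.mainnet.connext.ninja";

function routerStatusUrl(router: string): string {
  return `${SEQUENCER_URL}/router-status/${router}`;
}

// ASSUMPTION: lastActiveTimestamp is in unix seconds.
function isInactive(status: RouterStatus, nowSec: number, thresholdSec: number): boolean {
  return nowSec - status.lastActiveTimestamp > thresholdSec;
}

// The HTTP call is injected so the check can be tested without network access.
type FetchJson = (url: string) => Promise<unknown>;

// Resolves to true when the router should trigger an alert.
async function checkRouter(
  router: string,
  fetchJson: FetchJson,
  nowSec: number,
  thresholdSec: number,
): Promise<boolean> {
  const status = (await fetchJson(routerStatusUrl(router))) as RouterStatus;
  if (typeof status?.lastActiveTimestamp !== "number") {
    throw new Error(`Unexpected router-status response for ${router}`);
  }
  return isInactive(status, nowSec, thresholdSec);
}
```

On Node 18 or later, `fetchJson` could be `(url) => fetch(url).then((r) => r.json())`.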

@MinistroDolar

WETH from Arbitrum to Linea

tx: https://connextscan.io/tx/0x7c6348731cbe30c490cef3f5513ab6e71d1dbeb00c460738a14ad40d07d706f3

Routers:
https://connextscan.io/router/0x5d527765252003acee6545416f6a9c8d15ae8402 (this router does not seem to be bidding on transfers larger than 1 ETH)

Ticket: Intercom 375

@MinistroDolar

@0xanedi 0xanedi pinned this issue Mar 31, 2024
@alexwhte
Contributor

alexwhte commented Apr 4, 2024

We're looking into this.

@MinistroDolar

ETH from Polygon to Metis

tx: https://connextscan.io/tx/0xd2f698802d30ff553c64cb3bdd2a296b83971f4c2815d81d4da07384d9ac9f5e

routers: https://connextscan.io/router/0x97b9dcb1aa34fe5f12b728d9166ae353d1e7f5c4 (this router is active and bidding on transfers going to Metis; I'm not sure what is happening with the tx above)

discord: https://discord.com/channels/454734546869551114/1228041371500154933

@MinistroDolar

USDT from BNB to Ethereum

This is a special case: the user sent a large amount (55.700 USDT) and there is enough router liquidity to complete it, but that liquidity is spread across 3 different (active) routers. Multi-router bidding does not seem to be working properly in this case.


Tx: https://connextscan.io/tx/0xe0422ea721f45fd4b4f9e3e713a17ac4eb78c841c296577c248353e5a32c6452

Routers:
https://connextscan.io/router/0x97b9dcb1aa34fe5f12b728d9166ae353d1e7f5c4
https://connextscan.io/router/0xf26c772c0ff3a6036bddabdaba22cf65eca9f97c
https://connextscan.io/router/0xc4ae07f276768a3b74ae8c47bc108a2af0e40eba

Discord: https://discord.com/channels/454734546869551114/1228470019390832751

@oncall

Projects
None yet
Development

No branches or pull requests

7 participants