Investigate and fix wrong S-Chain discovery status problem #1596

Closed
sergiy-skalelabs opened this issue Sep 19, 2023 · 1 comment · Fixed by #1597, #1598, #1614 or #1615
@sergiy-skalelabs

We detected a logically inconsistent S-Chain discovery result. First, the log reported that all 16 of 16 S-Chain nodes had been discovered:

2023-09-18 15:26:06.072: S-Chain network discovery: Have S-Chain description response about 16 of 16 node(s).
2023-09-18 15:26:06.072: S-Chain network discovery: This S-Chain discovery will finish with 16 of 16 node(s) discovered.

But later, the log showed that at least one S-Chain node had been discovered only partially or not at all:

    2023-09-19 13:11:12.116: CRITICAL ERROR: BLS 1/16 public key discovery failed for node #10, node data is: {"httpRpcPort":10131,"httpRpcPort6":0,"httpsRpcPort":10136,"httpsRpcPort6":0,"ip":"34.217.246.35","ip6":"","nodeID":35,"schainIndex":11,"wsRpcPort":10130,"wsRpcPort6":0,"wssRpcPort":10135,"wssRpcPort6":0,"pwaState":{"oracle":{"isInProgress":false,"ts":0},"m2s":{"isInProgress":false,"ts":0},"s2m":{"isInProgress":false,"ts":0},"s2s":{"mapS2S":{"0":{"isInProgress":false,"ts":0}}}}}
    2023-09-19 13:11:12.116: RAW/BLS/#10: CRITICAL ERROR: BLS node #10 verify error: error description is: BLS 1/16 public key discovery failed for node #10, stack is: 
Error: BLS 1/16 public key discovery failed for node #10
    --> discoverPublicKeyByIndex (/ima/agent/bls.mjs:166:15)
    --> Module.doVerifyReadyHash (/ima/agent/bls.mjs:2503:29)
    --> Module.handleLoopStateArrived (/ima/agent/pwa.mjs:229:26)
    --> ObserverServer.self.mapApiHandlers.skale_imaNotifyLoopWork (/ima/agent/loopWorker.mjs:210:21)
    --> InWorkerServerPipe._onPipeMessage (/ima/npms/skale-cool-socket/socketServer.mjs:90:73)
    --> InWorkerServerPipe.dispatchEvent (/ima/npms/skale-cool-socket/eventDispatcher.mjs:105:22)
    --> InWorkerServerPipe.implReceive (/ima/npms/skale-cool-socket/socket.mjs:287:14)
    --> InWorkerServerPipe.receive (/ima/npms/skale-cool-socket/socket.mjs:324:14)
    --> InWorkerSocketServerAcceptor.receiveForClientPort (/ima/npms/skale-cool-socket/socket.mjs:581:14)
    --> Object.onMessage (/ima/npms/skale-cool-socket/socket.mjs:444:29)
    2023-09-19 13:11:12.116: RAW/BLS/#10: CRITICAL ERROR: BLS node #10 verify output is:

These two log messages contradict each other and show a situation that must never happen in practice.
So the S-Chain discovery results may have been saved or treated as successful when they were not. This means the S-Chain discovery code must validate the node description JSONs returned by skale_imaInfo calls to skaled more strictly, and must also ensure that waiting for S-Chain discovery to complete does not end until discovery has really finished.
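
A minimal sketch of what the stricter validation and the stricter wait could look like. It is not the actual IMA agent code. The `imaInfo` property, the `BLSPublicKey0`..`BLSPublicKey3` field names, and the helpers `isNodeFullyDiscovered`, `countDiscoveredNodes` and `waitForFullDiscovery` are all assumptions made for illustration; the real skale_imaInfo response may be shaped differently.

```javascript
// ASSUMPTION: a discovered node carries an "imaInfo" object with these
// BLS public key parts. The field names are hypothetical.
const REQUIRED_KEY_FIELDS = [
    "BLSPublicKey0", "BLSPublicKey1", "BLSPublicKey2", "BLSPublicKey3"
];

// A node counts as discovered only if every required field is a non-empty string.
function isNodeFullyDiscovered( node ) {
    if( !node || typeof node !== "object" )
        return false;
    const info = node.imaInfo;
    if( !info || typeof info !== "object" )
        return false;
    return REQUIRED_KEY_FIELDS.every(
        ( name ) => typeof info[name] === "string" && info[name].length > 0 );
}

// Count validated nodes rather than nodes that merely returned some response.
function countDiscoveredNodes( nodes ) {
    if( !Array.isArray( nodes ) )
        return 0;
    return nodes.filter( isNodeFullyDiscovered ).length;
}

// Resolve only when every node passes validation; otherwise retry, then fail loudly.
// "fetchNodes" is a hypothetical async callback that returns the current node array.
async function waitForFullDiscovery( fetchNodes, { attempts = 5, delayMs = 1000 } = {} ) {
    for( let i = 0; i < attempts; ++i ) {
        const nodes = await fetchNodes();
        if( Array.isArray( nodes ) && nodes.length > 0 &&
            countDiscoveredNodes( nodes ) === nodes.length )
            return nodes;
        if( i + 1 < attempts )
            await new Promise( ( resolve ) => setTimeout( resolve, delayMs ) );
    }
    throw new Error( "S-Chain discovery incomplete: not all nodes passed validation" );
}
```

With a check like this, node data such as the one logged for node #10 above, which has ports and IP but no key material, would not be counted toward "16 of 16".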

@sergiy-skalelabs sergiy-skalelabs self-assigned this Sep 19, 2023
@PolinaKiporenko PolinaKiporenko moved this to In Progress in SKALE Engineering 🚀 Sep 19, 2023
@sergiy-skalelabs sergiy-skalelabs moved this from In Progress to Code Review in SKALE Engineering 🚀 Sep 19, 2023
@github-project-automation github-project-automation bot moved this from Code Review to Ready For Release Candidate in SKALE Engineering 🚀 Sep 19, 2023
@sergiy-skalelabs sergiy-skalelabs moved this from Ready For Release Candidate to QA in SKALE Engineering 🚀 Sep 19, 2023
@sergiy-skalelabs sergiy-skalelabs linked a pull request Sep 19, 2023 that will close this issue
@EvgeniyZZ

Can't reproduce
