Message gets ignored if received before we sync with protocol state #382

Closed
Tracked by #326
itegulov opened this issue Nov 24, 2023 · 7 comments

@itegulov
Contributor

Description

I tried deploying multichain today and encountered an interesting bug. One of the nodes was slow to spin up and sync with contract state, so the other two managed to broadcast a message before this node was aware of them. It therefore "failed to decrypt the message" and binned it, crippling the entire key generation protocol as a result.

So I think we might want to do two things here:

  1. Use non-encrypted associated data to assign a message to a certain bucket
  2. Use a message queue that does not require knowing the contents of the message first, and assign it to a bucket later when we try to consume it and already own the state lock

Open to other suggestions as well
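
A minimal sketch of how the two ideas above could fit together, not the project's actual wire format. The `Envelope` and `Inbox` types and the `epoch` field are hypothetical names introduced for illustration: the routing header is left unencrypted (in a real AEAD scheme it would be passed as associated data so it is still authenticated), and the inbox stores envelopes without decrypting them, sorting them into buckets only when the consumer asks.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical envelope: the header fields stay in plaintext,
/// only `ciphertext` would be encrypted.
#[derive(Debug, Clone, PartialEq)]
pub struct Envelope {
    pub from: u32,           // sender's participant id (assumed field)
    pub epoch: u64,          // which protocol run this belongs to (assumed field)
    pub ciphertext: Vec<u8>, // opaque until the consumer decrypts it
}

/// Inbox that accepts envelopes without looking inside them.
#[derive(Default)]
pub struct Inbox {
    pending: VecDeque<Envelope>,
}

impl Inbox {
    /// Called by the network handler; needs neither keys nor protocol state.
    pub fn push(&mut self, env: Envelope) {
        self.pending.push_back(env);
    }

    /// Called by the consumer once it owns the protocol state.
    /// Groups everything received so far by the plaintext `epoch` header.
    pub fn drain_into_buckets(&mut self) -> HashMap<u64, Vec<Envelope>> {
        let mut buckets: HashMap<u64, Vec<Envelope>> = HashMap::new();
        for env in self.pending.drain(..) {
            buckets.entry(env.epoch).or_default().push(env);
        }
        buckets
    }

    /// Number of envelopes still waiting to be bucketed.
    pub fn len(&self) -> usize {
        self.pending.len()
    }
}
```

Because `push` never decrypts, a node that has not yet synced can still accept messages; nothing is binned just because its contents cannot be read yet.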

@itegulov itegulov added Near BOS NEAR BOS team at Pagoda Emerging Tech Emerging Tech flying formation at Pagoda labels Nov 24, 2023
@ChaoticTempest
Member

I wonder if this is related to how we encrypt every message coming out of the protocol, including the cait_sith::Action::SendMany ones. It seems pretty probable that this is the case since a SendPrivate should only be broadcastable when nodes are aware of that particular participant.

@ChaoticTempest
Member

Additionally, we can take the locks out of the picture so that our game loop is the only holder of the protocol_state, with the /state endpoint being the only access point. We can use a (lockless?) MPSC queue, where each /msg endpoint call pushes the message onto the queue and the game loop later processes the messages in order. This would put all the burden on that loop, which can be an intensive operation, but we already send all our messages there after decrypting anyway. So this approach would just move the decryption into the loop itself.
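
A rough sketch of that shape, using `std::sync::mpsc` as a stand-in for whatever queue the service would really use (an async channel is more likely in practice). `RawMessage`, `ProtocolState`, `decrypt`, `handle_msg`, `run_loop` and `demo` are all hypothetical names, and `decrypt` is an identity placeholder so the example runs.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical stand-in for an encrypted protocol message.
pub struct RawMessage(pub Vec<u8>);

/// Hypothetical stand-in for `protocol_state`; only the loop ever owns it.
#[derive(Default)]
pub struct ProtocolState {
    pub processed: Vec<Vec<u8>>,
}

/// Placeholder for real decryption (identity here, purely illustrative).
fn decrypt(msg: RawMessage) -> Vec<u8> {
    msg.0
}

/// What a `/msg` handler would do: enqueue the raw bytes and return.
/// Returns false if the loop has already shut down.
pub fn handle_msg(tx: &Sender<RawMessage>, body: Vec<u8>) -> bool {
    tx.send(RawMessage(body)).is_ok()
}

/// The game loop: sole owner of the state, so no lock is needed.
/// Runs until every sender has been dropped.
pub fn run_loop(rx: Receiver<RawMessage>) -> ProtocolState {
    let mut state = ProtocolState::default();
    for raw in rx {
        // Decryption happens here, inside the loop.
        state.processed.push(decrypt(raw));
    }
    state
}

/// Three concurrent "handlers" feeding one loop.
pub fn demo() -> Vec<Vec<u8>> {
    let (tx, rx) = channel();
    let game_loop = thread::spawn(move || run_loop(rx));
    let handlers: Vec<_> = (0..3u8)
        .map(|i| {
            let tx = tx.clone();
            thread::spawn(move || {
                handle_msg(&tx, vec![i]);
            })
        })
        .collect();
    for h in handlers {
        h.join().unwrap();
    }
    drop(tx); // close the channel so the loop terminates
    let mut out = game_loop.join().unwrap().processed;
    out.sort(); // arrival order across threads is not deterministic
    out
}
```

The handlers only ever touch the sending half of the channel, so the state itself never has to sit behind a mutex.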

@itegulov
Contributor Author

itegulov commented Nov 28, 2023

> It seems pretty probable that this is the case since a SendPrivate should only be broadcastable when nodes are aware of that particular participant

Well, so the thing is - participants are aware of each other right from the start because the initialized contract state contains the information about them (including the URL). So imagine the following situation:

  1. Participant $A$ starts their node
  2. $A$ syncs with contract state (which has participant $B$)
  3. $A$ starts key generation protocol
  4. Participant $B$ starts their node
  5. $A$ sends a protocol message to $B$
  6. $B$ refuses the message because it doesn't have contract state yet
  7. $B$ syncs with contract state (which has participant $A$)
  8. $B$ sends a message to $A$ and waits for its message which never arrives
  9. The protocol is stuck
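
The sequence above can be replayed in a toy model, from $B$'s point of view only. This is a sketch with hypothetical names (`Node`, `Policy`, `replay`), not the node's real code: it contrasts dropping messages that arrive before sync with buffering them until sync completes.

```rust
use std::collections::VecDeque;

/// What a node does with messages that arrive before it has synced.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Policy {
    DropBeforeSync,   // the behaviour described in the steps above
    BufferBeforeSync, // hold the message until contract state is known
}

/// Toy model of participant B.
pub struct Node {
    policy: Policy,
    synced: bool,
    buffered: VecDeque<&'static str>,
    pub delivered: Vec<&'static str>,
}

impl Node {
    pub fn new(policy: Policy) -> Self {
        Node {
            policy,
            synced: false,
            buffered: VecDeque::new(),
            delivered: Vec::new(),
        }
    }

    /// An incoming protocol message (steps 5 and 6).
    pub fn receive(&mut self, msg: &'static str) {
        if self.synced {
            self.delivered.push(msg);
        } else if self.policy == Policy::BufferBeforeSync {
            self.buffered.push_back(msg);
        }
        // Under DropBeforeSync the message is silently lost here.
    }

    /// Sync with contract state (step 7), then hand over anything buffered.
    pub fn sync_with_contract(&mut self) {
        self.synced = true;
        self.delivered.extend(self.buffered.drain(..));
    }
}

/// Replays steps 4 to 7 for B and returns what the protocol actually saw.
pub fn replay(policy: Policy) -> Vec<&'static str> {
    let mut b = Node::new(policy);
    b.receive("round-1 from A"); // arrives before B has synced
    b.sync_with_contract();
    b.delivered
}
```

With `DropBeforeSync`, `replay` returns an empty list, which is the stuck protocol in step 9; with `BufferBeforeSync`, A's message is still there once B has synced.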

@itegulov
Contributor Author

> Additionally, we can take the locks out of the picture so that our game loop is the only holder of the protocol_state, with the /state endpoint being the only access point. We can use a (lockless?) MPSC queue, where each /msg endpoint call pushes the message onto the queue and the game loop later processes the messages in order. This would put all the burden on that loop, which can be an intensive operation, but we already send all our messages there after decrypting anyway. So this approach would just move the decryption into the loop itself.

Agreed, this is a sensible approach. We should also ensure that the queue is only processed after we have synced with the contract state.

@ChaoticTempest
Member

> Well, so the thing is - participants are aware of each other right from the start because the initialized contract state contains the information about them (including the URL). So imagine the following situation:
>
>   1. Participant A starts their node
>     ...

Then additionally, we can have a separate queue to store messages on the sending participant, so that any message that fails to send because of a network failure stays queued and is retried later in the protocol loop. Then, for simplicity's sake, we should also just send all messages during the protocol loop by default.

@itegulov
Contributor Author

Yeah, that would make sense. We also need a general timeout/retry mechanism.
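
As a sketch of what a sender-side queue with retries might look like: `Outbox`, `Outbound` and `max_attempts` are hypothetical names, the `send` closure stands in for the real network call, and an attempt counter is used in place of a wall-clock timeout to keep the example small.

```rust
use std::collections::VecDeque;

/// A message waiting to be delivered to another participant.
#[derive(Debug, Clone, PartialEq)]
pub struct Outbound {
    pub to: u32,          // recipient's participant id (assumed field)
    pub payload: Vec<u8>, // already-encrypted bytes
    pub attempts: u32,    // failed delivery attempts so far
}

/// Sender-side queue that the protocol loop flushes on every iteration.
pub struct Outbox {
    queue: VecDeque<Outbound>,
    max_attempts: u32,
}

impl Outbox {
    pub fn new(max_attempts: u32) -> Self {
        Outbox { queue: VecDeque::new(), max_attempts }
    }

    /// All messages go through the queue, even on the first try.
    pub fn enqueue(&mut self, to: u32, payload: Vec<u8>) {
        self.queue.push_back(Outbound { to, payload, attempts: 0 });
    }

    /// Tries each queued message once. `send` returns true on success.
    /// Failed messages are requeued until they reach `max_attempts`;
    /// the ones that run out of attempts are returned to the caller.
    pub fn flush<F: FnMut(&Outbound) -> bool>(&mut self, mut send: F) -> Vec<Outbound> {
        let mut gave_up = Vec::new();
        for _ in 0..self.queue.len() {
            let mut msg = match self.queue.pop_front() {
                Some(m) => m,
                None => break,
            };
            if send(&msg) {
                continue; // delivered, nothing more to do
            }
            msg.attempts += 1;
            if msg.attempts >= self.max_attempts {
                gave_up.push(msg);
            } else {
                self.queue.push_back(msg); // retry on a later loop iteration
            }
        }
        gave_up
    }

    /// Number of messages still waiting for delivery.
    pub fn pending(&self) -> usize {
        self.queue.len()
    }
}
```

Returning the exhausted messages lets the caller decide what a permanent failure means for the protocol, rather than dropping them silently.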

@volovyks volovyks moved this from Backlog to Selected in Emerging Technologies Jan 23, 2024
@ChaoticTempest
Member

This should have been completed with #395 along with #423

@github-project-automation github-project-automation bot moved this from Selected to Done in Emerging Technologies Jan 25, 2024