-
Notifications
You must be signed in to change notification settings - Fork 24
Bugfix: Check raft state before adding new block to block queue #63
base: main
Are you sure you want to change the base?
Bugfix: Check raft state before adding new block to block queue #63
Conversation
ltseeley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit message does not contain any information about the bug or how this change fixes it. It's helpful to reviewers (and for anyone looking back at the commit history) to see an explanation of why this change was made.
| } | ||
|
|
||
| pub fn on_block_new(&mut self, block: Block) { | ||
| let mut leader_state_publishing_this_block = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of making this mutable and setting it later, you can combine some of this code:
let leader_state_publishing_this_block = match self.leader_state { ... };
if leader_state_publishing_this_block { ... }
if leader_state_publishing_this_block || self.follower_state.is_some() { ... }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be done. Do you see any problem in using mutable variable. Or do you mean the approach you suggested above is more idiomatic in rust.
This fix was done in order to handle a case where a follower becomes a leader (Node 1) and previous leader (Node 2) had sent a new block (say Block 1) just before changing to follower. Node 2 received Block 1 later. At same time Node 2 also has started building a block (Block 2) on top of previous head same as Block 1 previous head. So in this scenario Node 2 will have 2 blocks (Block 1 and Block 2) added to queue with same block number but different block_id , both pointing to same previous head. Leader node should accept only those blocks published by the leader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the issue is that a leader node publishes a new block before stepping down, and the new leader also publishes a block at the same height. Is that correct? If so, then I think the better way to solve this problem is to not publish (finalize) a block until after the node's state is updated. This should be pretty simple to accomplish by moving the call to node.check_publish out of the node.tick method and instead call it from engine.rs after the call to process_ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. I think its good for leader node to accept only the blocks it publishes. So making this explicit check in code is better. Even if we rely on order of calling node.check_publish after process_ready, can the new block arrive other nodes at any time based on the network delays ? What if block was published and then leader changed to follower in next round and block took long time to reach another node which became leader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that implementing checks for received blocks is a good idea. But I think there should probably be a broader fix for this issue. There are various situations where multiple blocks could be received at the same block height: network delays, bad-actors (e.g. some random follower publishes a block), etc. Also, part of the consensus engine's job is to decide on all blocks it receives, to either accept or reject them. So if the node gets a bad block, it needs to tell the validator to reject it. These things should all be considered as a complete solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the "API contracts" section of the consensus API RFC: https://github.com/hyperledger/sawtooth-rfcs/blob/master/text/0004-consensus-api.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we are not solving for all bad actor cases as Raft is not BFT though we can still consider cases where it can be handled with minimal checks. I've considered only case where block is received at same block height due to network delays. Do you mean we should also call fail_block() when leader receives block at same height not published by it (to avoid any kind of memory leaks in validator). And I guess same should be done at follower end when it receives updated log entry from leader. So if a follower commits a log entry, fail all blocks which is not yet committed and whose height is less than recent committed block ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the commit by calling fail_block() if the new block is not handled
| // or if Raft node is in follower state | ||
| if leader_state_publishing_this_block || self.follower_state.is_some() { | ||
| debug!("Block has been received: {:?}", &block); | ||
| self.block_queue.block_new(block.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think similar check is required before adding a new block to the commit_queue? "validator_backlog" is still ok to have extra blocks in it but commit_queue should not have extra blocks which are not to be committed. Example if leader has changed and the block arrives after validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean in on_block_valid() check if the block arrived in follower/leader state is same as the block for which 'check_blocks()' was called earlier in on_block_new() while it was in corresponding follower/leader state. So for example, if we called check_blocks() for block 1 in leader state and we change to follower state and then block 1 arrived later then ignore block 1 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be handled in separate PR if required.
Add new block to block queue only if 1) Raft node is in leader publishing state and this is a block which leader published 2) or if raft node is a follower Signed-off-by: Manoj Gopalakrishnan <[email protected]>
5f8a2c7 to
f444192
Compare
Add new block to block queue only if
Signed-off-by: Manoj Gopalakrishnan [email protected]