Bugfix: Check raft state before adding new block to block queue #63
base: main
Changes from all commits
```diff
@@ -99,6 +99,7 @@ impl<S: StorageExt> SawtoothRaftNode<S> {
     }
 
     pub fn on_block_new(&mut self, block: Block) {
+        let mut leader_state_publishing_this_block = false;
         if match self.leader_state {
             Some(LeaderState::Publishing(ref block_id)) => {
                 block_id == &block.block_id
@@ -107,12 +108,19 @@ impl<S: StorageExt> SawtoothRaftNode<S> {
         } {
             debug!("Leader({:?}) transition to Validating block {:?}", self.peer_id, block.block_id);
             self.leader_state = Some(LeaderState::Validating(block.block_id.clone()));
+            leader_state_publishing_this_block = true;
         }
 
-        debug!("Block has been received: {:?}", &block);
-        self.block_queue.block_new(block.clone());
-        self.service.check_blocks(vec![block.block_id]).expect("Failed to send check blocks");
+        // Add this block to the block queue if it is a block the leader has published
+        // or if the Raft node is in follower state
+        if leader_state_publishing_this_block || self.follower_state.is_some() {
+            debug!("Block has been received: {:?}", &block);
+            self.block_queue.block_new(block.clone());
+            self.service.check_blocks(vec![block.block_id]).expect("Failed to send check blocks");
+        } else {
+            // Fail the block if neither leader nor follower handles it
+            self.service.fail_block(block.block_id).expect("Fail block unsuccessful");
+        }
     }
 
     pub fn on_block_valid(&mut self, block_id: BlockId) {
```

Review thread on the new check:

Contributor: Do you think a similar check is required before adding a new block to the `commit_queue`? The `validator_backlog` can tolerate extra blocks, but the `commit_queue` should not hold extra blocks that are not going to be committed, for example if the leader has changed and the block arrives after validation.

Author: Do you mean in

Author: This can be handled in a separate PR if required.
Contributor: Instead of making this mutable and setting it later, you can combine some of this code:

```rust
let leader_state_publishing_this_block = match self.leader_state { ... };

if leader_state_publishing_this_block { ... }

if leader_state_publishing_this_block || self.follower_state.is_some() { ... }
```
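The suggested refactor above could be sketched roughly as follows. This is a self-contained illustration only: `BlockId`, `LeaderState`, `Node`, and `should_queue_block` are simplified stand-ins, not the real sawtooth-raft types or API.

```rust
// Simplified stand-ins for the real sawtooth-raft types (illustration only).
#[derive(Clone, PartialEq)]
struct BlockId(Vec<u8>);

enum LeaderState {
    Publishing(BlockId),
}

struct Node {
    leader_state: Option<LeaderState>,
    follower_state: Option<()>,
}

impl Node {
    fn should_queue_block(&self, incoming: &BlockId) -> bool {
        // One match expression bound immutably, instead of a `let mut` flag
        // that is set inside an `if match { .. } { .. }` block.
        let leader_state_publishing_this_block = match self.leader_state {
            Some(LeaderState::Publishing(ref block_id)) => block_id == incoming,
            _ => false,
        };
        leader_state_publishing_this_block || self.follower_state.is_some()
    }
}

fn main() {
    let id = BlockId(vec![1]);
    let leader = Node {
        leader_state: Some(LeaderState::Publishing(id.clone())),
        follower_state: None,
    };
    // The leader queues its own published block, but not a foreign one.
    assert!(leader.should_queue_block(&id));
    assert!(!leader.should_queue_block(&BlockId(vec![2])));
}
```

The immutable binding makes the data flow easier to follow, though the mutable-flag version in the diff is functionally equivalent.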
Author: This could be done. Do you see any problem with using a mutable variable, or do you mean the approach you suggested above is more idiomatic in Rust?

This fix was done to handle a case where a follower becomes the leader (Node 1) and the previous leader (Node 2) had sent a new block (say Block 1) just before changing to follower. Node 2 received Block 1 later. At the same time, Node 2 had also started building a block (Block 2) on top of the same previous head as Block 1. So in this scenario Node 2 will have two blocks (Block 1 and Block 2) added to the queue with the same block number but different `block_id`s, both pointing to the same previous head. The leader node should accept only those blocks published by the leader.
Contributor: So the issue is that a leader node publishes a new block before stepping down, and the new leader also publishes a block at the same height. Is that correct? If so, then I think the better way to solve this problem is to not publish (finalize) a block until after the node's state is updated. This should be pretty simple to accomplish by moving the call to `node.check_publish` out of the `node.tick` method and instead calling it from `engine.rs` after the call to `process_ready`.
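The suggested call ordering, where publishing only happens after pending Raft state changes have been applied, might look like the following hypothetical sketch. The types and method bodies here are stand-ins to show the ordering, not the real `engine.rs`/`node.rs` code.

```rust
// Hypothetical sketch of the suggested ordering (stand-in types, not the
// real sawtooth-raft API): publish only after Raft state is updated.
struct Node {
    is_leader: bool,
    events: Vec<&'static str>,
}

impl Node {
    fn tick(&mut self) {
        // Drives Raft timers only; no publishing from here any more.
        self.events.push("tick");
    }
    fn process_ready(&mut self) {
        // Applies pending Raft state changes, e.g. the leader stepping down.
        self.is_leader = false;
        self.events.push("process_ready");
    }
    fn check_publish(&mut self) {
        // Called by the engine loop *after* process_ready, so it observes the
        // node's up-to-date role and will not publish as a stale leader.
        if self.is_leader {
            self.events.push("publish");
        }
        self.events.push("check_publish");
    }
}

fn main() {
    let mut node = Node { is_leader: true, events: Vec::new() };
    // One iteration of the engine loop with the suggested ordering:
    node.tick();
    node.process_ready();
    node.check_publish();
    // The step-down applied in process_ready is visible, so nothing is published.
    assert_eq!(node.events, vec!["tick", "process_ready", "check_publish"]);
}
```

The point of the ordering is that a stale leader cannot finalize a block after its state machine has already decided to step down.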
Author: That's right. I think it's good for the leader node to accept only the blocks it publishes, so making this check explicit in the code is better. Even if we rely on the order of calling `node.check_publish` after `process_ready`, can't the new block arrive at other nodes at any time depending on network delays? What if a block was published, the leader then changed to a follower in the next round, and the block took a long time to reach another node which became the leader?
Contributor: I agree that implementing checks for received blocks is a good idea, but I think there should probably be a broader fix for this issue. There are various situations where multiple blocks could be received at the same block height: network delays, bad actors (e.g. some random follower publishes a block), etc. Also, part of the consensus engine's job is to decide on all blocks it receives, to either accept or reject them. So if the node gets a bad block, it needs to tell the validator to reject it. These things should all be considered for a complete solution.
Contributor: See the "API contracts" section of the consensus API RFC: https://github.com/hyperledger/sawtooth-rfcs/blob/master/text/0004-consensus-api.md
Author: I thought we were not solving for all bad-actor cases, since Raft is not BFT, though we can still consider cases that can be handled with minimal checks. I've only considered the case where a block is received at the same block height due to network delays. Do you mean we should also call `fail_block()` when the leader receives a block at the same height that it did not publish (to avoid any kind of memory leak in the validator)? And I guess the same should be done at the follower end when it receives an updated log entry from the leader. So if a follower commits a log entry, should it fail all blocks which are not yet committed and whose height is less than the most recently committed block?
Author: Updated the commit to call `fail_block()` if the new block is not handled.