16 changes: 12 additions & 4 deletions src/node.rs
@@ -99,6 +99,7 @@ impl<S: StorageExt> SawtoothRaftNode<S> {
     }
 
     pub fn on_block_new(&mut self, block: Block) {
+        let mut leader_state_publishing_this_block = false;
Contributor

Instead of making this mutable and setting it later, you can combine some of this code:
let leader_state_publishing_this_block = match self.leader_state { ... };
if leader_state_publishing_this_block { ... }
if leader_state_publishing_this_block || self.follower_state.is_some() { ... }
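
Spelled out against the surrounding code, that suggestion would look roughly like this (a sketch only; it assumes the match arms collapsed in this diff simply evaluate to false, and it folds in the fail_block branch that the final revision of this change adds):

```rust
// Sketch of the suggested combined form (assumed: hidden match arms return false).
let leader_state_publishing_this_block = match self.leader_state {
    Some(LeaderState::Publishing(ref block_id)) => block_id == &block.block_id,
    _ => false,
};

if leader_state_publishing_this_block {
    debug!("Leader({:?}) transition to Validating block {:?}", self.peer_id, block.block_id);
    self.leader_state = Some(LeaderState::Validating(block.block_id.clone()));
}

if leader_state_publishing_this_block || self.follower_state.is_some() {
    // Only queue blocks this node published as leader, or any block while a follower.
    debug!("Block has been received: {:?}", &block);
    self.block_queue.block_new(block.clone());
    self.service.check_blocks(vec![block.block_id]).expect("Failed to send check blocks");
} else {
    // Added in the later revision of this change: reject blocks this node won't handle.
    self.service.fail_block(block.block_id).expect("Fail block unsucessful");
}
```

This avoids the mutable flag while keeping the same behaviour.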

Contributor Author @manojgop, Mar 26, 2019

This could be done. Do you see any problem with using a mutable variable, or do you mean the approach you suggested above is more idiomatic in Rust?
This fix was done to handle a case where a follower becomes the leader (Node 1) while the previous leader (Node 2) had sent out a new block (say Block 1) just before stepping down to follower. Node 1 receives Block 1 late, and at the same time it has already started building its own block (Block 2) on top of the same previous head that Block 1 points to. In this scenario Node 1 ends up with two blocks (Block 1 and Block 2) added to its queue with the same block number but different block_ids, both pointing to the same previous head. The leader node should accept only those blocks it published itself.

Contributor

So the issue is that a leader node publishes a new block before stepping down, and the new leader also publishes a block at the same height. Is that correct? If so, then I think the better way to solve this problem is to not publish (finalize) a block until after the node's state is updated. This should be pretty simple to accomplish by moving the call to node.check_publish out of the node.tick method and instead calling it from engine.rs after the call to process_ready.
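
Roughly, the proposed reordering would look like this (an illustrative sketch only, not the actual engine.rs main loop; the call order is the point, and the exact receivers and signatures of tick, process_ready, and check_publish are assumptions here):

```rust
// Illustrative sketch only (assumed names/signatures); not the real engine.rs loop.
loop {
    // ... receive consensus updates, peer messages, and timeouts ...

    node.tick();            // tick() would no longer call check_publish() internally
    node.process_ready();   // apply Raft state changes (e.g. leader -> follower) first
    node.check_publish();   // only finalize a block once the node's state is current
}
```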

Contributor Author @manojgop, Mar 26, 2019

That's right. I think it's good for the leader node to accept only the blocks it publishes, so making this check explicit in the code is better. Even if we rely on the order of calling node.check_publish after process_ready, couldn't the new block still arrive at other nodes at an arbitrary time because of network delays? What if a block was published, the leader then changed to a follower in the next round, and the block took a long time to reach another node which became the leader?

Contributor

I agree that implementing checks for received blocks is a good idea. But I think there should probably be a broader fix for this issue. There are various situations where multiple blocks could be received at the same block height: network delays, bad actors (e.g. some random follower publishes a block), etc. Also, part of the consensus engine's job is to decide on all blocks it receives, to either accept or reject them. So if the node gets a bad block, it needs to tell the validator to reject it. These things should all be considered as part of a complete solution.

Contributor

See the "API contracts" section of the consensus API RFC: https://github.com/hyperledger/sawtooth-rfcs/blob/master/text/0004-consensus-api.md

Contributor Author

I thought we were not solving for all bad-actor cases, since Raft is not BFT, though we can still handle cases that only need minimal checks. I've considered only the case where a block is received at the same block height due to network delays. Do you mean we should also call fail_block() when the leader receives a block at the same height that it did not publish (to avoid any kind of memory leak in the validator)? And I guess the same should be done at the follower end when it receives an updated log entry from the leader. So if a follower commits a log entry, should it fail all blocks which are not yet committed and whose height is less than the most recently committed block?

Contributor Author

Updated the commit to call fail_block() if the new block is not handled.

         if match self.leader_state {
             Some(LeaderState::Publishing(ref block_id)) => {
                 block_id == &block.block_id
@@ -107,12 +108,19 @@ impl<S: StorageExt> SawtoothRaftNode<S> {
         } {
             debug!("Leader({:?}) transition to Validating block {:?}", self.peer_id, block.block_id);
             self.leader_state = Some(LeaderState::Validating(block.block_id.clone()));
+            leader_state_publishing_this_block = true;
         }
 
-        debug!("Block has been received: {:?}", &block);
-        self.block_queue.block_new(block.clone());
-
-        self.service.check_blocks(vec![block.block_id]).expect("Failed to send check blocks");
+        // Add this block to block queue if it's a block which leader has published
+        // or if Raft node is in follower state
+        if leader_state_publishing_this_block || self.follower_state.is_some() {
+            debug!("Block has been received: {:?}", &block);
+            self.block_queue.block_new(block.clone());
Contributor

Do you think a similar check is required before adding a new block to the commit_queue? It is still OK for the "validator_backlog" to have extra blocks in it, but the commit_queue should not have extra blocks that are not meant to be committed, for example if the leader has changed and the block arrives after validation.

Contributor Author @manojgop, Apr 9, 2019

Do you mean that in on_block_valid() we should check whether the block that arrived in follower/leader state is the same block for which check_blocks() was called earlier in on_block_new() while the node was in the corresponding follower/leader state? So, for example, if we called check_blocks() for Block 1 in the leader state, then changed to the follower state, and Block 1 arrived later, we should ignore Block 1?

Contributor Author @manojgop, Apr 11, 2019

This can be handled in a separate PR if required.
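
For reference, the kind of guard being deferred here might look roughly like the following (a purely hypothetical sketch; on_block_valid's real body is not shown in this diff, and the commit handling is reduced to a placeholder comment):

```rust
pub fn on_block_valid(&mut self, block_id: BlockId) {
    // Hypothetical guard: only treat the block as commit-eligible if this node
    // still expects it in its current role. `LeaderState::Validating` holds the
    // block_id that check_blocks() was called with earlier in on_block_new().
    let still_expected = match self.leader_state {
        Some(LeaderState::Validating(ref expected)) => expected == &block_id,
        _ => self.follower_state.is_some(),
    };

    if !still_expected {
        // The role changed since check_blocks() was requested; fail the block
        // instead of letting it reach the commit_queue.
        self.service.fail_block(block_id).expect("Fail block unsuccessful");
        return;
    }

    // ... existing handling that moves the validated block toward the commit_queue ...
}
```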

+            self.service.check_blocks(vec![block.block_id]).expect("Failed to send check blocks");
+        } else {
+            // Fail the block if either leader or follower doesn't handle it
+            self.service.fail_block(block.block_id).expect("Fail block unsucessful");
+        }
     }

     pub fn on_block_valid(&mut self, block_id: BlockId) {