@goran-ethernal goran-ethernal commented Apr 5, 2022

The replay-messages feature now saves actions that occurred during a fuzz run to a separate .flow file so that they can be simulated on replay as well. For now, only Drop Node actions are saved to the .flow file and simulated in replay.

When running the fuzz-run command, messages are saved to the messages.flow file, while the other metadata (node names, actions, last sequences) is saved to the metaData.flow file inside the SavedState folder.

The metaData.flow file contains data about:

  • node names - names of the nodes that were created on the cluster.
  • last sequences - the sequence each node was in when the cluster was stopped. Replay needs this to know exactly when to stop a node from executing (this way we know that the node has reached its end and needs to be stopped).
  • actions - data about drop node actions that occurred in the fuzz run, as well as their revert actions.

The format of the messages.flow file remains the same. The metaData.flow file looks as follows:

["NODE_0","NODE_1","NODE_2","NODE_3","NODE_4"]
{"actionType":"LastSequence","data":"NODE_0","sequence":10,"round":0}
{"actionType":"LastSequence","data":"NODE_1","sequence":11,"round":0}
{"actionType":"LastSequence","data":"NODE_2","sequence":11,"round":0}
{"actionType":"LastSequence","data":"NODE_3","sequence":11,"round":0}
{"actionType":"LastSequence","data":"NODE_4","sequence":11,"round":0}
{"actionType":"DropNode","data":"NODE_3","sequence":5,"round":0}
{"actionType":"RevertDropNode","data":"NODE_3","sequence":6,"round":0}

The MetaData struct is used to store all necessary data in the metaData.flow file. When the file is loaded, replay uses the actionType property to decide which map a given action is stored in.
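
To illustrate the loading step, here is a minimal sketch of how a file in the format shown above could be parsed and dispatched on actionType. The `action` and `metaData` types, their field names, and `parseMetaData` are hypothetical names chosen for this example; the actual MetaData struct in the PR may be shaped differently.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// action mirrors one JSON line of metaData.flow, as in the sample above.
type action struct {
	ActionType string `json:"actionType"`
	Data       string `json:"data"` // node name
	Sequence   uint64 `json:"sequence"`
	Round      uint64 `json:"round"`
}

// metaData is a hypothetical in-memory form used only for this sketch.
type metaData struct {
	NodeNames     []string
	LastSequences map[string]action   // node name -> last sequence/round
	DropNodes     map[string][]action // node name -> drop actions
	RevertDrops   map[string][]action // node name -> revert actions
}

// parseMetaData reads the node-name array from the first non-empty line,
// then stores every following action in the map selected by its actionType.
func parseMetaData(content string) (*metaData, error) {
	md := &metaData{
		LastSequences: map[string]action{},
		DropNodes:     map[string][]action{},
		RevertDrops:   map[string][]action{},
	}
	scanner := bufio.NewScanner(strings.NewReader(content))
	first := true
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		if first {
			if err := json.Unmarshal([]byte(line), &md.NodeNames); err != nil {
				return nil, fmt.Errorf("reading node names: %w", err)
			}
			first = false
			continue
		}
		var a action
		if err := json.Unmarshal([]byte(line), &a); err != nil {
			return nil, fmt.Errorf("reading action: %w", err)
		}
		switch a.ActionType {
		case "LastSequence":
			md.LastSequences[a.Data] = a
		case "DropNode":
			md.DropNodes[a.Data] = append(md.DropNodes[a.Data], a)
		case "RevertDropNode":
			md.RevertDrops[a.Data] = append(md.RevertDrops[a.Data], a)
		default:
			return nil, fmt.Errorf("unknown actionType %q", a.ActionType)
		}
	}
	if err := scanner.Err(); err != nil {
		return nil, err
	}
	return md, nil
}

func main() {
	sample := `["NODE_0","NODE_3"]
{"actionType":"LastSequence","data":"NODE_0","sequence":10,"round":0}
{"actionType":"DropNode","data":"NODE_3","sequence":5,"round":0}
{"actionType":"RevertDropNode","data":"NODE_3","sequence":6,"round":0}`
	md, err := parseMetaData(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(md.NodeNames, md.LastSequences["NODE_0"].Sequence, len(md.DropNodes["NODE_3"]))
}
```

Keeping one map per action type means a node can look up its own LastSequence or pending drops by name without scanning the whole action list.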

The replay-messages command now takes the path of the folder where the messages.flow and metaData.flow files are stored, e.g.:
go run ./e2e/fuzz/cmd/main.go replay-messages -filesDirectory=../SavedData

NOTE: both files are needed for replay to execute.
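
As a rough illustration of that requirement, the sketch below checks a directory for both files before a replay would start. `missingFlowFiles` and `requiredFlowFiles` are hypothetical names for this example only and are not taken from the PR; the actual command may validate its input differently.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// requiredFlowFiles lists the two files the PR description says replay needs.
var requiredFlowFiles = []string{"messages.flow", "metaData.flow"}

// missingFlowFiles returns the required files that are absent from dir
// (or that cannot be read as regular entries).
func missingFlowFiles(dir string) []string {
	var missing []string
	for _, name := range requiredFlowFiles {
		info, err := os.Stat(filepath.Join(dir, name))
		if err != nil || info.IsDir() {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	dir := "../SavedData" // same example path as in the command above
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	if missing := missingFlowFiles(dir); len(missing) > 0 {
		fmt.Printf("cannot replay, missing in %s: %v\n", dir, missing)
		return
	}
	fmt.Println("both .flow files found, replay can start")
}
```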

Simulating node actions and stopping nodes when they are done with execution is now handled in replay_node_execution.go.
Once replay has started, a node that reads its next message from the queue can recognize whether it needs to shut down or to simulate a drop.
A node needs to be stopped when it has nothing to read from the queue and it has reached the sequence and round defined in its LastSequence entry in the metaData.flow file.
A node needs to be dropped when it has nothing to read from the queue for a given sequence and round, and that sequence and round differ from the ones defined in its LastSequence entry in the metaData.flow file.
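
The two rules above can be sketched as a small decision function. This is only an illustration of the described logic; `position`, `nodeDecision`, and `decide` are hypothetical names and do not come from replay_node_execution.go.

```go
package main

import "fmt"

// position identifies a point in consensus by sequence and round.
type position struct {
	Sequence uint64
	Round    uint64
}

// nodeDecision is what a replayed node should do after polling its queue.
type nodeDecision int

const (
	keepRunning nodeDecision = iota // a message is available, keep processing
	stopNode                        // queue empty and LastSequence reached
	dropNode                        // queue empty before LastSequence: simulate a drop
)

// decide applies the rules from the description: with an empty queue, the
// node stops if it is at its LastSequence position and is dropped otherwise.
func decide(hasMessage bool, current, last position) nodeDecision {
	if hasMessage {
		return keepRunning
	}
	if current == last {
		return stopNode
	}
	return dropNode
}

func main() {
	last := position{Sequence: 11, Round: 0} // e.g. NODE_3 in the sample file

	fmt.Println(decide(true, position{Sequence: 4}, last) == keepRunning)
	fmt.Println(decide(false, position{Sequence: 5}, last) == dropNode)
	fmt.Println(decide(false, position{Sequence: 11}, last) == stopNode)
}
```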

Before starting the cluster, replayNodeExecutionHandler starts a goroutine that listens for three events: a node needs to be dropped, a node is done with execution, or a node needs to be restarted because the cluster has reached the sequence in which the node drop was reverted, as defined in the RevertDropNode action in the metaData.flow file.

Once all nodes are done with execution, replayNodeExecutionHandler stops the cluster and the command exits.
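
A minimal sketch of such a listener loop is shown below, assuming events arrive on a single channel. The `event` type, the `cluster` interface, `recordingCluster`, and `handleEvents` are all hypothetical stand-ins invented for this example; the real replayNodeExecutionHandler may use different channels and types.

```go
package main

import "fmt"

type eventKind int

const (
	nodeDropped     eventKind = iota // a node must be dropped
	nodeDone                         // a node reached its LastSequence
	sequenceReached                  // the cluster reached a new sequence
)

type event struct {
	Kind     eventKind
	Node     string
	Sequence uint64
}

// cluster is a hypothetical abstraction over the e2e cluster.
type cluster interface {
	StopNode(name string)
	StartNode(name string)
	Stop()
}

// recordingCluster only logs calls so the sketch can run on its own.
type recordingCluster struct{ log []string }

func (r *recordingCluster) StopNode(n string)  { r.log = append(r.log, "stop "+n) }
func (r *recordingCluster) StartNode(n string) { r.log = append(r.log, "start "+n) }
func (r *recordingCluster) Stop()              { r.log = append(r.log, "stop cluster") }

// handleEvents drops nodes, restarts them at their revert sequence, and
// stops the whole cluster once nodeCount nodes have reported they are done.
func handleEvents(events <-chan event, nodeCount int, revertAt map[uint64][]string, c cluster) {
	done := 0
	for ev := range events {
		switch ev.Kind {
		case nodeDropped:
			c.StopNode(ev.Node)
		case sequenceReached:
			for _, name := range revertAt[ev.Sequence] {
				c.StartNode(name)
			}
			delete(revertAt, ev.Sequence)
		case nodeDone:
			c.StopNode(ev.Node)
			done++
			if done == nodeCount {
				c.Stop()
				return
			}
		}
	}
}

// runSketch feeds a short, made-up event sequence through the handler.
func runSketch() []string {
	c := &recordingCluster{}
	events := make(chan event)
	finished := make(chan struct{})
	go func() {
		defer close(finished)
		handleEvents(events, 2, map[uint64][]string{6: {"NODE_1"}}, c)
	}()
	events <- event{Kind: nodeDropped, Node: "NODE_1", Sequence: 5}
	events <- event{Kind: sequenceReached, Sequence: 6}
	events <- event{Kind: nodeDone, Node: "NODE_0", Sequence: 10}
	events <- event{Kind: nodeDone, Node: "NODE_1", Sequence: 11}
	<-finished
	return c.log
}

func main() {
	for _, line := range runSketch() {
		fmt.Println(line)
	}
}
```

Funnelling all three event kinds through one goroutine keeps the cluster state changes serialized, so no extra locking is needed in this sketch.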

@goran-ethernal goran-ethernal marked this pull request as ready for review April 5, 2022 09:04
@goran-ethernal goran-ethernal force-pushed the fuzz/replay-stop-execution branch from 02c1cc2 to 63486e9 Compare April 5, 2022 10:02

@ferranbt ferranbt left a comment

A lot of changes are required to support node drop. Do you guys have any insights on whether we can optimize any part?

@goran-ethernal

To answer your question about optimizing: 80% of these changes just reorganize code into different files and structures. It was all clustered in the ReplayMessageNotifier, so we tried to separate the logic for reading, writing and node execution into different files.

@goran-ethernal goran-ethernal requested a review from ferranbt April 6, 2022 07:13
@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 13 (rating A)

Coverage: no coverage information
Duplication: 0.0%
