Investigate erlang:disconnect_node vs erlang:halt vagaries... #1

darach · 2015-01-19T19:08:22Z

Causing and healing netsplits via erlang:disconnect_node/1, in various combinations,
can produce divergent states or kill a dq (locks_leader) process. Neither problem occurs
when the tests use a different strategy (erlang:halt).

erlang:halt is used in the locks tests.
erlang:disconnect_node is used in riak_test.

Is erlang:disconnect_node/1 a reasonable way to test this, or are there underlying issues
in dq, or deeper still in locks or net_kernel?

Credit to @Licenser for discovering one of the symptoms.

darach · 2015-01-19T21:06:32Z

A side effect of calling disconnect_node on the local node is a client_DOWN event
in the Q locks_server locks_agent process, which in turn induces failure in the local
Q process.

Hence, using disconnect_node to verify behaviour in a two-node setup doesn't
work. @Licenser and I have experimented with two or more nodes, with and without
debugging, on VMs and in the local environment.

It would be nice if this worked 'as expected' with disconnect_node, and it is worth
investigating further, since that would allow testing in the style of riak_test.
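
For context, here is a minimal sketch of what riak_test-style partition and heal helpers could look like when driven from a separate test-runner node. It is an illustration only: the module name netsplit_sketch, the function names, and the returned tuple shape are assumptions made for this example, not code from dq, locks, or riak_test. It assumes the runner is already connected to every node in both lists.

```erlang
-module(netsplit_sketch).
-export([partition/2, heal/1]).

%% Split the cluster into two groups of node names, P1 and P2.
%% Assumption: the calling (test-runner) node is connected to all of them.
partition(P1, P2) ->
    OldCookie = rpc:call(hd(P1), erlang, get_cookie, []),
    %% Derive a different cookie so that P1 and P2 cannot silently reconnect.
    NewCookie = list_to_atom(lists:reverse(atom_to_list(OldCookie))),
    [true = rpc:call(N, erlang, set_cookie, [N, NewCookie]) || N <- P1],
    %% Drop every P1 <-> P2 connection. The match on true crashes with
    %% badmatch if a pair was not connected in the first place.
    [true = rpc:call(N, erlang, disconnect_node, [M]) || N <- P1, M <- P2],
    {OldCookie, P1, P2}.

%% Undo partition/2: restore the original cookie on P1 and ask each P1 node
%% to reconnect to each P2 node. Results of connect_node are not asserted;
%% a real test would poll until the cluster is fully connected again.
heal({OldCookie, P1, P2}) ->
    [true = rpc:call(N, erlang, set_cookie, [N, OldCookie]) || N <- P1],
    [rpc:call(N, net_kernel, connect_node, [M]) || N <- P1, M <- P2],
    ok.
```

A test would call `State = netsplit_sketch:partition(P1, P2)`, make its assertions, then call `netsplit_sketch:heal(State)`. Whether dq survives this sequence is exactly the open question in this issue, so the sketch describes the test style rather than a known-good procedure.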

References:

riak_test partition: https://github.com/basho/riak_test/blob/master/src/rt.erl#L471-L480
riak_test heal: https://github.com/basho/riak_test/blob/master/src/rt.erl#L482-L490
locks tests: https://github.com/uwiger/locks/blob/master/test/locks_tests.erl
