Investigate erlang:disconnect_node vs erlang:halt vagaries... #1

Open
darach opened this issue Jan 19, 2015 · 1 comment

Comments

@darach
Owner

darach commented Jan 19, 2015

Testing various combinations of causing and healing netsplits via erlang:disconnect_node/1
can produce divergent states or cause a dq (locks_leader) process to die. Neither issue occurs
when a different testing strategy (erlang:halt) is used.

erlang:halt is used in the locks tests.
erlang:disconnect_node is used in riak_test.
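A minimal sketch of the two fault-injection strategies, written as a hypothetical helper module (the module name split_sketch and its function names are illustrative only, not part of locks, dq or riak_test). partition/2 asks one node to drop its connection to another, heal/2 reconnects them, and kill/1 crash-stops a node in the way the halt-based tests do. One assumption worth keeping in mind: with default kernel settings, Erlang distribution reconnects transparently, so a later message between the two nodes may silently heal a disconnect_node split.

```erlang
%% Hypothetical helper module: a sketch, not the locks or riak_test API.
-module(split_sketch).
-export([partition/2, heal/2, kill/1]).

%% Simulate a netsplit: ask Node1 to drop its distribution connection
%% to Node2. Returns true | false | ignored (ignored if Node1 is not
%% alive), or {badrpc, Reason} if Node1 cannot be reached.
partition(Node1, Node2) ->
    rpc:call(Node1, erlang, disconnect_node, [Node2]).

%% Heal the split by explicitly reconnecting Node1 to Node2.
%% Returns true | false | ignored, or {badrpc, Reason}.
heal(Node1, Node2) ->
    rpc:call(Node1, net_kernel, connect_node, [Node2]).

%% Crash-stop Node entirely. A cast is used because a synchronous
%% call would never receive a reply from a halting node.
kill(Node) ->
    rpc:cast(Node, erlang, halt, []).
```

The disconnect_node approach keeps both nodes running, so each side keeps its state and must reconcile when the link returns; halt removes one side completely, which is a simpler failure for a leader-election protocol to handle.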

Is erlang:disconnect_node/1 a reasonable way to simulate netsplits, or are there underlying issues
with dq, or deeper still with locks or net_kernel?

Credit to @Licenser for discovering one of the symptoms.

@darach
Owner Author

darach commented Jan 19, 2015

A side effect of calling disconnect_node on the local node is a client_DOWN event
in Q's locks_server / locks_agent process, which in turn induces a failure in the local
Q process.

Hence, using disconnect_node to verify behaviour in a two-node setup doesn't
work. @Licenser and I have experimented with two or more nodes, with and without debugging,
both on VMs and in a local environment.

It would be nice if this worked as expected with disconnect_node, and it is worth investigating
further, since it would allow testing in the style of riak_test.
