Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock in GA_login_user if network disconnected (multi-sig wallet) #175

Closed
pavel-kokolemin opened this issue Oct 3, 2022 · 15 comments
Closed

Comments

@pavel-kokolemin
Copy link

I got a deadlock in GA_login_user if network is disconnected in the middle of the login process (multi-sig wallet).
I called GA_login_user and some short time later I received a notification (in different thread) that the network connection is down:

{"event":"network","network":{"current_state":"disconnected","next_state":"connected","wait_ms":0}}

but GA_login_user never returned.

Here is the relevant stack trace:

GA_login_user  (in libsideswap_client.dylib) + 204  [0x11b540b08]
ga::sdk::auto_auth_handler::advance()  (in libsideswap_client.dylib) + 24  [0x11b51cadc]
ga::sdk::auto_auth_handler::step()  (in libsideswap_client.dylib) + 2608  [0x11b51d730]
ga::sdk::auth_handler_impl::operator()()  (in libsideswap_client.dylib) + 636  [0x11b51a2e8]
ga::sdk::login_user_call::call_impl()  (in libsideswap_client.dylib) + 3636  [0x11b54b264]
ga::sdk::ga_session::authenticate(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<ga::sdk::signer>)  (in libsideswap_client.dylib) + 3348  [0x11b58d4bc]
ga::sdk::ga_session::on_post_login(std::__1::unique_lock<std::__1::mutex>&, nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long long, unsigned long long, double, std::__1::allocator, nlohmann::adl_serializer, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, bool)  (in libsideswap_client.dylib) + 8364  [0x11b583b64]
ga::sdk::ga_session::subscribe_all(std::__1::unique_lock<std::__1::mutex>&)  (in libsideswap_client.dylib) + 372  [0x11b58820c]
ga::sdk::wamp_transport::subscribe(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::function<void (nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long long, unsigned long long, double, std::__1::allocator, nlohmann::adl_serializer, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>, bool)  (in libsideswap_client.dylib) + 500  [0x11b817860]
boost::future<autobahn::wamp_subscription>::get()  (in libsideswap_client.dylib) + 92  [0x11b81839c]
boost::detail::shared_state<autobahn::wamp_subscription>::get(boost::unique_lock<boost::mutex>&)  (in libsideswap_client.dylib) + 84  [0x11b822bd4]
boost::condition_variable::wait(boost::unique_lock<boost::mutex>&)  (in libsideswap_client.dylib) + 76  [0x11ac76174]
_pthread_cond_wait  (in libsystem_pthread.dylib) + 1236  [0x1aee2c83c]
__psynch_cvwait  (in libsystem_kernel.dylib) + 8  [0x1aedf2270]

This happend on macOS arm64 GDK build (commit edacc118176d157597062272dc7bb6ea4ba01afc).

@jgriffiths
Copy link
Contributor

Hi @pavel-kokolemin is it possible for you to reproduce this? If so, could you pastebin the complete set of thread stacks?

This could be a locking issue in autobahn, I need to see where the other competing thread is at to diagnose.

in gdb, thread apply all where will give you all threads stacks.

@pavel-kokolemin
Copy link
Author

Hi @jgriffiths
This is not very easy to reproduce. I left our app running on a M1 macbook which has been automatically waking up from time to time (don't know why macOS works this way).
As a result I see that there were 542 login attempts (in about 3 days) where 539 attempts succeeded, 2 failed (this is expected because network connection died in the middle of the login process) and 1 connection attempt deadlocked (which is a bug I believe).
I was not able to connect with lldb at that time so I got stacks using Activity Monitor. I'm attaching it here and the log from our app (not sure if it would be useful).
threads.txt
app_log.log

@jgriffiths
Copy link
Contributor

Many thanks @pavel-kokolemin I'll see if I can debug from the call stacks.

@pavel-kokolemin
Copy link
Author

@jgriffiths I'm attaching the full call stacks when GDK hangs.
It happened like this. I got a notification that GDK reconnected:

[2024-02-06 16:02:33.729] DEBUG sideswap_client::gdk_ses_impl new notification from 1 (session 0x600002067480): {"event":"network","network":{"current_state":"disconnected","next_state":"connected","wait_ms":0}}
[2024-02-06 16:02:34.248] DEBUG sideswap_client::gdk_ses_impl new notification from 1 (session 0x600002067480): {"event":"network","network":{"current_state":"connected","next_state":"connected","wait_ms":0}}

So I called GA_login_user but it did not return.
GDK kept sending notifications in the meantime:

[2024-02-06 15:46:18.937] DEBUG sideswap_client::gdk_ses_impl new notification from 1 (session 0x600002067480): {"event":"subaccount","subaccount":{"event_type":"synced","pointer":0}}
[2024-02-06 15:46:18.937] DEBUG sideswap_client::gdk_ses_impl new notification from 1 (session 0x600002067480): {"event":"subaccount","subaccount":{"event_type":"synced","pointer":1}}
[2024-02-06 16:02:33.729] DEBUG sideswap_client::gdk_ses_impl new notification from 1 (session 0x600002067480): {"event":"network","network":{"current_state":"disconnected","next_state":"connected","wait_ms":0}}
[2024-02-06 16:02:34.248] DEBUG sideswap_client::gdk_ses_impl new notification from 1 (session 0x600002067480): {"event":"network","network":{"current_state":"connected","next_state":"connected","wait_ms":0}}
[2024-02-06 16:21:10.510] DEBUG sideswap_client::gdk_ses_impl new notification from 1 (session 0x600002067480): {"event":"network","network":{"current_state":"disconnected","next_state":"connected","wait_ms":0}}
[2024-02-06 16:21:11.055] DEBUG sideswap_client::gdk_ses_impl new notification from 1 (session 0x600002067480): {"event":"network","network":{"current_state":"connected","next_state":"connected","wait_ms":0}}

I'm calling GDK from a single thread.
This happens on 0.69.0 (5868f47cb0cd9513485df55d0c47232c5b09103b commit). I have not tried 0.70.0 yet
Thanks.
ga_login_deadlock_call_stacks.txt

@jgriffiths
Copy link
Contributor

jgriffiths commented Feb 7, 2024

Hi @pavel-kokolemin

It seems that you are attempting to reconnect from the same thread that has received a disconnect notification. client::worker::Data::process_wallet_notif is calling libgreenaddress.0.dylib:GA_login_user in thread #18 in the stack trace.

From the gdk docs for GA_set_notification_handler:

 * The caller should not call session functions from within the callback
 * handler as this may block the application.

If you wish to reconnect automatically on disconnect, you should poke another (non-notification handling) thread from your notification handler to perform this operation.

@pavel-kokolemin
Copy link
Author

Hi @jgriffiths
I do not call GDK from the notification thread.
All notifications are queued and processed in the worker thread (the one that deadlocks).
There is nothing from GDK under process_wallet_notif:

frame #23: 0x00000001657deba4 libsideswap_client.dylib`sideswap_client::worker::Data::process_wallet_notif::h62ab5ff1e87cce40(self=0x000000016e6f89e8, account_id=1, notification=Notification @ 0x000000016e6fa290) at worker.rs:1176:34
frame #24: 0x0000000165800bb4 libsideswap_client.dylib`sideswap_client::worker::start_processing::hd1e74b72cfc8ca2e(env=Prod, msg_sender=(flavor = crossbeam_channel::channel::SenderFlavor<sideswap_client::worker::Message> @ 0x000000016e6f7fb8), msg_receiver=(flavor = crossbeam_channel::channel::ReceiverFlavor<sideswap_client::worker::Message> @ 0x000000016e6f7fc8), from_sender=(flavor = crossbeam_channel::channel::SenderFlavor<sideswap_client::ffi::proto::from::Msg> @ 0x000000016e6fa6a8), params=StartParams @ 0x000000016e6fa848) at worker.rs:4529:48
frame #25: 0x000000016568ec18 libsideswap_client.dylib`sideswap_client::ffi::sideswap_client_start_impl::_$u7b$$u7b$closure$u7d$$u7d$::h42f22f9e5c54b47d at ffi.rs:146:13
frame #26: 0x00000001656b903c libsideswap_client.dylib`std::sys_common::backtrace::__rust_begin_short_backtrace::h6cfb19c0e139a130(f=<unavailable>) at backtrace.rs:154:18
frame #27: 0x0000000165617e8c libsideswap_client.dylib`std::thread::Builder::spawn_unchecked_::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h4cb68dc380699b2b at mod.rs:529:17
frame #28: 0x00000001656a24e0 libsideswap_client.dylib`_$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::ha9ee380e373d8c43(self=<unavailable>, (null)=<unavailable>) at unwind_safe.rs:272:9
frame #29: 0x000000016591333c libsideswap_client.dylib`std::panicking::try::do_call::h58ea986cb717b4c2(data="\U00000001") at panicking.rs:552:40
frame #30: 0x000000016591383c libsideswap_client.dylib`__rust_try + 32
frame #31: 0x00000001659131a8 libsideswap_client.dylib`std::panicking::try::hd4a1f33ea22d3258(f=<unavailable>) at panicking.rs:516:19
frame #32: 0x0000000165616e10 libsideswap_client.dylib`std::thread::Builder::spawn_unchecked_::_$u7b$$u7b$closure$u7d$$u7d$::h07c26f88138059bd at panic.rs:142:14
frame #33: 0x0000000165616e0c libsideswap_client.dylib`std::thread::Builder::spawn_unchecked_::_$u7b$$u7b$closure$u7d$$u7d$::h07c26f88138059bd at mod.rs:528:30
frame #34: 0x0000000165835130 libsideswap_client.dylib`core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h5e498cce0762d021((null)=0x000060000364e6d0, (null)=<unavailable>) at function.rs:250:5
frame #35: 0x00000001664e904c libsideswap_client.dylib`std::sys::unix::thread::Thread::new::thread_start::h9b6324e2391e6ebb [inlined] _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::he407efbeff8db698 at boxed.rs:2007:9 [opt]
frame #36: 0x00000001664e9040 libsideswap_client.dylib`std::sys::unix::thread::Thread::new::thread_start::h9b6324e2391e6ebb [inlined] _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::h15f2f0818ccedd2c at boxed.rs:2007:9 [opt]
frame #37: 0x00000001664e903c libsideswap_client.dylib`std::sys::unix::thread::Thread::new::thread_start::h9b6324e2391e6ebb at thread.rs:108:17 [opt]
frame #38: 0x0000000184a6e034 libsystem_pthread.dylib`_pthread_start + 136

@jgriffiths
Copy link
Contributor

Hi @pavel-kokolemin

Sorry for the delay. Agreed that in fact the notification handler is not calling back out from gdk so this should not be deadlocking. I will investigate this weekend to try to reproduce it and hope to get back to you soon.

@pavel-kokolemin
Copy link
Author

I tried one of the latest commits (9361b8d673c4e837194b8c097f50aab3c3c0e95d), but it also deadlocks in the same way:

  thread #18, name = 'worker_rust'
    frame #0: 0x00000001821e506c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001822225fc libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x000000028022714c libgreenaddress.dylib`boost::condition_variable::wait(boost::unique_lock<boost::mutex>&) [inlined] boost::posix::pthread_cond_wait(c=0x000000011be84430, m=0x000000011be843f0) at pthread_helpers.hpp:112:17
    frame #3: 0x000000028022713c libgreenaddress.dylib`boost::condition_variable::wait(this=0x000000011be843f0, m=0x000000016f123508) at condition_variable.hpp:79:19
    frame #4: 0x0000000280226c54 libgreenaddress.dylib`void boost::condition_variable::wait<boost::_bi::bind_t<bool, boost::_mfi::mf0<bool, boost::detail::shared_state_base>, boost::_bi::list1<boost::reference_wrapper<boost::detail::shared_state_base>>>>(this=0x000000011be843f0, m=0x000000016f123508, pred=bind_t<bool, boost::_mfi::mf0<bool, boost::detail::shared_state_base>, boost::_bi::list1<boost::reference_wrapper<boost::detail::shared_state_base> > > @ 0x000000016f1233c8) at condition_variable_fwd.hpp:93:17
    frame #5: 0x0000000280226a74 libgreenaddress.dylib`boost::detail::shared_state_base::wait_internal(this=0x000000011be84380, lk=0x000000016f123508, rethrow=true) at future.hpp:395:23
    frame #6: 0x00000002805cc184 libgreenaddress.dylib`boost::detail::shared_state<autobahn::wamp_subscription>::get_storage(this=0x000000011be84380, lk=0x000000016f123508) at future.hpp:647:17
    frame #7: 0x00000002805cbee4 libgreenaddress.dylib`boost::detail::shared_state<autobahn::wamp_subscription>::get(this=0x000000011be84380, lk=0x000000016f123508) at future.hpp:652:37
    frame #8: 0x00000002805b1300 libgreenaddress.dylib`boost::future<autobahn::wamp_subscription>::get(this=0x000000016f123600) at future.hpp:1804:35
    frame #9: 0x00000002805b0bd4 libgreenaddress.dylib`ga::sdk::wamp_transport::subscribe(this=0x000000011be0fb80, topic="com.greenaddress.blocks", fn= Lambda in File ga_session.cpp at Line 1121, is_initial=false) at wamp_transport.cpp:688:24
    frame #10: 0x00000002801cf394 libgreenaddress.dylib`ga::sdk::ga_session::subscribe_all(this=0x000000011d00be18, locker=0x000000016f125858) at ga_session.cpp:1121:17
    frame #11: 0x00000002801c8aa4 libgreenaddress.dylib`ga::sdk::ga_session::on_post_login(this=0x000000011d00be18, locker=0x000000016f125858, login_data=0x000000016f125780, root_bip32_xpub="xpub661MyMwAqRbcF3T5kTDJM8Wy99jTXgTbCreGVHji6nQwnwF2oogQi3TkWxTH1CALSJnFSpNEdUzYcTAaNTh1GscfrfDETapjXLCrnZF7whQ", watch_only=false, is_relogin=true) at ga_session.cpp:628:9
    frame #12: 0x00000002801d39fc libgreenaddress.dylib`ga::sdk::ga_session::authenticate(this=0x000000011d00be18, sig_der_hex="304402201c9493cd29f3f59c0fd1ab2a5bf96c40e4dbcf7881ce58ac4ada8c8bc4d055e20220314b3066e023b94ee4dd99a6580988ccc4b68a2170baf037110957e1ab1358c4", path_hex="GA", root_bip32_xpub="xpub661MyMwAqRbcF3T5kTDJM8Wy99jTXgTbCreGVHji6nQwnwF2oogQi3TkWxTH1CALSJnFSpNEdUzYcTAaNTh1GscfrfDETapjXLCrnZF7whQ", signer=std::__1::shared_ptr<ga::sdk::signer>::element_type @ 0x00006000030354b8 strong=4 weak=1) at ga_session.cpp:1085:16
    frame #13: 0x000000028014e99c libgreenaddress.dylib`ga::sdk::login_user_call::call_impl(this=0x0000000113a69c60) at ga_auth_handlers.cpp:416:35
    frame #14: 0x00000002800a598c libgreenaddress.dylib`ga::sdk::auth_handler_impl::operator()(this=0x0000000113a69c60) at auth_handler.cpp:215:27
    frame #15: 0x00000002800a8bf8 libgreenaddress.dylib`ga::sdk::auto_auth_handler::step(this=0x00006000001bba80) at auth_handler.cpp:464:13
    frame #16: 0x00000002800a81fc libgreenaddress.dylib`ga::sdk::auto_auth_handler::advance(this=0x00006000001bba80) at auth_handler.cpp:401:16
    frame #17: 0x00000002801454ac libgreenaddress.dylib`(anonymous namespace)::make_call(call_impl=0x0000000113a69c60) at ffi_c.cpp:115:14
    frame #18: 0x0000000280145be0 libgreenaddress.dylib`GA_login_user::$_10::operator()(this=0x000000016f126e06, session=0x0000600002b02be0, hw_device=0x0000600000d131a0, details=0x0000600000d13b70, call=0x000000016f1270e8) const at ffi_c.cpp:293:1
    frame #19: 0x0000000280136f3c libgreenaddress.dylib`auto (anonymous namespace)::c_invoke<GA_login_user::$_10, GA_session*&, GA_json*&, GA_json*&, GA_auth_handler**&>(func="GA_login_user", f=0x000000016f126e06, args=0x000000016f126e20, args=0x000000016f126e18, args=0x000000016f126e10, args=0x000000016f126e08) at ffi_c.cpp:54:9
    frame #20: 0x0000000280136e88 libgreenaddress.dylib`GA_login_user(session=0x0000600002b02be0, hw_device=0x0000600000d131a0, details=0x0000600000d13b70, call=0x000000016f1270e8) at ffi_c.cpp:293:1
    frame #21: 0x000000016cae5c44 libsideswap_client.dylib`sideswap_client::gdk_ses_impl::login::h214d244a27559492(data=0x0000000113104a90) at gdk_ses_impl.rs:2115:14
    frame #22: 0x000000016cad6720 libsideswap_client.dylib`_$LT$sideswap_client..gdk_ses_impl..GdkSesImpl$u20$as$u20$sideswap_client..gdk_ses..GdkSes$GT$::login::h0b1be7ffc2376e77(self=0x0000000113104a90) at gdk_ses_impl.rs:50:18
    frame #23: 0x000000016ca80318 libsideswap_client.dylib`sideswap_client::worker::Data::process_wallet_notif::h1913aea8151caee5(self=0x000000016f1289e8, account_id=1, notification=Notification @ 0x000000016f12a290) at worker.rs:1176:34

Maybe using boost::future::wait_for instead of boost::future::get can solve the problem?

login_deadlock.txt

@jgriffiths
Copy link
Contributor

Hi @pavel-kokolemin,

Its possible that the network is dying while the subscription is being made, and that is not handled well under macs. It should be possible to wait and loop for subscriptions to complete in this case, checking the underlying connection as we do for standard RPC calls. I'll create a branch for testing and ping here when its available.

@jgriffiths
Copy link
Contributor

@pavel-kokolemin please try https://github.com/Blockstream/gdk/tree/wait_subscribe - its not production ready but it should tell us if the approach fixes your issue.

@pavel-kokolemin
Copy link
Author

No deadlocks so far.
GA_login_user failed a few times with unique_lock::unlock: not locked: Operation not permitted. Each time it took about 43 seconds to return. I guess that's to be expected since the connection was down a moment later:

[2024-02-28 02:42:53.283] DEBUG sideswap_client::gdk_ses_impl new notification from 1 (session 0x600001d7ab20): {"event":"network","network":{"current_state":"connected","next_state":"connected","wait_ms":0}}
[2024-02-28 02:42:53.284] DEBUG sideswap_client::worker request wallet 1 login
[2024-02-28 02:42:53.735] DEBUG sideswap_client::gdk_ses_impl new notification from 1 (session 0x600001d7ab20): {"event":"subaccount","subaccount":{"event_type":"synced","pointer":0}}
[2024-02-28 02:42:53.735] DEBUG sideswap_client::gdk_ses_impl new notification from 1 (session 0x600001d7ab20): {"event":"subaccount","subaccount":{"event_type":"synced","pointer":1}}
[2024-02-28 02:45:00.863] DEBUG sideswap_client::gdk_ses_impl new notification from 0 (session 0x600001d54000): {"block":{"block_hash":"6204cb76bbaa4d2224a2ddb808f58517bfa8374b6f2e8c1ee0052fbe3134f760","block_height":2746253,"previous_hash":"756e83fb8bbab5f3fa5b64dc471fd5db665e059ec097fcefbc2daf45aa30aabe"},"event":"block"}
[2024-02-28 02:45:37.852] DEBUG sideswap_client::gdk_ses_impl new notification from 0 (session 0x600001d54000): {"block":{"block_hash":"f67e0b720881d0a2546d62ae792df0388f8bc86a762fd6b3276a7422f56d7a51","block_height":2746254,"previous_hash":"6204cb76bbaa4d2224a2ddb808f58517bfa8374b6f2e8c1ee0052fbe3134f760"},"event":"block"}
[2024-02-28 02:45:40.029] DEBUG sideswap_client::gdk_ses_impl new notification from 1 (session 0x600001d7ab20): {"event":"network","network":{"current_state":"disconnected","next_state":"connected","wait_ms":0}}
[2024-02-28 02:45:40.740] DEBUG sideswap_client::gdk_ses_impl auth handler status: {"action":"login_user","error":"unique_lock::unlock: not locked: Operation not permitted","name":"login_user","status":"error"}
[2024-02-28 02:45:40.740] INFO sideswap_client::worker show message: connection failed: login failed: unique_lock::unlock: not locked: Operation not permitted
[2024-02-28 02:45:40.740] DEBUG sideswap_client::worker to ui: {"ShowMessage":{"text":"connection failed: login failed: unique_lock::unlock: not locked: Operation not permitted"}}
[2024-02-28 02:45:40.740] WARN sideswap_client::worker processing time: 42.943668167 seconds

And when I opened the Macbook, everything worked fine.
I will test more.
Thanks for your help!

@jgriffiths
Copy link
Contributor

@pavel-kokolemin great news, thanks. The unique_lock::unlock: not locked: Operation not permitted error will be fixed when this code is reorganized for merging.

We are about to release 0.70.1 later today; I will create a 0.70.2 pre-release with the cleaned up fix for you to test further, and when you are happy with it will release 0.70.2 officially, probably early next week.

@jgriffiths
Copy link
Contributor

Hi @pavel-kokolemin can you please test with https://github.com/Blockstream/gdk/tree/wait_subscribe_reworked? This should fix the unlock error you were previously seeing.

@pavel-kokolemin
Copy link
Author

Hi @pavel-kokolemin can you please test with https://github.com/Blockstream/gdk/tree/wait_subscribe_reworked? This should fix the unlock error you were previously seeing.

I started the app from the new branch 3 days ago and everything is working as expected so far, no more unlock errors and no deadlocks. Thank you very much!

@jgriffiths
Copy link
Contributor

Fantastic stuff, thanks @pavel-kokolemin

We will release this as part of gdk release 0.70.2 very shortly. I'll close this issue, please reopen or create another if you encounter any further problems, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants