Core dump in the ~NodeImpl destructor when the node exits #307
Comments
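Is it triggered when the process exits?
Yes.
Call shutdown and join on the node in main to make sure it exits cleanly.
The node's join does not seem to guarantee that all asynchronous work has finished before it returns. Also, the node's own lifetime is controlled by reference counting. So I don't know how a clean exit is supposed to be guaranteed. Does the existing code not support this? Or does the existing Node code need to be modified to add reference-count tracking for asynchronous tasks?
Try waiting in main for both the node's join and brpc's join to return. Once the brpc server's join finishes, everything RPC-related should in theory be cleaned up.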
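First of all, thank you very much for your answer.
I don't think that should happen. When brpc exits, it guarantees that request processing has finished: https://github.com/apache/incubator-brpc/blob/master/docs/cn/server.md
Looking at the stacks of all threads in the coredump, I don't see the main function. Does that mean main has already exited?
Yes, then it has exited.
It could be a braft problem. OnPreVoteRPCDone may have failed on the client side, which is outside the brpc server's control.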
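join needs additional logic that waits for the node's in-flight RPCs to finish.
The controller did report a timeout error at that time. The content is as follows:
I think I understand now: stop and join in the brpc server only manage the RPC requests it handles as a server. When the process itself acts as an RPC client, the callbacks are implemented in the node, so each of them needs a corresponding count, and join has to wait on that count. Is that right?
Yes, that's right. This is a problem in the braft library itself, and we'll fix it when we have time.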
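The counting idea discussed above can be sketched roughly as follows. This is only an illustration using the C++ standard library, not braft's actual code or its eventual fix: `InflightTracker` and `simulate_shutdown` are hypothetical names, and plain threads stand in for the bthreads that would run client-side callbacks such as OnPreVoteRPCDone. The count is incremented before each RPC is issued and decremented as the last step of its callback, and join blocks until the count drops to zero.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not part of braft): counts client-side RPCs whose
// completion callbacks have not finished running yet.
class InflightTracker {
public:
    // Call before issuing an RPC.
    void begin() {
        std::lock_guard<std::mutex> lk(mu_);
        ++inflight_;
    }
    // Call as the last step of the RPC's completion callback.
    void end() {
        std::lock_guard<std::mutex> lk(mu_);
        assert(inflight_ > 0);
        if (--inflight_ == 0) cv_.notify_all();
    }
    // Block until every RPC that was begun has also ended.
    void join() {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [this] { return inflight_ == 0; });
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    int inflight_ = 0;
};

// Simulates n_rpcs callbacks that complete late, and returns how many had
// completed at the moment join() returned.
int simulate_shutdown(int n_rpcs) {
    InflightTracker tracker;
    std::atomic<int> completed{0};
    std::vector<std::thread> callbacks;
    for (int i = 0; i < n_rpcs; ++i) {
        tracker.begin();  // count the RPC before it is "sent"
        callbacks.emplace_back([&tracker, &completed, i] {
            // Stand-in for a response or timeout arriving later.
            std::this_thread::sleep_for(std::chrono::milliseconds(2 * (i + 1)));
            completed.fetch_add(1);
            tracker.end();
        });
    }
    tracker.join();  // shutdown path: wait for all in-flight callbacks
    int seen = completed.load();
    for (auto& t : callbacks) t.join();
    return seen;
}
```

Calling `simulate_shutdown(4)` should return 4, because join does not return until every callback has called `end()`. In a real node, the tracker would have to outlive every callback that references it, which is the same lifetime concern that the reference counting on NodeImpl addresses.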
Stack trace
(gdb) bt
#0 0x00007f101947b4ab in raise () from /lib64/libpthread.so.0
#1 0x0000559813aa1cfa in handle_fatal_signal(int) ()
#2
#3 0x00007f1016bbf1f7 in raise () from /lib64/libc.so.6
#4 0x00007f1016bc08e8 in abort () from /lib64/libc.so.6
#5 0x00007f1017adc809 in butil::debug::BreakDebugger () at brpc/src/butil/debug/debugger_posix.cc:242
#6 0x00007f101cb5d547 in get_or_create_tls_agent (id=) at /usr/include/bvar/detail/agent_group.h:130
#7 get_or_create_tls_agent (this=0x7f1013471370 braft::g_num_nodes+16) at /usr/include/bvar/detail/combiner.h:295
#8 bvar::Reducer<long, bvar::detail::AddTo, bvar::detail::MinusFrom >::operator<< (this=0x7f1013471360 braft::g_num_nodes, value=) at /usr/include/bvar/reducer.h:193
#9 0x00007f1012e4b193 in braft::NodeImpl::~NodeImpl (this=0x5598d4647800, __in_chrg=) at node.cpp:518
#10 0x00007f1012e4b470 in braft::NodeImpl::~NodeImpl (this=0x5598d4647800, __in_chrg=) at node.cpp:519
#11 0x00007f1013ea40b5 in DeleteInternal (x=0x5598d4647800) at /usr/include/butil/memory/ref_counted.h:191
#12 Destruct (x=0x5598d4647800) at /usr/include/butil/memory/ref_counted.h:152
#13 butil::RefCountedThreadSafe<braft::NodeImpl, butil::DefaultRefCountedThreadSafeTraits < braft::NodeImpl> >::Release (this=0x5598d4647808) at /usr/include/butil/memory/ref_counted.h:180
#14 0x00007f1012e93644 in braft::OnPreVoteRPCDone::~OnPreVoteRPCDone (this=0x5598ad8bf800, __in_chrg=) at node.cpp:2917
#15 0x00007f1012e93746 in braft::OnPreVoteRPCDone::~OnPreVoteRPCDone (this=0x5598ad8bf800, __in_chrg=) at node.cpp:2918
#16 0x00007f1012e93a5c in braft::OnPreVoteRPCDone::Run (this=0x5598ad8bf800) at node.cpp:2931
#17 0x00007f1017bcb0f2 in brpc::Controller::EndRPC (this=0x5598ad8bf8a0, info=...) at brpc/src/brpc/controller.cpp:883
#18 0x00007f1017bcb2c0 in brpc::Controller::RunEndRPC (arg=) at brpc/controller.cpp:684
#19 0x00007f1017b65c75 in bthread::TaskGroup::task_runner (skip_remained=) at brpc/src/bthread/task_group.cpp:297
#20 0x00007f1017b4d061 in bthread_make_fcontext () from /lib64/libbrpc.so
(gdb) p g_num_nodes
$2 = {<bvar::Reducer<long, bvar::detail::AddTo, bvar::detail::MinusFrom >> = {bvar::Variable = {_vptr.Variable = 0x7f101815dc30 <vtable for bvar::Variable+16>, _name = ""}, _combiner = {
_id = -1, _op = {}, _lock = {butil::Mutex = {_native_handle = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0, __elision = 0, __list = {
__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>, __align = 0}}, }, _global_result = -20,
result_identity = 0, element_identity = 0, agents = {root = {previous = 0x7f10134713b8 braft::g_num_nodes+88, next = 0x7f10134713b8 braft::g_num_nodes+88}}}, _sampler = 0x0,
_series_sampler = 0x0, _inv_op = {}}, }
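The gdb print of g_num_nodes shows that g_num_nodes has already been destructed. However, it is a static variable, so it should only be destructed when the program exits. What could be causing this? Any ideas?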
@PFZheng