-
Notifications
You must be signed in to change notification settings - Fork 892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LogManager::disk_thread run closures in bthread after the log manager shutdown, to avoid deadlock #242
base: master
Are you sure you want to change the base?
Conversation
… shutdown, to avoid deadlock. In the situation, the closure may hold the last reference of NodeImpl, and done->Run() triggers the the destructor of NodeImpl. In the destructor, NodeImpl joins the disk thread to terminate, but the done->Run() itself is excuted in the disk thread. The disk thread can never be terminated!
能构造ut去复现么? |
可以的,我加一下 |
#0 0x00007fc4614a488b in jump_stack (to=, from=) at /opt/hezhonglei/mraftdevelop/mraft/deps/incubator-brpc/src/bthread/stack_inl.h:132 |
这个bug有可能在example的例子里面复现么 |
这个pr还不能合入? |
@@ -816,6 +840,7 @@ void LogManager::set_applied_id(const LogId& applied_id) { | |||
void LogManager::shutdown() { | |||
std::unique_lock<raft_mutex_t> lck(_mutex); | |||
_stopped = true; | |||
CHECK_EQ(0, bthread::execution_queue_execute(_disk_queue, new ShutdownClosure)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里的fix是不是还是有问题:
如果在 shutdown
之前有日志添加进来,在 shudown
之后没有新的日志添加进来。
那么 shutdown 之前的日志(最后一条),依然有可能在 disk_thread 里面执行,导致 自己 join 自己
@@ -600,6 +619,11 @@ int LogManager::disk_thread(void* meta, | |||
ret = log_manager->_log_storage->reset(rc->next_log_index()); | |||
break; | |||
} | |||
ShutdownClosure* sc = dynamic_cast<ShutdownClosure*>(done); | |||
if (sc) { | |||
ret = log_manager->_shutdown = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里应该不需要 ret 返回值
In the situation, the closure may hold the last reference of NodeImpl, and done->Run() triggers the
the destructor of NodeImpl. In the destructor, NodeImpl joins the disk thread to terminate, but
the done->Run() itself is excuted in the disk thread. The disk thread can never be terminated!