Running the allocator_tutorial in intraprocess mode causes a segmentation fault #501

Closed
clalancette opened this issue Apr 16, 2021 · 8 comments

@clalancette
Contributor

Bug report

Required Info:

  • Operating System:
    • Ubuntu 20.04 arm64
  • Installation type:
    • Debian packages
  • Version or commit hash:
    • ros-rolling-demo-nodes-cpp 0.14.0
  • DDS implementation:
    • CycloneDDS or Fast-RTPS
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

/opt/ros/rolling/lib/demo_nodes_cpp/allocator_tutorial intra

Expected behavior

The command runs until the user stops it with Ctrl-C.

Actual behavior

The command segfaults after a second or two.

Additional information

Running the demo in regular (non-intraprocess) mode works fine.

@clalancette
Contributor Author

(gdb) bt
#0  0x000055555559599d in rclcpp::experimental::SubscriptionIntraProcess<std_msgs::msg::UInt32_<std::allocator<void> >, std::allocator<void>, std::default_delete<std_msgs::msg::UInt32_<std::allocator<void> > >, std_msgs::msg::UInt32_<std::allocator<void> > >::provide_intra_process_message (this=0x5555556e4440, 
    message=std::shared_ptr<const std_msgs::msg::UInt32_<std::allocator<void> >> (empty) = {...})
    at /home/ubuntu/ros2_ws/install/rclcpp/include/rclcpp/experimental/subscription_intra_process.hpp:150
#1  0x0000555555592343 in rclcpp::experimental::IntraProcessManager::add_owned_msg_to_buffers<std_msgs::msg::UInt32_<std::allocator<void> >, MyAllocator<void>, rclcpp::allocator::AllocatorDeleter<MyAllocator<std_msgs::msg::UInt32_<std::allocator<void> > > > > (this=0x5555556ccaf0, 
    message=std::unique_ptr<std_msgs::msg::UInt32_<std::allocator<void> >> = {...}, subscription_ids=std::vector of length 1, capacity 1 = {...}, 
    allocator=std::shared_ptr<MyAllocator<std_msgs::msg::UInt32_<std::allocator<void> > >> (use count 3, weak count 0) = {...})
    at /home/ubuntu/ros2_ws/install/rclcpp/include/rclcpp/experimental/intra_process_manager.hpp:403
#2  0x000055555558d2c9 in rclcpp::experimental::IntraProcessManager::do_intra_process_publish<std_msgs::msg::UInt32_<std::allocator<void> >, MyAllocator<void>, rclcpp::allocator::AllocatorDeleter<MyAllocator<std_msgs::msg::UInt32_<std::allocator<void> > > > > (this=0x5555556ccaf0, intra_process_publisher_id=3, 
    message=std::unique_ptr<std_msgs::msg::UInt32_<std::allocator<void> >> = {...}, 
    allocator=std::shared_ptr<MyAllocator<std_msgs::msg::UInt32_<std::allocator<void> > >> (use count 3, weak count 0) = {...})
    at /home/ubuntu/ros2_ws/install/rclcpp/include/rclcpp/experimental/intra_process_manager.hpp:219
#3  0x000055555558a1b1 in rclcpp::Publisher<std_msgs::msg::UInt32_<std::allocator<void> >, MyAllocator<void> >::do_intra_process_publish (
    this=0x5555556d1710, msg=std::unique_ptr<std_msgs::msg::UInt32_<std::allocator<void> >> = {...})
    at /home/ubuntu/ros2_ws/install/rclcpp/include/rclcpp/publisher.hpp:350
#4  0x000055555558723d in rclcpp::Publisher<std_msgs::msg::UInt32_<std::allocator<void> >, MyAllocator<void> >::publish (this=0x5555556d1710, 
    msg=std::unique_ptr<std_msgs::msg::UInt32_<std::allocator<void> >> = {...}) at /home/ubuntu/ros2_ws/install/rclcpp/include/rclcpp/publisher.hpp:206
#5  0x000055555557c964 in main (argc=2, argv=0x7fffffff2df8) at /home/ubuntu/ros2_ws/src/ros2/demos/demo_nodes_cpp/src/topics/allocator_tutorial.cpp:229

It looks like we are trying to do a std::move on an already null message in the intraprocess manager.
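For context, the backtrace follows the message's ownership through the intra-process path: Publisher::publish takes a std::unique_ptr, IntraProcessManager::add_owned_msg_to_buffers receives it via std::move, and SubscriptionIntraProcess::provide_intra_process_message finally receives a std::shared_ptr<const ...>. The sketch below is a simplified, hypothetical model of that hand-off, assuming a single unique-to-shared conversion whose result is shared by all subscribers. UInt32Msg, SharedSubscriber and publish_to are made-up names; this is not rclcpp's implementation.

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for std_msgs::msg::UInt32.
struct UInt32Msg {
  std::uint32_t data = 0;
};

// Hypothetical subscriber that buffers shared, read-only messages.
struct SharedSubscriber {
  std::vector<std::shared_ptr<const UInt32Msg>> buffer;

  void provide_message(std::shared_ptr<const UInt32Msg> message) {
    buffer.push_back(std::move(message));
  }
};

// Simplified model: take ownership of a unique_ptr, convert it once into a
// shared_ptr<const T>, and give each subscriber its own copy. Moving `shared`
// into the first subscriber instead would leave it empty for the others.
void publish_to(std::unique_ptr<UInt32Msg> message,
                const std::vector<SharedSubscriber *> & subscribers) {
  std::shared_ptr<const UInt32Msg> shared = std::move(message);  // message is now null
  for (SharedSubscriber * sub : subscribers) {
    sub->provide_message(shared);  // copy, so every subscriber gets a valid pointer
  }
}
```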

@ivanpauno
Member

> It looks like we are trying to do a std::move on an already null message in the intraprocess manager.

Moving a null shared_ptr isn't an issue, though. It's not clear from the traceback what the problem is.
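To illustrate the point: moving from an empty std::shared_ptr is well-defined and simply yields another empty pointer (the moved-from source is left empty as well); only dereferencing an empty pointer is undefined behavior. A minimal sketch, with hypothetical helper names forward_message and safe_read:

```cpp
#include <memory>
#include <utility>

// Moving from an empty shared_ptr is well-defined: the destination is simply
// empty too. A crash therefore needs something more than the move itself.
template <typename T>
std::shared_ptr<const T> forward_message(std::shared_ptr<const T> message) {
  std::shared_ptr<const T> sink = std::move(message);  // fine even if message is empty
  return sink;
}

// Dereferencing is where an empty pointer becomes a problem, so guard it.
template <typename T>
bool safe_read(const std::shared_ptr<const T> & message, T & out) {
  if (!message) {
    return false;  // empty: do not dereference
  }
  out = *message;
  return true;
}
```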

@wjwwood
Member

wjwwood commented Apr 21, 2021

I stepped through the code and things go sideways in the last function call:

(lldb)
Process 93528 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x00000001000b76db allocator_tutorial`void rclcpp::experimental::IntraProcessManager::do_intra_process_publish<std_msgs::msg::UInt32_<std::__1::allocator<void> >, MyAllocator<std_msgs::msg::UInt32_<std::__1::allocator<void> > >, rclcpp::allocator::AllocatorDeleter<MyAllocator<std_msgs::msg::UInt32_<std::__1::allocator<void> > > > >(this=0x0000000101f0bd60, intra_process_publisher_id=3, message=unique_ptr<std_msgs::msg::UInt32_<std::__1::allocator<void> >, rclcpp::allocator::AllocatorDeleter<MyAllocator<std_msgs::msg::UInt32_<std::__1::allocator<void> > > > > @ 0x00007ffeefbf8ed8, allocator=0x0000000102c13170) at intra_process_manager.hpp:220:9
   217 	        sub_ids.take_ownership_subscriptions.end());
   218
   219 	      this->template add_owned_msg_to_buffers<MessageT, Alloc, Deleter>(
-> 220 	        std::move(message),
   221 	        concatenated_vector,
   222 	        allocator);
   223 	    } else if (!sub_ids.take_ownership_subscriptions.empty() && // NOLINT
Target 0: (allocator_tutorial) stopped.
(lldb) p *message
(std_msgs::msg::UInt32_<std::__1::allocator<void> >) $28 = (data = 0)
(lldb) p concatendated_vector
error: use of undeclared identifier 'concatendated_vector'
(lldb) p concatenated_vector
(std::__1::vector<unsigned long long, std::__1::allocator<unsigned long long> >) $29 = size=1 {
  [0] = 4
}
(lldb) p allocator
(std::__1::allocator_traits<MyAllocator<std_msgs::msg::UInt32_<std::__1::allocator<void> > > >::allocator_type) $30 = {}
(lldb) n
Process 93528 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x00000001000b76fd allocator_tutorial`void rclcpp::experimental::IntraProcessManager::do_intra_process_publish<std_msgs::msg::UInt32_<std::__1::allocator<void> >, MyAllocator<std_msgs::msg::UInt32_<std::__1::allocator<void> > >, rclcpp::allocator::AllocatorDeleter<MyAllocator<std_msgs::msg::UInt32_<std::__1::allocator<void> > > > >(this=0x0000000101f0bd60, intra_process_publisher_id=3, message=unique_ptr<std_msgs::msg::UInt32_<std::__1::allocator<void> >, rclcpp::allocator::AllocatorDeleter<MyAllocator<std_msgs::msg::UInt32_<std::__1::allocator<void> > > > > @ 0x00007ffeefbf8ed8, allocator=0x0000000102c13170) at intra_process_manager.hpp:221:9
   218
   219 	      this->template add_owned_msg_to_buffers<MessageT, Alloc, Deleter>(
   220 	        std::move(message),
-> 221 	        concatenated_vector,
   222 	        allocator);
   223 	    } else if (!sub_ids.take_ownership_subscriptions.empty() && // NOLINT
   224 	      sub_ids.take_shared_subscriptions.size() > 1)
Target 0: (allocator_tutorial) stopped.
(lldb)
Process 93528 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x00000001000b7707 allocator_tutorial`void rclcpp::experimental::IntraProcessManager::do_intra_process_publish<std_msgs::msg::UInt32_<std::__1::allocator<void> >, MyAllocator<std_msgs::msg::UInt32_<std::__1::allocator<void> > >, rclcpp::allocator::AllocatorDeleter<MyAllocator<std_msgs::msg::UInt32_<std::__1::allocator<void> > > > >(this=0x0000000101f0bd60, intra_process_publisher_id=3, message=unique_ptr<std_msgs::msg::UInt32_<std::__1::allocator<void> >, rclcpp::allocator::AllocatorDeleter<MyAllocator<std_msgs::msg::UInt32_<std::__1::allocator<void> > > > > @ 0x00007ffeefbf8ed8, allocator=0x0000000102c13170) at intra_process_manager.hpp:222:9
   219 	      this->template add_owned_msg_to_buffers<MessageT, Alloc, Deleter>(
   220 	        std::move(message),
   221 	        concatenated_vector,
-> 222 	        allocator);
   223 	    } else if (!sub_ids.take_ownership_subscriptions.empty() && // NOLINT
   224 	      sub_ids.take_shared_subscriptions.size() > 1)
   225 	    {
Target 0: (allocator_tutorial) stopped.
(lldb)
Process 93528 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x00000001000b7723 allocator_tutorial`void rclcpp::experimental::IntraProcessManager::do_intra_process_publish<std_msgs::msg::UInt32_<std::__1::allocator<void> >, MyAllocator<std_msgs::msg::UInt32_<std::__1::allocator<void> > >, rclcpp::allocator::AllocatorDeleter<MyAllocator<std_msgs::msg::UInt32_<std::__1::allocator<void> > > > >(this=0x0000000101f0bd60, intra_process_publisher_id=3, message=unique_ptr<std_msgs::msg::UInt32_<std::__1::allocator<void> >, rclcpp::allocator::AllocatorDeleter<MyAllocator<std_msgs::msg::UInt32_<std::__1::allocator<void> > > > > @ 0x00007ffeefbf8ed8, allocator=0x0000000102c13170) at intra_process_manager.hpp:219:22
   216 	        sub_ids.take_ownership_subscriptions.begin(),
   217 	        sub_ids.take_ownership_subscriptions.end());
   218
-> 219 	      this->template add_owned_msg_to_buffers<MessageT, Alloc, Deleter>(
   220 	        std::move(message),
   221 	        concatenated_vector,
   222 	        allocator);
Target 0: (allocator_tutorial) stopped.
(lldb)
Process 93528 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x28)
    frame #0: 0x00000001000b67a0 allocator_tutorial`rclcpp::experimental::SubscriptionIntraProcess<std_msgs::msg::UInt32_<std::__1::allocator<void> >, std::__1::allocator<void>, std::__1::default_delete<std_msgs::msg::UInt32_<std::__1::allocator<void> > >, std_msgs::msg::UInt32_<std::__1::allocator<void> > >::provide_intra_process_message(this=0x0000000102c15660, message=nullptr) at subscription_intra_process.hpp:146:14
   143 	  void
   144 	  provide_intra_process_message(ConstMessageSharedPtr message)
   145 	  {
-> 146 	    buffer_->add_shared(std::move(message));
   147 	    trigger_guard_condition();
   148 	  }
   149
Target 0: (allocator_tutorial) stopped.
(lldb) p *message
error: Couldn't apply expression side effects : Couldn't dematerialize a result variable: couldn't read its memory

It seems like memory corruption to me. I'll keep poking at it too.
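One detail visible in the gdb backtrace is that frame #1 instantiates add_owned_msg_to_buffers with MyAllocator<void> and rclcpp::allocator::AllocatorDeleter, while frame #0 is a SubscriptionIntraProcess instantiated with std::allocator<void> and std::default_delete. As a general C++ illustration only, not a diagnosis of rclcpp, the sketch below shows why such a mismatch matters when a type-erased pointer is downcast: std::dynamic_pointer_cast checks the concrete type and returns null on a mismatch, whereas std::static_pointer_cast to the wrong instantiation performs no check, and using its result is undefined behavior that can look like memory corruption. SubscriptionBase, TypedSubscription, OtherAlloc and deliver are hypothetical names.

```cpp
#include <memory>

// Hypothetical type-erased base, standing in for a stored subscription handle.
struct SubscriptionBase {
  virtual ~SubscriptionBase() = default;
};

// Hypothetical subscription templated on an allocator type.
template <typename Alloc>
struct TypedSubscription : SubscriptionBase {
  int received = 0;
  int last_value = 0;
  void provide(int value) {
    ++received;
    last_value = value;
  }
};

// Hypothetical stand-in for a custom allocator type.
struct OtherAlloc {};

// Deliver only if the concrete type matches what the caller expects.
// dynamic_pointer_cast returns null on a mismatch; a static_pointer_cast to the
// wrong instantiation would compile but calling through it is undefined behavior.
template <typename Alloc>
bool deliver(const std::shared_ptr<SubscriptionBase> & base, int value) {
  auto typed = std::dynamic_pointer_cast<TypedSubscription<Alloc>>(base);
  if (!typed) {
    return false;
  }
  typed->provide(value);
  return true;
}
```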

@clalancette
Copy link
Contributor Author

Just as an FYI: I commented out the code that adds a custom allocator for the subscriber, and the crash went away. See 64a61a6 for what I mean.

I wonder, then, whether this is related to ros2/rclcpp#1324. I'm going to have to give this investigation up for the day, but I think that is where I would look next.
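For readers unfamiliar with custom allocators, here is a minimal sketch of a stateless, standard-conforming allocator of the kind the tutorial plugs into its publisher and subscription. CountingAllocator and allocation_count are hypothetical names; this is not the tutorial's MyAllocator. The templated converting constructor is what lets std::allocator_traits rebind it to internal types such as a shared_ptr control block or a container node.

```cpp
#include <cstddef>
#include <limits>
#include <memory>
#include <new>

// Hypothetical counter shared by every rebound copy of the allocator.
static std::size_t allocation_count = 0;

template <typename T>
struct CountingAllocator {
  using value_type = T;

  CountingAllocator() noexcept = default;

  // Converting constructor: allows rebinding CountingAllocator<U> to
  // CountingAllocator<T> (e.g. for a control block or list node).
  template <typename U>
  CountingAllocator(const CountingAllocator<U> &) noexcept {}

  T * allocate(std::size_t n) {
    if (n > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
      throw std::bad_array_new_length();
    }
    ++allocation_count;
    // Default-aligned storage; sufficient for types that are not over-aligned.
    return static_cast<T *>(::operator new(n * sizeof(T)));
  }

  void deallocate(T * p, std::size_t) noexcept {
    ::operator delete(p);
  }
};

// Stateless: any two instances can free each other's memory.
template <typename T, typename U>
bool operator==(const CountingAllocator<T> &, const CountingAllocator<U> &) noexcept {
  return true;
}

template <typename T, typename U>
bool operator!=(const CountingAllocator<T> &, const CountingAllocator<U> &) noexcept {
  return false;
}
```

Because the allocator holds no state, all instances compare equal, so memory obtained through one rebound copy can be released through another.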

@wjwwood
Member

wjwwood commented Apr 28, 2021

I think ros2/rclcpp#1643 fixes this issue, but we may also want to do some version of https://github.com/ros2/rclcpp/tree/hidmic/workaround-allocator-crash (no pull request yet).

@wjwwood
Member

wjwwood commented Apr 28, 2021

I should say, I've yet to confirm ros2/rclcpp#1643 fixes the issue, but I think it may. I'll check tomorrow.

@wjwwood
Member

wjwwood commented Apr 28, 2021

Actually, I just managed to fix my local setup (I had a dirty branch of the demos repo) and tested it. allocator_tutorial intra now works for me with the patch from ros2/rclcpp#1643.

@wjwwood
Member

wjwwood commented Apr 29, 2021

Closing with ros2/rclcpp#1643, though @hidmic is going to open another pull request to fix more undefined behavior.

@wjwwood wjwwood closed this as completed Apr 29, 2021