Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"on_deactivate()" seems not called when setting the controller to be "inactive" #239

Closed
xibeisiber opened this issue Feb 13, 2023 · 10 comments

Comments

@xibeisiber
Copy link

Hi,
I tried diffbot.launch.py and rrbot_system_position_only.launch.py. The printing information in on_activate() is printed out in console, but when I "ros2 control set_controller_state forward_position_controller inactive" and "ros2 control set_controller_state diffbot_base_controller inactive", the printing information in "on_deactivate()" is not printed in console.

Another question is: will on_deactivate() be automatically called when I terminate the startup node (press ctrl+c indiffbot.launch.py)?

Thanks!
Jia

@christophfroehlich
Copy link
Contributor

Hi,
I think this is not an issue/question of this demo-repository but of https://github.com/ros-controls/ros2_control directly.

My experience on the termination with ctrl+c is that it is not really reproducible what happens (independent of ros2_control) ;)

@destogl
Copy link
Member

destogl commented Feb 20, 2023

I tried diffbot.launch.py and rrbot_system_position_only.launch.py. The printing information in on_activate() is printed out in console, but when I "ros2 control set_controller_state forward_position_controller inactive" and "ros2 control set_controller_state diffbot_base_controller inactive", the printing information in "on_deactivate()" is not printed in console.

Oh, this is very strange. this should definitely not happen. This part of ros2_control is well tested. I would not expect bugs there, nevertheless it is possible. I will test this when testing new restructured examples in #232 to be sure. Can you check the example one from this PR and report if you get the same behavior?

Another question is: will on_deactivate() be automatically called when I terminate the startup node (press ctrl+c indiffbot.launch.py)?

I cannot answer you in detail – I know that we had a discussion about this and about catching events. But for what I know, we are not doing a graceful “shutdown” of the framework. It is definitely something to think about, but until now, it wasn't really an issue in all the robots we have tested this.

This functionality has to be shaped well since it would include all standard and user-components to have pre-defined behavior.

@xibeisiber
Copy link
Author

I tried diffbot.launch.py and rrbot_system_position_only.launch.py. The printing information in on_activate() is printed out in console, but when I "ros2 control set_controller_state forward_position_controller inactive" and "ros2 control set_controller_state diffbot_base_controller inactive", the printing information in "on_deactivate()" is not printed in console.

Oh, this is very strange. this should definitely not happen. This part of ros2_control is well tested. I would not expect bugs there, nevertheless it is possible. I will test this when testing new restructured examples in #232 to be sure. Can you check the example one from this PR and report if you get the same behavior?

Another question is: will on_deactivate() be automatically called when I terminate the startup node (press ctrl+c indiffbot.launch.py)?

I cannot answer you in detail – I know that we had a discussion about this and about catching events. But for what I know, we are not doing a graceful “shutdown” of the framework. It is definitely something to think about, but until now, it wasn't really an issue in all the robots we have tested this.

This functionality has to be shaped well since it would include all standard and user-components to have pre-defined behavior.

Yes. I can reproduce the behavior and find a new problem..
Environment: ros2 humble, ubuntu 22.04, x86_64.
The ros2_control_demo is in commit 1007b1d.

I comment the print information of read()&write() in diffbot_system.cpp. When launching diffbot.launch.py, the info in on_activate() is printed out.

When ros2 control set_controller_state diffbot_base_controller inactive, the info in on_deactivate() is not printed out. But this controller is successfully disactived by "ros2 control list_controllers".

When ros2 control set_controller_state diffbot_base_controller active again, the node dies. error message is:

[INFO] [spawner-4]: process has finished cleanly [pid 17612]
[ros2_control_node-1] Stack trace (most recent call last) in thread 17583:
[ros2_control_node-1] #6    Object "", at 0xffffffffffffffff, in 
[ros2_control_node-1] #5    Source "../sysdeps/unix/sysv/linux/x86_64/clone3.S", line 81, in __clone3 [0x7f62867269ff]
[ros2_control_node-1] #4    Source "./nptl/pthread_create.c", line 442, in start_thread [0x7f6286694b42]
[ros2_control_node-1] #3    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f6286adc2b2, in 
[ros2_control_node-1] #2    Object "/opt/ros/humble/lib/controller_manager/ros2_control_node", at 0x555b4ec873db, in 
[ros2_control_node-1] #1    Object "/opt/ros/humble/lib/libcontroller_manager.so", at 0x7f6286efe7a7, in controller_manager::ControllerManager::update(rclcpp::Time const&, rclcpp::Duration const&)
[ros2_control_node-1] #0    Object "/opt/ros/humble/lib/libdiff_drive_controller.so", at 0x7f627c3baf89, in diff_drive_controller::DiffDriveController::update(rclcpp::Time const&, rclcpp::Duration const&)
[ros2_control_node-1] Segmentation fault (Signal sent by the kernel [(nil)])
[ERROR] [ros2_control_node-1]: process has died [pid 17555, exit code -11, cmd '/opt/ros/humble/lib/controller_manager/ros2_control_node --ros-args --params-file /tmp/launch_params__3imcxhq --params-file /home/jia/Workspace/ros2_ws/install/ros2_control_demo_bringup/share/ros2_control_demo_bringup/config/diffbot_controllers.yaml'].

@christophfroehlich
Copy link
Contributor

christophfroehlich commented Mar 5, 2023

I can reproduce your behavior.

Honestly I'm not sure about the lifecycle of the hardware interface, should it be activated/deactivated automatically if doing so with the controllers? @destogl?

The segmentation fault happens only with the diffbot example. I do not see a difference from rrbot with diffbot hardware from this repository, according to the stack trace (backward_ros ftw) the diff_drive_controller throws the error caused from a nullptr? I'll have a look if there is a difference in the lifecycle of the different controllers.

[ros2_control_node-1] Stack trace (most recent call last) in thread 16525:
[ros2_control_node-1] #6    Object "", at 0xffffffffffffffff, in 
[ros2_control_node-1] #5    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7f662cee09ff, in 
[ros2_control_node-1] #4    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7f662ce4eb42, in 
[ros2_control_node-1] #3    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f662d0de2b2, in 
[ros2_control_node-1] #2    Source "/workspaces/ros2_rolling_ws/src/ros2_control/controller_manager/src/ros2_control_node.cpp", line 81, in operator() [0x55f1261e40d0]
[ros2_control_node-1]          79:         // execute update loop
[ros2_control_node-1]          80:         cm->read(cm->now(), measured_period);
[ros2_control_node-1]       >  81:         cm->update(cm->now(), measured_period);
[ros2_control_node-1]          82:         cm->write(cm->now(), measured_period);
[ros2_control_node-1]          83: 
[ros2_control_node-1]          84:         // wait until we hit the end of the period
[ros2_control_node-1] #1    Source "/workspaces/ros2_rolling_ws/src/ros2_control/controller_manager/src/controller_manager.cpp", line 1741, in update [0x7f662d4f3957]
[ros2_control_node-1]        1739:       if (controller_go)
[ros2_control_node-1]        1740:       {
[ros2_control_node-1]       >1741:         auto controller_ret = loaded_controller.c->update(
[ros2_control_node-1]        1742:           time, (controller_update_rate != update_rate_ && controller_update_rate != 0)
[ros2_control_node-1]        1743:                   ? rclcpp::Duration::from_seconds(1.0 / controller_update_rate)
[ros2_control_node-1]        1744:                   : period);
[ros2_control_node-1] #0  | Source "/workspaces/ros2_rolling_ws/src/ros2_controllers/diff_drive_controller/src/diff_drive_controller.cpp", line 249, in set_value
[ros2_control_node-1]     |   247:   for (size_t index = 0; index < static_cast<size_t>(params_.wheels_per_side); ++index)
[ros2_control_node-1]     |   248:   {
[ros2_control_node-1]     | > 249:     registered_left_wheel_handles_[index].velocity.get().set_value(velocity_left);
[ros2_control_node-1]     |   250:     registered_right_wheel_handles_[index].velocity.get().set_value(velocity_right);
[ros2_control_node-1]     |   251:   }
[ros2_control_node-1]     | Source "/workspaces/ros2_rolling_ws/src/ros2_control/hardware_interface/include/hardware_interface/loaned_command_interface.hpp", line 66, in set_value
[ros2_control_node-1]     |    64:   const std::string & get_prefix_name() const { return command_interface_.get_prefix_name(); }
[ros2_control_node-1]     |    65: 
[ros2_control_node-1]     | >  66:   void set_value(double val) { command_interface_.set_value(val); }
[ros2_control_node-1]     |    67: 
[ros2_control_node-1]     |    68:   double get_value() const { return command_interface_.get_value(); }
[ros2_control_node-1]       Source "/workspaces/ros2_rolling_ws/src/ros2_control/hardware_interface/include/hardware_interface/handle.hpp", line 111, in update [0x7f662417a0f1]
[ros2_control_node-1]         109:   void set_value(double value)
[ros2_control_node-1]         110:   {
[ros2_control_node-1]       > 111:     THROW_ON_NULLPTR(this->value_ptr_);
[ros2_control_node-1]         112:     *this->value_ptr_ = value;
[ros2_control_node-1]         113:   }
[ros2_control_node-1]         114: };
[ros2_control_node-1] Segmentation fault (Signal sent by the kernel [(nil)])

I used this commit https://github.com/christophfroehlich/ros2_control_demos/tree/issue_239 and the current rolling branches of ros2_control and ros2_controllers

@christophfroehlich
Copy link
Contributor

christophfroehlich commented May 2, 2023

I tried to investigate the segmentation fault, but did not find the source.
What I did: I extended the diff_drive_controller test, but this one does not throw any errors.

  state = controller_->get_node()->deactivate();
  ASSERT_EQ(state.id(), State::PRIMARY_STATE_INACTIVE);
  ASSERT_EQ(
    controller_->update(rclcpp::Time(0, 0, RCL_ROS_TIME), rclcpp::Duration::from_seconds(0.01)),
    controller_interface::return_type::OK);

  // activate again
  state = controller_->get_node()->activate();
  ASSERT_EQ(State::PRIMARY_STATE_ACTIVE, state.id());

  publish(linear, angular);
  // wait for msg is be published to the system
  ASSERT_TRUE(controller_->wait_for_twist(executor));
  ASSERT_EQ(
    controller_->update(rclcpp::Time(0, 0, RCL_ROS_TIME), rclcpp::Duration::from_seconds(0.01)),
    controller_interface::return_type::OK);

How can the set_value be called but the this-pointer be obsolete? The on_activate/on_deactivate methods of the diffbot_system do not change the pointers. Anyhow, the problem seems to be with the handles, and those are not managed by the hardware-interface of ros2_control_demo_example_2.

@destogl or @bmagyar: Any other suggestions how to narrow that down?

@destogl
Copy link
Member

destogl commented May 4, 2023

@christophfroehlich

We have here potentially multiple issues.

  1. DiffDriveController is not deactivated, but the hardware is deactivated. This causes nullptr – it seems that an interface or its value get invalidated in on_activate() method. (I would bet this is the issue) - in your test you are missing deactivation of controller.
  2. Automatic deactivation of controller is not implemented when hardware gets deactivated. Presently, this will not even be detected since the interface is still there is hardware is inactive.
  3. The lifecycle of hardware should work as expected. One should check if override of the methods is done correctly. Nevertheless, we might be missing something here… can you check the tests and see if they should be extended to detect if an implementation of SystemInterface::on_deactivate is actually called?

@destogl
Copy link
Member

destogl commented May 4, 2023

How can the set_value be called but the this-pointer be obsolete? The on_activate/on_deactivate methods of the diffbot_system do not change the pointers. Anyhow, the problem seems to be with the handles, and those are not managed by the hardware-interface of ros2_control_demo_example_2.

This sounds like quite a deep problem. Maybe something in the Resource Manager when we change interface flags is not right. But we are not moving them or invalidating them.

Do we have a test for this in Resource Manager?

@Noel215
Copy link
Contributor

Noel215 commented May 4, 2023

Hi, I've created that PR for ros2_controllers that solves the issue for me, please check it out. :)

@christophfroehlich
Copy link
Contributor

  1. DiffDriveController is not deactivated, but the hardware is deactivated. This causes nullptr – it seems that an interface or its value get invalidated in on_activate() method. (I would bet this is the issue) - in your test you are missing deactivation of controller.

@destogl to clarify this: the segmentation fault happens when the controller is deactivated while the hw component being still active (there are two topics mixed in this issue). I'll have a look on the PR of @Noel215, I fear I still don't understand the handle registration sufficiently.

  1. Automatic deactivation of controller is not implemented when hardware gets deactivated. Presently, this will not even be detected since the interface is still there is hardware is inactive.

ok

  1. The lifecycle of hardware should work as expected. One should check if override of the methods is done correctly. Nevertheless, we might be missing something here… can you check the tests and see if they should be extended to detect if an implementation of SystemInterface::on_deactivate is actually called?

We don't have any tests for the hardware components in this repo yet, see #46! do you have a hint to examples of tests we should add here?

@christophfroehlich
Copy link
Contributor

I'll close this issue now, summary from the initial post of @xibeisiber :

lets discuss the other points in the linked issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants