Skip to content
This repository was archived by the owner on Nov 13, 2017. It is now read-only.

Conversation

andreaskoepf
Copy link

We ran into #442 in our lab too and observed some very threatening and dangerous high-speed motions of our UR5 robotic arm. On a real industrial robot this issue is extremely safety-critical since it can lead to plans that start far away from the current robot joint position even though MoveGroup::setStartStateToCurrentState() has been called. I am personally quite shocked that this was not fixed by somebody else before since #442 is more than two years old.

Some background details about the problem as understood by us:

Calling MoveGroup::setStartStateToCurrentState() internally nulls the considered_start_state_ and sends a move-group planning request action goal with an empty start state (joint-names & positions arrays are emtpy). This makes MoveIt! directly use the RobotState which is stored inside the planning scene, e.g. when using OMPL ModelBasedPlanningContext::setCompleteInitialState() will be called from ompl_interface::PlanningContextManager::getPlanningContext() with a state generated by robot_state::RobotStatePtr start_state = planning_scene->getCurrentStateUpdated(req.start_state); .. when req.start_state is 'empty' this means effectively that the current state of the planning scene is is used.

Unfortunately the action server callback MoveGroupMoveAction::executeMoveCallback does not ensure that a robot state newer than the planning request is available. This leads to plans that use an outdated start state. Actually I am not quite sure how far back in time the invalid start state can be since from the motions that we saw it must have been significantly > 0.1s... maybe somebody from the MoveIt! team with more expertise in this specific area can explain why we saw such severe movements....

How we address the problem:

This PR makes sure that a new JointState message is received by CurrentStateMonitor and is indeed written to the RobotState of the PlanningScene before the planning request is being processed. In our tests this fixed the problem (which otherwise is quickly reproductible). In this PR the change is done only exemplary for the MoveGroupMoveAction .. other actions might be affected by an outdated current state too.

Since the MoveIt! source base is new to me and the project is quite large this PR is intended as a discussion basis on how to best address the problem. E.g. the sleep-based wait loop should better be replaced by a boost::condition_variable or some other more direct signalling mechanism, etc.

Special thanks to @alemme for debugging this issue together with me.

@simonschmeisser
Copy link
Contributor

It has surprised me as well a couple of times how badly synchronized stuff can be in MoveIt!. I somehow manage to forget after some time about all the sleep() calls I had to put into my code.

I like your idea about saving the last update time. However I think this should be stored in the RobotState itself. Ideally the timestamp of when the position information entered the ros system (ie when the joint position message arrived from the robot controller).

@JimmyDaSilva
Copy link

Works like a charm for me. Glad someone finally digged into this :)
Thanks !

@andreaskoepf
Copy link
Author

@simonschmeisser I would appreciate to have time stamps inside the RobotState itself - something similar to the values current_state_time_ and joint_time_ currently stored externally in CurrentStateMonitor, e.g. current_state_monitor.cpp#L312.

@130s
Copy link
Contributor

130s commented May 26, 2016

@andreaskoepf Could you rebase to the latest source of origin/indigo-devel? Your PR is based on outdated commits and due the the old .travis.yml Travis would probably never pass.

@andreaskoepf
Copy link
Author

@130s Rebasing would be no problem. My hope was that some MoveIt! expert would review the changes and give me some hints how to fix this in a more general way. E.g. I have not yet checked how planning scene changes (e.g. addition/removal of collision objects) is incorporated in the planning process and if that faces the same problem e.g. it might not be guaranteed that the most recent changes are considered during planning.

I can set this on my todo list and try to find 1-2 days of time to build up a deeper understanding about the code-structure and then come up with a cleaner solution. But I will not be able to do this before June 6th...

if (age < ros::Duration(0))
break;

if (i >= 200)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of a less intuitive iteration count, can you have a timeout variable in seconds? i know its basically the same thing

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davetcoleman yes, sure. thx for having a look. Still had no time to do more complete analysis if integration of changes in the planning-scene suffers from a similar issue (I think I observed a couple of times that collision objects were not correcty added/removed before the the planning but that is not verified).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreaskoepf: see #686 for integrating changes to the planning scene synchronously.

@davetcoleman
Copy link
Member

I'm not super familiar with the architecture of this part of MoveIt! but overall it seems like a reasonable fix.

@alainsanguinetti
Copy link
Contributor

alainsanguinetti commented Jul 4, 2016

Maybe we should deprecate this method?
Could it be a problem with how people process the callbacks ? (not fast enough or maybe async spinner thread is preempted etc .. )
As @v4hn noted, I think #686 can solve the problem.
I think it is better to rely on an explicit synchronisation mechanism rather than adding some delay to hide that something was not fast enough or is not happening.

}

ros::Duration(0.001).sleep();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this whole block should only be executed if planning_scene::isEmpty(goal->request.start_state).
If you just want to plan with a complete state, there is no need to wait for a state update.
On the other hand, if the start_state is empty that means we should plan with the current state, so we should wait for the next update as you proposed @andreaskoepf .

@v4hn
Copy link
Contributor

v4hn commented Jul 4, 2016 via email

@alainsanguinetti
Copy link
Contributor

alainsanguinetti commented Jul 6, 2016

Hello,
I think the title of this PR is misleading because it pretends to fix the setStartToCurrentState method while it actually addresses issue #442 which is a much bigger problem which resides mainly in the move_group package.
I think we can do two things:

  • actually improve setStartToCurrentState by making it use current_state_monitor_.getCurrentState() which is quite easy and would make it possible for the requesting code to manage what is the start state.
  • fix the problem on the move_group node.

I think, as I said before, that it is a problem with how callbacks are processed. To show that, in move_group/move_group.cpp line 152 you can change:
ros::AsyncSpinner spinner( 1 );
to
ros::AsyncSpinner spinner( 0 );
which actually tells the spinner to use as many threads as there are cores on the computer. In my case, this solved the problem or at least significantly reduced it.
My explanation would be that when there is only one thread, if the callbacks corresponding to the action are processed by the spinner before the state update callbacks then it is impossible to have the state be updated before the action is processed.

In any case, changing from 1 to 0 improved the smoothness of the display so I think I'll open a PR just for that.

@rhaschke
Copy link
Contributor

rhaschke commented Jul 15, 2016

@alainsanguinetti raised an important point: Having only a single spinner thread, all traffic is processed strictly in order. There are some blocking services, e.g. MoveGroupExecuteService and maybe more, which take a while for processing. During that time, no other ROS messages are processed.

In case of a waiting MoveGroupExecuteService that means, that the service waits for the trajectory controller to finish, however, incoming joint state updates are not processed and pile up in the callback queue. Afterwards, it takes a while to process all of them.
For this reason the proposed fix doesn't solve the issue: Because updateSceneWithCurrentState stores the current time as last_robot_state_update_time_ but not the (eventually outdated) timestamp of the joint_state message, the waiting loop might stop to early, being satisfied with an outdated state.

To fix the problem in a more fundamental fashion, I suggest to modify planning_scene_monitor::LockedPlanningSceneRO such, that it waits until all scene updates with a timestamp < ros::Time::now() are processed and only then returns the current planning scene.
This indeed would require a condition variable to be efficient.

Another issue might be bad locking behaviour of the PlanningScene in PlanningSceneMonitor.
I observed a write-lock time of 0.277s for a state update. I don't yet fully understand the reason, but eventually the multiple number of read-only accessors block a write-accessor. In this case, a pending write-accessor should block all future read-accessors?

Increasing the number of spinner threads is a quick fix. However, we should carefully look for those blocking services and put them actively in threads (instead of running them in the spinner thread only).

@v4hn
Copy link
Contributor

v4hn commented Jul 15, 2016

On Thu, Jul 14, 2016 at 07:12:37PM -0700, Robert Haschke wrote:

@alainsanguinetti raised an important point: Having only a single spinner thread, all traffic is processed strictly in order.
There are some blocking services, e.g. MoveGroupExecuteService and maybe more, which take a while for processing. During that time, no other ROS messages are processed.

Yes, we should work on that front.

This does not solve #422 in theory though. The problem there is basically the same one as previously noticed with the ObjectRecognitionAction:
Traditionally the action server does not get to know exactly when it is supposed to start using data, i.e. input images.
https://groups.google.com/forum/#!msg/ros-sig-perception/j1JtB5dkkIU/tEF_kOZAU-oJ

We have the same problem with the planning scene/robot state to use for the action request, and I believe we should change that somehow.
Cleaning up threading in MoveIt does not resolve this design flaw, but will reduce the number of errors quite a lot to make it harder
to trigger the bug (which does not mean that it will not appear the next time you have to show a demo).

@rhaschke
Copy link
Contributor

This does not solve #422 in theory though. The problem there is basically the same one as previously noticed with the ObjectRecognitionAction:
Traditionally the action server does not get to know exactly when it is supposed to start using data, i.e. input images.
https://groups.google.com/forum/#!msg/ros-sig-perception/j1JtB5dkkIU/tEF_kOZAU-oJ

As far as I can see, Martin Günther proposes exactly the same concept as me: Make sure that the input data is current (here: all input to PSM previous to a timestamp is processed) w.r.t. the timestamp of the call of an action.
Indeed, currently there is no method to pass this timestamp explicitly. However, in case of PSM, I think
ros::Time::now() will be sufficient.

Please continue this discussion in #716.

@andreaskoepf
Copy link
Author

Please continue this discussion in #716.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants