This repository was archived by the owner on Nov 13, 2017. It is now read-only.

Conversation

rhaschke
Contributor

@rhaschke rhaschke commented Jul 15, 2016

This wip branch is intended as a joint collaboration branch to fix the race conditions in scene updates reported in #442. All developers are invited to contribute via additional PRs. I'm trying to centralize the discussion on this issue here.

Related PRs:

As pointed out in a comment on #671, the fundamental underlying reason might be that scene update messages pile up in the callback queue (due to sequential processing by a single spinner thread). Hence, not only pending joint-state updates may be missed, but other scene updates as well.
I therefore suggest modifying planning_scene_monitor::LockedPlanningSceneRO such that it waits until all scene updates with timestamp < ros::Time::now() are processed and only then returns the current planning scene. Only this ensures that all pending scene updates are considered at a given time.

Proposed concrete changes:

  • Give each PlanningSceneMonitor (PSM) instance its own CallbackQueue and AsyncSpinner. This might conflict with Added support for node handles with specific callback queues #701.
  • Handle the actual incoming timestamps of planning scene updates. Currently, throttling is performed based on the current time instead of the stamps of the incoming messages.
  • As this will conflict with throttling of (state) updates (requesting a LockedPlanningSceneRO will force all pending updates to be considered), we need another mechanism for throttling. Essentially we have two use cases: 1. Another thread within move_group needs to read the current state and wants to ensure it gets all pending updates. 2. The PSM should regularly publish scene updates. If there are no other accesses to the PSM in between, there is no reason to perform state updates in between.
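The waiting behaviour proposed above can be sketched with a plain condition variable: an update-processing thread records the stamp of the last applied update, and readers block until everything up to a requested time has been processed. This is a minimal illustration of the mechanism, not the MoveIt implementation; the class and member names are hypothetical.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical sketch: track the timestamp of the last processed scene
// update and let readers block until all updates up to a given time
// have been applied. Timestamps are plain doubles here instead of
// ros::Time to keep the example self-contained.
class UpdateSync
{
public:
  // Called by the update-processing thread after applying an update.
  void notifyProcessed(double stamp)
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      last_processed_ = stamp;
    }
    cv_.notify_all();
  }

  // Called by readers (a LockedPlanningSceneRO equivalent): block
  // until all updates with timestamp <= t have been processed.
  void waitFor(double t)
  {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this, t] { return last_processed_ >= t; });
  }

  double lastProcessed() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return last_processed_;
  }

private:
  mutable std::mutex mutex_;
  std::condition_variable cv_;
  double last_processed_ = 0.0;
};
```

A reader that calls waitFor(now) therefore cannot observe the scene until every update stamped before its request has been drained from the queue.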

Before starting along that route, I'm waiting for positive feedback from some developers. And of course, I hope for active contributions. Please announce before you start to tackle a subtask.

@davetcoleman
Member

I don't use the Rviz plugin and don't run into these race conditions in my normal workflow, so I don't fully understand the details at play here, but I'm glad you are tackling this. Hopefully @v4hn can help. Perhaps @jbohren has some insight, since he has improved MoveIt! race conditions in the past. I'll continue to do high-level reviews.

@rhaschke
Contributor Author

Proof-of-concept implementation on top of #713.
Needs code cleanup! Please don't complain (yet) about code style ;-)

unlocking needs to be performed in reverse order of locking
otherwise deadlocks are risked
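The rule from the commit message above can be illustrated with plain std::mutex (the mutex names are hypothetical, not the actual MoveIt members): when locks are taken manually, release them in the reverse order of acquisition; RAII guards encode this ordering automatically.

```cpp
#include <mutex>

std::mutex scene_mutex;  // protects the planning scene
std::mutex state_mutex;  // protects the current robot state

// Manual locking: unlock in reverse order of locking.
void manualLocking()
{
  scene_mutex.lock();
  state_mutex.lock();
  // ... critical section touching both scene and state ...
  state_mutex.unlock();  // last locked, first unlocked
  scene_mutex.unlock();
}

// RAII guards get this right by construction: destructors run in
// reverse order of construction, so the unlock order is always the
// reverse of the lock order.
void raiiLocking()
{
  std::lock_guard<std::mutex> scene_lock(scene_mutex);
  std::lock_guard<std::mutex> state_lock(state_mutex);
  // ... critical section ...
}  // state_mutex released first, then scene_mutex
```

Combined with a fixed global acquisition order, this keeps any pair of threads from holding the two mutexes in opposite orders.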
@rhaschke
Contributor Author

@v4hn I cleaned up the commit history. Please review now.
I now get an exception on destroying a static mutex when closing rviz. I cannot yet tell where this comes from. It seems to be related to the unloading of a (static?) CollisionPlugin.

@rhaschke
Contributor Author

TODOs:
Check the Python interface. Can we go without sleeps now?
Check the current state before executing a trajectory: is the initial trajectory point == current state?

rhaschke added a commit that referenced this pull request Jul 19, 2016
- unlock mutexes in reverse order of locking
- reformatting
- PlanningSceneMonitor::syncSceneUpdates(ros::Time): sync all scene updates up to given time
- syncSceneUpdates() after execution of trajectories & before fetching current robot state
- sync scene update before starting planning
- validate trajectory before execution
@rhaschke
Contributor Author

Python API works without sleeps like a charm.
Added state validity check to MoveGroup interface.
ABI compatibility is maintained by moving new class members to the end of the struct.
Ready to merge.

@davetcoleman
Member

Can you rename the subject of the PR to something more descriptive than work-in-progress, for future reference? And to signal that it's ready for merging ;-)

@davetcoleman
Member

Also, with 8 commits in this PR, it's my experience that reviewers will request that they be squashed into fewer (preferably 1).

ros::CallbackQueue callback_queue_;
boost::scoped_ptr<ros::AsyncSpinner> spinner_;
ros::Time last_robot_motion_time_; /// Last time the robot has moved
bool enforce_next_state_update_;
Member


it's out of style to suddenly start aligning variable names, but I guess it's not a big deal

@rhaschke rhaschke changed the title work-in-progress: fix #442 fix #442: race conditions when updating PlanningScene state Jul 19, 2016
@@ -139,7 +139,9 @@ void MotionPlanningFrame::computeExecuteButtonClicked()
   if (move_group_ && current_plan_)
   {
     ui_->stop_button->setEnabled(true);  // enable stopping
-    bool success = move_group_->execute(*current_plan_);
+    bool success =
+        move_group_->validatePlan(*current_plan_) &&
+        move_group_->execute(*current_plan_);
Member


Instead of chaining these, I think it would be much nicer to have a console output explaining why validatePlan failed.

Contributor Author


Added the console output in validatePlan().

rhaschke added a commit that referenced this pull request Jul 21, 2016
Need to wait for planning scene updates to be received (by rviz, from move_group).
Due to throttling of planning scene updates in move_group's PSM, this might take a while.
rhaschke added a commit that referenced this pull request Jul 21, 2016
fixes race conditions when updating the PlanningScene state
- unlock mutexes in reverse order of locking
- PlanningSceneMonitor::syncSceneUpdates(ros::Time): sync all scene updates up to given time
- syncSceneUpdates() after execution of trajectories & before fetching current robot state
- sync scene update before starting planning
- validate trajectory before execution
- final fixup #724
rhaschke added a commit that referenced this pull request Jul 21, 2016
// start our own spinner listening on our own callback_queue to become independent of any global callback queue
root_nh_.setCallbackQueue(&callback_queue_);
spinner_.reset(new ros::AsyncSpinner(1 /* threads */, &callback_queue_));
spinner_->start();
Contributor


@alainsanguinetti Could you have a look at this commit w.r.t. your pull-request #701?
Starting a spinner in the monitor obviously takes control out of the user's hands again.
Even so, afaics the PlanningSceneMonitor always used the global callback queue, right?
I don't see a way to provide the node handles the psm uses...

I suppose at least we should provide the user with a way to override the control flow from the outside.
tf/tf2 implement a similar scheme; they use an optional spin_thread parameter in the constructor, but this breaks ABI.
Is it enough to add the optional parameter in kinetic and leave indigo with the background thread?
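The tf2-style pattern mentioned above can be sketched without ROS: an optional constructor flag decides whether the object runs its own background spin thread or leaves callback processing to the caller. The class and members here are hypothetical stand-ins, not the actual PlanningSceneMonitor API, and a plain counter stands in for draining a callback queue.

```cpp
#include <atomic>
#include <thread>

// Hypothetical monitor: spin_thread=true starts a background thread
// that keeps processing callbacks; spin_thread=false leaves the user
// in control and they must call processPendingCallbacks() themselves.
class Monitor
{
public:
  explicit Monitor(bool spin_thread = true) : running_(spin_thread)
  {
    if (spin_thread)
      spinner_ = std::thread([this] {
        while (running_)
          processPendingCallbacks();
      });
  }

  ~Monitor()
  {
    running_ = false;  // stop the background loop
    if (spinner_.joinable())
      spinner_.join();
  }

  // Stand-in for draining one round of a callback queue.
  void processPendingCallbacks() { ++processed_; }

  int processed() const { return processed_; }

private:
  std::atomic<bool> running_;
  std::atomic<int> processed_{ 0 };
  std::thread spinner_;
};
```

Since the flag defaults to true, existing callers keep the background-thread behaviour, while users who manage their own spinning can opt out; whether a defaulted parameter can be added without an ABI break in the released branches is exactly the open question above.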

Contributor


I think this is probably a nice fix. Maybe we should allow the user to specify how many threads he wants to use.
The user does not really see more than this, since it is encapsulated in the move_group node, so it is probably more a matter of how many more threads he allows on his computer.

Contributor


On Wed, Aug 10, 2016 at 08:59:31PM -0700, Alain Sanguinetti wrote:

I think this is probably a nice fix.

I see two problems with this:

Maybe we should allow the user to specify how many threads he wants to use.

I would prefer to have it fully optional. If, as an alternative, the user could specify a NodeHandle for which he guarantees
that it will receive asynchronous updates, this would include your use case.

@rhaschke
Contributor Author

rhaschke commented Jul 22, 2016

@v4hn thanks for the careful review. I addressed (hopefully all of) your comments in #724.
