This repository was archived by the owner on Nov 13, 2017. It is now read-only.

Conversation

rhaschke
Contributor

@rhaschke rhaschke commented Jul 15, 2016

This wip branch is intended as a joint collaboration branch to fix the race conditions in scene updates reported in #442. All developers are invited to contribute via additional PRs. I'm trying to centralize the discussion on this issue here.

Related PRs:

As pointed out in a comment on #671, the fundamental underlying reason might be that scene update messages pile up in the callback queue (due to sequential processing by a single spinner thread). Hence, not only pending joint-state updates may be missed, but other scene updates as well.
I therefore suggest modifying planning_scene_monitor::LockedPlanningSceneRO such that it waits until all scene updates with timestamp < ros::Time::now() are processed and only then returns the current planning scene. Only this ensures that all pending scene updates are considered at a given time.

Proposed concrete changes:

  • Give each PlanningSceneMonitor (PSM) instance its own CallbackQueue and AsyncSpinner. This might conflict with Added support for node handles with specific callback queues #701.
  • Handle the actual incoming timestamps of planning scene updates. Currently, throttling is performed based on the current time instead of the stamps of the incoming messages.
  • As this will conflict with throttling of (state) updates (requesting a LockedPlanningSceneRO will force all pending updates to be considered), we need another mechanism for throttling. Essentially we have two use cases: 1. Another thread within move_group needs to read the current state and wants to ensure it gets all pending updates. 2. The PSM should regularly publish scene updates. If there are no other accesses to the PSM in between, there is no reason to perform state updates in between.
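The waiting behaviour proposed above can be sketched with a plain condition variable: an update-processing thread records the stamp of the last applied update, and readers block until everything up to a requested time has been processed. This is a minimal illustration of the mechanism, not the MoveIt implementation; the class and member names are hypothetical.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical sketch: track the timestamp of the last processed scene
// update and let readers block until all updates up to a given time
// have been applied. Timestamps are plain doubles here instead of
// ros::Time to keep the example self-contained.
class UpdateSync
{
public:
  // Called by the update-processing thread after applying an update.
  void notifyProcessed(double stamp)
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      last_processed_ = stamp;
    }
    cv_.notify_all();
  }

  // Called by readers (a LockedPlanningSceneRO equivalent): block
  // until all updates with timestamp <= t have been processed.
  void waitFor(double t)
  {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this, t] { return last_processed_ >= t; });
  }

  double lastProcessed() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return last_processed_;
  }

private:
  mutable std::mutex mutex_;
  std::condition_variable cv_;
  double last_processed_ = 0.0;
};
```

A reader that calls waitFor(now) therefore cannot observe the scene until every update stamped before its request has been drained from the queue.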

Before starting along that route, I'm waiting for positive feedback from some developers. And of course, I hope for active contributions. Please announce before you start to tackle a subtask.

@davetcoleman
Member

I don't use the Rviz plugin and don't run into these race conditions in my normal workflow, so I don't fully understand the details at play here, but I'm glad you are tackling this. Hopefully @v4hn can help. Perhaps @jbohren has some insight, since he has improved MoveIt! race conditions in the past. I'll continue to do high-level reviews.

@rhaschke
Contributor Author

Proof-of-concept implementation on top of #713.
Needs code cleanup! Please don't complain (yet) about code style ;-)

unlocking needs to be performed in reverse order of locking
otherwise deadlocks are risked
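The rule from the commit message above can be illustrated with plain std::mutex (the mutex names are hypothetical, not the actual MoveIt members): when locks are taken manually, release them in the reverse order of acquisition; RAII guards encode this ordering automatically.

```cpp
#include <mutex>

std::mutex scene_mutex;  // protects the planning scene
std::mutex state_mutex;  // protects the current robot state

// Manual locking: unlock in reverse order of locking.
void manualLocking()
{
  scene_mutex.lock();
  state_mutex.lock();
  // ... critical section touching both scene and state ...
  state_mutex.unlock();  // last locked, first unlocked
  scene_mutex.unlock();
}

// RAII guards get this right by construction: destructors run in
// reverse order of construction, so the unlock order is always the
// reverse of the lock order.
void raiiLocking()
{
  std::lock_guard<std::mutex> scene_lock(scene_mutex);
  std::lock_guard<std::mutex> state_lock(state_mutex);
  // ... critical section ...
}  // state_mutex released first, then scene_mutex
```

Combined with a fixed global acquisition order, this keeps any pair of threads from holding the two mutexes in opposite orders.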
@rhaschke
Contributor Author

@v4hn I cleaned up the commit history. Please review now.
I now get an exception on destroying a static mutex when closing rviz. I cannot yet tell where this comes from. It seems to be related to the unloading of a (static?) CollisionPlugin.

@rhaschke
Contributor Author

TODOs:
Check the Python interface. Can we go without sleeps now?
Check the current state before executing a trajectory: is the initial trajectory point == current state?

rhaschke added a commit that referenced this pull request Jul 19, 2016
- unlock mutexes in reverse order of locking
- reformatting
- PlanningSceneMonitor::syncSceneUpdates(ros::Time): sync all scene updates up to given time
- syncSceneUpdates() after execution of trajectories & before fetching current robot state
- sync scene update before starting planning
- validate trajectory before execution
@rhaschke
Contributor Author

Python API works without sleeps like a charm.
Added state validity check to MoveGroup interface.
ABI compatibility is maintained by moving new class members to the end of the struct.
Ready to merge.

@davetcoleman
Member

Can you rename the subject of the PR to something more descriptive than work-in-progress, for future reference? And to signal that it's ready for merging ;-)

@davetcoleman
Member

Also, with 8 commits in this PR, it's my experience that reviewers will request that they be squashed into fewer (preferably 1).

ros::CallbackQueue callback_queue_;
boost::scoped_ptr<ros::AsyncSpinner> spinner_;
ros::Time last_robot_motion_time_; /// Last time the robot has moved
bool enforce_next_state_update_;
Member


it's out of style to suddenly start aligning variable names, but I guess it's not a big deal

@rhaschke rhaschke changed the title work-in-progress: fix #442 fix #442: race conditions when updating PlanningScene state Jul 19, 2016
@@ -139,7 +139,9 @@ void MotionPlanningFrame::computeExecuteButtonClicked()
   if (move_group_ && current_plan_)
   {
     ui_->stop_button->setEnabled(true);  // enable stopping
-    bool success = move_group_->execute(*current_plan_);
+    bool success =
+        move_group_->validatePlan(*current_plan_) &&
+        move_group_->execute(*current_plan_);
Member


Instead of chaining these, I think it would be much nicer to have a console output explaining why validatePlan failed.

Contributor Author


Added the console output in validatePlan().

rhaschke added a commit that referenced this pull request Jul 21, 2016
Need to wait for planning scene updates to be received (by rviz, from move_group).
Due to throttling of planning scene updates in move_group's PSM, this might take a while.
rhaschke added a commit that referenced this pull request Jul 21, 2016
fixes race conditions when updating the PlanningScene state
- unlock mutexes in reverse order of locking
- PlanningSceneMonitor::syncSceneUpdates(ros::Time): sync all scene updates up to given time
- syncSceneUpdates() after execution of trajectories & before fetching current robot state
- sync scene update before starting planning
- validate trajectory before execution
- final fixup #724
rhaschke added a commit that referenced this pull request Jul 21, 2016
// start our own spinner listening on our own callback_queue to become independent of any global callback queue
root_nh_.setCallbackQueue(&callback_queue_);
spinner_.reset(new ros::AsyncSpinner(1 /* threads */, &callback_queue_));
spinner_->start();
Contributor


@alainsanguinetti Could you have a look at this commit w.r.t. your pull-request #701?
Starting a spinner in the monitor obviously takes control out of the user's hands again.
Even so, afaics the PlanningSceneMonitor always used the global callback queue, right?
I don't see a way to provide the node handles the psm uses...

I suppose at least we should provide the user with a way to override the control flow from the outside.
tf/tf2 implement a similar scheme; they use an optional spin_thread parameter in the constructor, but this breaks ABI.
Is it enough to add the optional parameter in kinetic and leave indigo with the background thread?
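The tf2-style pattern mentioned above can be sketched without ROS: an optional constructor flag decides whether the object runs its own background spin thread or leaves callback processing to the caller. The class and members here are hypothetical stand-ins, not the actual PlanningSceneMonitor API, and a plain counter stands in for draining a callback queue.

```cpp
#include <atomic>
#include <thread>

// Hypothetical monitor: spin_thread=true starts a background thread
// that keeps processing callbacks; spin_thread=false leaves the user
// in control and they must call processPendingCallbacks() themselves.
class Monitor
{
public:
  explicit Monitor(bool spin_thread = true) : running_(spin_thread)
  {
    if (spin_thread)
      spinner_ = std::thread([this] {
        while (running_)
          processPendingCallbacks();
      });
  }

  ~Monitor()
  {
    running_ = false;  // stop the background loop
    if (spinner_.joinable())
      spinner_.join();
  }

  // Stand-in for draining one round of a callback queue.
  void processPendingCallbacks() { ++processed_; }

  int processed() const { return processed_; }

private:
  std::atomic<bool> running_;
  std::atomic<int> processed_{ 0 };
  std::thread spinner_;
};
```

Since the flag defaults to true, existing callers keep the background-thread behaviour, while users who manage their own spinning can opt out; whether a defaulted parameter can be added without an ABI break in the released branches is exactly the open question above.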

Contributor


I think this is probably a nice fix. Maybe we should allow the user to specify how many threads he wants to use.
The user does not really see more than this, since it is encapsulated in the move_group node, so it is probably more a matter of how many more threads he allows on his computer.

Contributor


On Wed, Aug 10, 2016 at 08:59:31PM -0700, Alain Sanguinetti wrote:

I think this is probably a nice fix.

I see two problems with this:

Maybe we should allow the user to specify how many threads he wants to use.

I would prefer to have it fully optional. If, as an alternative, the user could specify a NodeHandle for which he guarantees
that it will receive asynchronous updates, this would include your use case.

@rhaschke
Contributor Author

rhaschke commented Jul 22, 2016

@v4hn thanks for the careful review. I addressed (hopefully all of) your comments in #724.
