Skip to content
This repository has been archived by the owner on Nov 13, 2017. It is now read-only.

#592 fixed by manually resetting the kinematics loader #702

Conversation

alainsanguinetti
Copy link
Contributor

Issue #592 was very annoying and prevented properly exiting from a running program.

I think this solves the issue, it does for my quite complex software and for the very simple example provided by @fontysrobotics.

Valgrind seem to have some complaints but:

  • it is still better to allow people to properly cleanup their own data than throw this exception and have memory leaks
  • valgrind always complains with ros, log4cpp for example
  • I can't tell if this is valid complain from valgrind or not.

Please let me know if I need to provide additionnal information.

model_.reset();
rdf_loader_.reset();
kinematics_loader_.reset();

ROS_DEBUG( "Robot model loader destructor" );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think all these debug statements are needed, can you namespace them - ROS_DEBUG_NAMED("robot_model_loader", ...) so that I can hide them as needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know if they are needed, I used them to find the bug so I thought they might be of use to someone in the future.

Copy link
Contributor

@v4hn v4hn Jul 1, 2016 via email

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@alainsanguinetti alainsanguinetti Jul 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll clean this PR to be more simple and focused on the fix.
I'll try to explain why it works but it seems to be a complex issue on memory management and boost shared pointers and all that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think some user feedback and debug messages are useful, and they are easy to turn off using the rosconsole yaml configuration files. This is an underutilized feature of ROS, so I wrote up a blog post a while back about it: http://dav.ee/blog/notes/archives/898

However in this PR I think there are way too many. @v4hn is right that probably none of them are needed.

alainsanguinetti pushed a commit to alainsanguinetti/moveit_ros that referenced this pull request Jul 4, 2016
alainsanguinetti pushed a commit to alainsanguinetti/moveit_ros that referenced this pull request Jul 4, 2016
@davetcoleman
Copy link
Member

Looks good, except can you rebase your commits to just one please?

@v4hn
Copy link
Contributor

v4hn commented Jul 5, 2016

I am investigating this issue at the moment and it seems to be either a bug in pluginlib or in our usage of it (or both).
I don't like the solution found here. I'll report back once I have a working diff or I give up. :)

@alainsanguinetti alainsanguinetti force-pushed the iss592_class_loader_exception branch from 2da142f to eec9de6 Compare July 6, 2016 02:06
@alainsanguinetti
Copy link
Contributor Author

@davetcoleman here is the rebase.
@v4hn I agree that my solution looks a bit like a workaround :-) I think it's a combination of boost shared pointer and classloader destructors happening in the wrong order or classloader happening too late when the library is already unloaded by the program ?

@alainsanguinetti alainsanguinetti force-pushed the iss592_class_loader_exception branch from eec9de6 to 383a895 Compare July 6, 2016 03:05
@rhaschke
Copy link
Contributor

rhaschke commented Jul 6, 2016

@v4hn @alainsanguinetti
The fundamental issue underlying #592 is indeed a design flaw of class_loader that I noticed a while ago: Both, class_loader and MoveIt use some static variables (in case of MoveIt the SharedStorage), whose release order is not well defined. This causes the ClassLoader stuff to be freed before SharedStorage, which attempts to access the former memory when ShardStorage is freed.
Actually, ClassLoader shouldn't free it's stuff as long as there are any pointers pending. However, this requires a rewrite of ClassLoader...

I'm not really sure, how the proposed fix here, actually can solve the issue: Resetting the pointer is exactly what the default destructor would do as well, isn't it?

@alainsanguinetti
Copy link
Contributor Author

@rhaschke I think the behaviour you mention is what class_loader is supposed to do.
Plus I think there is a difference between manually resetting a boost shared pointer and letting the default destructor delete. I still use normal pointer to avoid that kind of complexity.

@rhaschke
Copy link
Contributor

rhaschke commented Jul 6, 2016

Indeed, ClassLoader shouldn't unload the library as long as it's still in use. Using MoveIt's rviz plugin, I actually get an appropriate warning message when unloading the plugin.
However, here we get an error that ClassLoader didn't loaded the library and thus isn't aware of it. I'm not sure where this originates from. Actually the kinematics loader was loaded via the ClassLoader mechanism...

@v4hn
Copy link
Contributor

v4hn commented Jul 6, 2016

As far as my investigation goes, resetting the shared_ptr explicitly changes the order of destruction.
Normally the destructor of a shared_ptr'ed object is called after the object
and all of its members are destructed. So resetting the shared_ptr beforehand
moves the destructor call of the loader before the destruction of all other members.

@v4hn v4hn mentioned this pull request Jul 6, 2016
@v4hn
Copy link
Contributor

v4hn commented Jul 6, 2016

@alainsanguinetti could you please test #705 to see if this also resolves the problem?
While it also does not address the real problem, it at least removes unnecessary code instead of adding more code.

After debugging it for some more time and reading through @rhaschke's analysis, this is my understanding of the issue:
MoveIt! uses static variables in SharedStorage and class_loader uses a number of static variables in class_loader::class_loader_private.
Static function-local variables seem to be destroyed in the inverse order of the respective functions being called for the first time.
This results in at least the mutex returned by getLoadedLibraryVectorMutex (likely most class_loader_private variables) being destructed before MoveIt!'s SharedStorage.
However, only when destructing SharedStorage, the KinematicsPluginLoader object is destructed and only then pluginlib tries to unload the libraries.
Thus, this unloading operates on officially destructed objects.
This happens with either patch.
However, depending on the exact order of destructors, the data still seems to be (deterministically) usable in some cases, but scrambled in others.
This is a severe shortcoming in the architecture of class_loader, and it would be great if someone could fix it there: The unloading of libraries (again as @rhaschke mentioned in an issue there) should not be tied to the lifetime of class_loader but to the respective instances of the libraries classes too.

@v4hn
Copy link
Contributor

v4hn commented Jul 6, 2016

Leaving aside #705 that I believe to be reasonable either way, I can see only one way out of this problem that does not imply rewriting class_loader (anyone is welcome to do this for upcoming ROS releases, I suppose).
We could move away from automatic destruction of the SharedStorage and explicitly destruct it somewhere instead. However, that somewhat defies the purpose, because the only place I can think of is in the destructor of MoveGroupImpl et.al.
Even with some shared_ptr construction the Storage would not be shared anymore when MoveGroup's are being created and destroyed over and over again ...

@rhaschke
Copy link
Contributor

rhaschke commented Jul 6, 2016

As mentioned also by @v4hn, #705 will not fix the issue. The only workaround, I can see for now, is to disable destruction of SharedStorage at all. This leaves a memory leak, but I guess that's fine, because until destruction of the process, the shared storage should be maintained anyway...
See #706 for a PR.

@alainsanguinetti
Copy link
Contributor Author

PR #705 is a better fix than this one so I close this one to avoid confusion.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants