-
Notifications
You must be signed in to change notification settings - Fork 429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test case and fix for for https://github.com/ros2/rclcpp/issues/2652 #2713
base: rolling
Are you sure you want to change the base?
Conversation
6906974
to
660ffaa
Compare
@luca-della-vedova @fmrico can you test this fix ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried on the reproduction repo and this branch indeed fixes it!
660ffaa
to
e3142d0
Compare
Sure!! testing... |
I have tested this PR against:
Everything is working now: LGTM 🔥 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmachowinski lgtm with green CI.
i would like to confirm one thing. how come do you consider this breaks ABI by the way? Templated functions are instantiated at compile-time, and their changes do not affect the binary interface (ABI) of the library.
This specific case is likely not to break ABI, but leaves you in a bad place. Depending on what was compiled at which time, you end up with the fix or not. E.g. you have libraries, that use the Executor therefore they contain a symbol of the Executor. These libraries were compiled before the fix. Even though you now compile with the new fixed template, the linker will choose one symbol to use later on during linking, and may use the 'broken' version from the library. In general, you can break ABI with templates, but you need to have a libraries. To break ABI, you need a to have a templated type, that changed its memory layout due to an alteration of the template. If this type is used by library A, but this was compiled against the old version of the template, and you now try to use the library with the new template, you get an ABI break, and most likely a crash... |
As we discussed in PMC, this doesn't really break ABI, but updating rclcpp by itself is insufficient to get the fix. You must also recompile any downstream users of the template. It's enough to justify a notification, and we are planning on doing a discourse post about it. |
Pulls: #2713 |
Signed-off-by: Janosch Machowinski <[email protected]>
Signed-off-by: Janosch Machowinski <[email protected]>
Signed-off-by: Janosch Machowinski <[email protected]>
Signed-off-by: Francisco Martín Rico <[email protected]>
Signed-off-by: Janosch Machowinski <[email protected]>
78d859d
to
27b2d0e
Compare
Pulls: #2713 |
just in case i started the CI after rebasing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, but I had one question that might lead to an edit
@@ -216,6 +216,11 @@ class DynamicStorage : public rclcpp::wait_set_policies::detail::StoragePolicyCo | |||
shared_waitables_ | |||
); | |||
|
|||
if(this->needs_pruning_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this is weird to me that it passed the linters, I think they may be broken. I'll look into that, but it shouldn't block this pr, we can fix it up in a subsequent pr if that's the case I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I use the ament_uncrustify and ament_cpplint things, like I use clang_format.
Meaning, I just write code using 'my' coding style and let the auto formaters do their thing later on, and don't look at the format any more after the uncrustify // cpplint.
Is there a better tool for my workflow, that does not produce these strange artifacts ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is the same with what i usually do, if there are any coding style problem, colcon test
(or CI) should say something... i tried this my local environment, all green too. i am not sure what happening here, some rules are changed??? @ros2/team any thoughts?
if(this->needs_pruning_) { | ||
this->storage_prune_deleted_entities(); | ||
this->needs_pruning_ = false; | ||
} | ||
|
||
this->storage_release_ownerships(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this check and the pruning happen after releasing ownerships? It seems like holding the ownership by the storage might cause the weak pointers to have not expired even though all the user's shared pointers have been destroyed? Maybe it gets caught on the next run through?
I suppose this would only happen if something lost all ownership (except the wait set's) after needs_pruning_
became true but before this step.
Either way, since storage_prune_deleted_entities()
iterates through all of the wait set's weak pointers and checks them for expired, we might as well release ownerships first no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to happen while holding the ownership, in order to be sure that the indices still match. If you release ownership, and prune, and some references were dropped in between, the indices of the rcl waitset will not match any more the one in the storage. Basically leading to the same bug as before.
I don't like this implicit matching of indixes in general. I would like to introduce a backmapping datastructure later on, that uses the indices returned by the rcl_add functions
"StaticStorage : storage_rebuild_rcl_wait_set: Detected" | ||
" invalid entity in static entity storage"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested edit, don't block the pr on this:
"StaticStorage : storage_rebuild_rcl_wait_set: Detected" | |
" invalid entity in static entity storage"); | |
"StaticStorage::storage_rebuild_rcl_wait_set(): entity weak_ptr " | |
"unexpectedly expired in static entity storage"); |
This is a test case and fix for the bug #2652