Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test case and fix for for https://github.com/ros2/rclcpp/issues/2652 #2713

Open
wants to merge 5 commits into
base: rolling
Choose a base branch
from

Conversation

jmachowinski
Copy link
Contributor

@jmachowinski jmachowinski commented Dec 17, 2024

This is a test case and fix for the bug #2652

@jmachowinski jmachowinski changed the title Test case for https://github.com/ros2/rclcpp/issues/2652 Test case and fix for for https://github.com/ros2/rclcpp/issues/2652 Dec 17, 2024
@jmachowinski
Copy link
Contributor Author

@luca-della-vedova @fmrico can you test this fix ?

@jmachowinski jmachowinski marked this pull request as ready for review December 17, 2024 17:20
Copy link
Contributor

@luca-della-vedova luca-della-vedova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried on the reproduction repo and this branch indeed fixes it!

@fmrico
Copy link
Contributor

fmrico commented Dec 19, 2024

@luca-della-vedova @fmrico can you test this fix ?

Sure!! testing...

@fmrico
Copy link
Contributor

fmrico commented Dec 19, 2024

I have tested this PR against:

Everything is working now: LGTM 🔥 🚀

Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmachowinski lgtm with green CI.

i would like to confirm one thing. how come do you consider this breaks ABI by the way? Templated functions are instantiated at compile-time, and their changes do not affect the binary interface (ABI) of the library.

rclcpp/test/rclcpp/executors/test_executors.cpp Outdated Show resolved Hide resolved
rclcpp/test/rclcpp/executors/test_executors.cpp Outdated Show resolved Hide resolved
rclcpp/test/rclcpp/executors/test_executors.cpp Outdated Show resolved Hide resolved
rclcpp/test/rclcpp/executors/test_executors.cpp Outdated Show resolved Hide resolved
rclcpp/test/rclcpp/executors/test_executors.cpp Outdated Show resolved Hide resolved
rclcpp/test/rclcpp/executors/test_executors.cpp Outdated Show resolved Hide resolved
rclcpp/test/rclcpp/executors/test_executors.cpp Outdated Show resolved Hide resolved
@jmachowinski
Copy link
Contributor Author

@jmachowinski lgtm with green CI.

i would like to confirm one thing. how come do you consider this breaks ABI by the way? Templated functions are instantiated at compile-time, and their changes do not affect the binary interface (ABI) of the library.

This specific case is likely not to break ABI, but leaves you in a bad place. Depending on what was compiled at which time, you end up with the fix or not. E.g. you have libraries, that use the Executor therefore they contain a symbol of the Executor. These libraries were compiled before the fix. Even though you now compile with the new fixed template, the linker will choose one symbol to use later on during linking, and may use the 'broken' version from the library.

In general, you can break ABI with templates, but you need to have a libraries. To break ABI, you need a to have a templated type, that changed its memory layout due to an alteration of the template. If this type is used by library A, but this was compiled against the old version of the template, and you now try to use the library with the new template, you get an ABI break, and most likely a crash...

@mjcarroll
Copy link
Member

As we discussed in PMC, this doesn't really break ABI, but updating rclcpp by itself is insufficient to get the fix. You must also recompile any downstream users of the template. It's enough to justify a notification, and we are planning on doing a discourse post about it.

@fujitatomoya
Copy link
Collaborator

Pulls: #2713
Gist: https://gist.githubusercontent.com/fujitatomoya/6af50bf834f3f8281e47c44d8829eba0/raw/535be838d71dec82d719fa76bfd467d8419dd27a/ros2.repos
BUILD args: --packages-above-and-dependencies rclcpp
TEST args: --packages-above rclcpp
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/15047

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

Janosch Machowinski and others added 5 commits January 10, 2025 16:16
@fujitatomoya
Copy link
Collaborator

Pulls: #2713
Gist: https://gist.githubusercontent.com/fujitatomoya/42828e7e96e4dbe26cbb029b0587a9ac/raw/535be838d71dec82d719fa76bfd467d8419dd27a/ros2.repos
BUILD args: --packages-above-and-dependencies rclcpp
TEST args: --packages-above rclcpp
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/15051

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@fujitatomoya
Copy link
Collaborator

just in case i started the CI after rebasing.

Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but I had one question that might lead to an edit

@@ -216,6 +216,11 @@ class DynamicStorage : public rclcpp::wait_set_policies::detail::StoragePolicyCo
shared_waitables_
);

if(this->needs_pruning_) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this is weird to me that it passed the linters, I think they may be broken. I'll look into that, but it shouldn't block this pr, we can fix it up in a subsequent pr if that's the case I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I use the ament_uncrustify and ament_cpplint things, like I use clang_format.
Meaning, I just write code using 'my' coding style and let the auto formaters do their thing later on, and don't look at the format any more after the uncrustify // cpplint.
Is there a better tool for my workflow, that does not produce these strange artifacts ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is the same with what i usually do, if there are any coding style problem, colcon test (or CI) should say something... i tried this my local environment, all green too. i am not sure what happening here, some rules are changed??? @ros2/team any thoughts?

Comment on lines +219 to 224
if(this->needs_pruning_) {
this->storage_prune_deleted_entities();
this->needs_pruning_ = false;
}

this->storage_release_ownerships();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this check and the pruning happen after releasing ownerships? It seems like holding the ownership by the storage might cause the weak pointers to have not expired even though all the user's shared pointers have been destroyed? Maybe it gets caught on the next run through?

I suppose this would only happen if something lost all ownership (except the wait set's) after needs_pruning_ became true but before this step.

Either way, since storage_prune_deleted_entities() iterates through all of the wait set's weak pointers and checks them for expired, we might as well release ownerships first no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to happen while holding the ownership, in order to be sure that the indices still match. If you release ownership, and prune, and some references were dropped in between, the indices of the rcl waitset will not match any more the one in the storage. Basically leading to the same bug as before.
I don't like this implicit matching of indixes in general. I would like to introduce a backmapping datastructure later on, that uses the indices returned by the rcl_add functions

Comment on lines +169 to +170
"StaticStorage : storage_rebuild_rcl_wait_set: Detected"
" invalid entity in static entity storage");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested edit, don't block the pr on this:

Suggested change
"StaticStorage : storage_rebuild_rcl_wait_set: Detected"
" invalid entity in static entity storage");
"StaticStorage::storage_rebuild_rcl_wait_set(): entity weak_ptr "
"unexpectedly expired in static entity storage");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants