Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test case and fix for for https://github.com/ros2/rclcpp/issues/2652 #2713

Open
wants to merge 5 commits into
base: rolling
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions rclcpp/include/rclcpp/wait_set_policies/dynamic_storage.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -216,6 +216,11 @@ class DynamicStorage : public rclcpp::wait_set_policies::detail::StoragePolicyCo
shared_waitables_
);

if(this->needs_pruning_) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this is weird to me that it passed the linters, I think they may be broken. I'll look into that, but it shouldn't block this pr, we can fix it up in a subsequent pr if that's the case I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I use the ament_uncrustify and ament_cpplint things, like I use clang_format.
Meaning, I just write code using 'my' coding style and let the auto formaters do their thing later on, and don't look at the format any more after the uncrustify // cpplint.
Is there a better tool for my workflow, that does not produce these strange artifacts ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is the same with what i usually do, if there are any coding style problem, colcon test (or CI) should say something... i tried this my local environment, all green too. i am not sure what happening here, some rules are changed??? @ros2/team any thoughts?

this->storage_prune_deleted_entities();
this->needs_pruning_ = false;
}

this->storage_release_ownerships();
Comment on lines +219 to 224
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this check and the pruning happen after releasing ownerships? It seems like holding the ownership by the storage might cause the weak pointers to have not expired even though all the user's shared pointers have been destroyed? Maybe it gets caught on the next run through?

I suppose this would only happen if something lost all ownership (except the wait set's) after needs_pruning_ became true but before this step.

Either way, since storage_prune_deleted_entities() iterates through all of the wait set's weak pointers and checks them for expired, we might as well release ownerships first no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to happen while holding the ownership, in order to be sure that the indices still match. If you release ownership, and prune, and some references were dropped in between, the indices of the rcl waitset will not match any more the one in the storage. Basically leading to the same bug as before.
I don't like this implicit matching of indixes in general. I would like to introduce a backmapping datastructure later on, that uses the indices returned by the rcl_add functions

}

Expand Down
9 changes: 9 additions & 0 deletions rclcpp/include/rclcpp/wait_set_policies/static_storage.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -160,6 +160,15 @@ class StaticStorage : public rclcpp::wait_set_policies::detail::StoragePolicyCom
services_,
waitables_
);

if(this->needs_pruning_) {
// we need to throw here, as the indexing of the rcl_waitset is broken,
// in case of invalid entries

throw std::runtime_error(
"StaticStorage : storage_rebuild_rcl_wait_set: Detected"
" invalid entity in static entity storage");
Comment on lines +169 to +170
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested edit, don't block the pr on this:

Suggested change
"StaticStorage : storage_rebuild_rcl_wait_set: Detected"
" invalid entity in static entity storage");
"StaticStorage::storage_rebuild_rcl_wait_set(): entity weak_ptr "
"unexpectedly expired in static entity storage");

}
}

// storage_add_subscription() explicitly not declared here
Expand Down
210 changes: 210 additions & 0 deletions rclcpp/test/rclcpp/executors/test_executors.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -890,3 +890,213 @@ TYPED_TEST(TestExecutors, testRaceDropCallbackGroupFromSecondThread)
executor.cancel();
executor_thread.join();
}

TYPED_TEST(TestExecutors, dropSomeTimer)
{
using ExecutorType = TypeParam;
ExecutorType executor;

auto node = std::make_shared<rclcpp::Node>("test_node");

bool timer1_works = false;
bool timer2_works = false;

auto timer1 = node->create_timer(std::chrono::milliseconds(10), [&timer1_works]() {
timer1_works = true;
});
auto timer2 = node->create_timer(std::chrono::milliseconds(10), [&timer2_works]() {
timer2_works = true;
});

executor.add_node(node);

// first let's make sure that both timers work
auto max_end_time = std::chrono::steady_clock::now() + std::chrono::milliseconds(100);
while(!timer1_works || !timer2_works) {
// let the executor pick up the node and the timers
executor.spin_all(std::chrono::milliseconds(10));

const auto cur_time = std::chrono::steady_clock::now();
ASSERT_LT(cur_time, max_end_time);
}

// delete timer 2. Note, the executor uses an unordered map internally, to order
// the entities added to the rcl waitset therefore the order is kind of undefined,
// and this test may be flaky. In case it triggers, something is most likely
// really broken.
timer2.reset();

timer1_works = false;
timer2_works = false;
max_end_time = std::chrono::steady_clock::now() + std::chrono::milliseconds(100);
while(!timer1_works && !timer2_works) {
fujitatomoya marked this conversation as resolved.
Show resolved Hide resolved
// let the executor pick up the node and the timers
executor.spin_all(std::chrono::milliseconds(10));

const auto cur_time = std::chrono::steady_clock::now();
ASSERT_LT(cur_time, max_end_time);
}

ASSERT_TRUE(timer1_works || timer2_works);
}

TYPED_TEST(TestExecutors, dropSomeNodeWithTimer)
{
using ExecutorType = TypeParam;
ExecutorType executor;

auto node1 = std::make_shared<rclcpp::Node>("test_node_1");
auto node2 = std::make_shared<rclcpp::Node>("test_node_2");

bool timer1_works = false;
bool timer2_works = false;

auto timer1 = node1->create_timer(std::chrono::milliseconds(10), [&timer1_works]() {
timer1_works = true;
});
auto timer2 = node2->create_timer(std::chrono::milliseconds(10), [&timer2_works]() {
timer2_works = true;
});

executor.add_node(node1);
executor.add_node(node2);

// first let's make sure that both timers work
auto max_end_time = std::chrono::steady_clock::now() + std::chrono::milliseconds(100);
while(!timer1_works || !timer2_works) {
// let the executor pick up the node and the timers
executor.spin_all(std::chrono::milliseconds(10));

const auto cur_time = std::chrono::steady_clock::now();
ASSERT_LT(cur_time, max_end_time);
}

// delete node 1.
node1 = nullptr;

timer2_works = false;
max_end_time = std::chrono::steady_clock::now() + std::chrono::milliseconds(100);
while(!timer2_works) {
// let the executor pick up the node and the timer
executor.spin_all(std::chrono::milliseconds(10));

const auto cur_time = std::chrono::steady_clock::now();
ASSERT_LT(cur_time, max_end_time);
}

ASSERT_TRUE(timer2_works);
}

TYPED_TEST(TestExecutors, dropSomeSubscription)
{
using ExecutorType = TypeParam;
ExecutorType executor;

auto node = std::make_shared<rclcpp::Node>("test_node");

bool sub1_works = false;
bool sub2_works = false;

auto sub1 = node->create_subscription<test_msgs::msg::Empty>("/test_drop", 10,
[&sub1_works](const test_msgs::msg::Empty &) {
sub1_works = true;
});
auto sub2 = node->create_subscription<test_msgs::msg::Empty>("/test_drop", 10,
[&sub2_works](const test_msgs::msg::Empty &) {
sub2_works = true;
});

auto pub = node->create_publisher<test_msgs::msg::Empty>("/test_drop", 10);

executor.add_node(node);

// first let's make sure that both timers work
auto max_end_time = std::chrono::steady_clock::now() + std::chrono::milliseconds(100);
while(!sub1_works || !sub2_works) {
pub->publish(test_msgs::msg::Empty());

// let the executor pick up the node and the timers
executor.spin_all(std::chrono::milliseconds(10));

const auto cur_time = std::chrono::steady_clock::now();
ASSERT_LT(cur_time, max_end_time);
}

// delete subscription 2. Note, the executor uses an unordered map internally, to order
// the entities added to the rcl waitset therefore the order is kind of undefined,
// and this test may be flaky. In case it triggers, something is most likely
// really broken.
sub2.reset();

sub1_works = false;
sub2_works = false;
max_end_time = std::chrono::steady_clock::now() + std::chrono::milliseconds(100);
while(!sub1_works && !sub2_works) {
pub->publish(test_msgs::msg::Empty());

// let the executor pick up the node and the timers
executor.spin_all(std::chrono::milliseconds(10));

const auto cur_time = std::chrono::steady_clock::now();
ASSERT_LT(cur_time, max_end_time);
}

ASSERT_TRUE(sub1_works || sub2_works);
}

TYPED_TEST(TestExecutors, dropSomeNodesWithSubscription)
{
using ExecutorType = TypeParam;
ExecutorType executor;

auto node = std::make_shared<rclcpp::Node>("test_node");
auto node1 = std::make_shared<rclcpp::Node>("test_node_1");
auto node2 = std::make_shared<rclcpp::Node>("test_node_2");

bool sub1_works = false;
bool sub2_works = false;

auto sub1 = node1->create_subscription<test_msgs::msg::Empty>("/test_drop", 10,
[&sub1_works](const test_msgs::msg::Empty &) {
sub1_works = true;
});
auto sub2 = node2->create_subscription<test_msgs::msg::Empty>("/test_drop", 10,
[&sub2_works](const test_msgs::msg::Empty &) {
sub2_works = true;
});

auto pub = node->create_publisher<test_msgs::msg::Empty>("/test_drop", 10);

executor.add_node(node);
executor.add_node(node1);
executor.add_node(node2);

// first let's make sure that both subscribers work
auto max_end_time = std::chrono::steady_clock::now() + std::chrono::milliseconds(100);
while(!sub1_works || !sub2_works) {
pub->publish(test_msgs::msg::Empty());

// let the executor pick up the node and the timers
executor.spin_all(std::chrono::milliseconds(10));

const auto cur_time = std::chrono::steady_clock::now();
ASSERT_LT(cur_time, max_end_time);
}

// delete node 2.
node2 = nullptr;

sub1_works = false;
max_end_time = std::chrono::steady_clock::now() + std::chrono::milliseconds(100);
while(!sub1_works) {
pub->publish(test_msgs::msg::Empty());

// let the executor pick up the node and the timers
executor.spin_all(std::chrono::milliseconds(10));

const auto cur_time = std::chrono::steady_clock::now();
ASSERT_LT(cur_time, max_end_time);
}

ASSERT_TRUE(sub1_works);
}