
Introduce EventsExecutor implementation #1391

Draft · wants to merge 26 commits into base: rolling

Conversation


bmartin427 commented Dec 31, 2024

Closes #1389


bmartin427 commented Dec 31, 2024

Draft form of this feature, to drive discussion only. I've done some minimal work on this relative to the original code to get things in roughly the right shape; however, the functional test currently seems to explode, and a bunch of the linters are unhappy. Even once this nominally works, there's a fair bit more work to be done, starting with the removal of the boost dependency.

Brad Martin added 2 commits January 4, 2025 17:18
The previous version worked on 22.04+Iron, but fails on 24.04+Jazzy or Rolling.  It
seems like the behavior of std::bind() may have changed when asked to bind a py::object
to a callback taking a py::handle.

Signed-off-by: Brad Martin <[email protected]>
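
For illustration only (not the PR's actual diff; the names below are hypothetical): one way to sidestep the std::bind() behavior described above is to capture the py::object in a lambda, so the conversion to py::handle happens only at invocation time.

#include <functional>
#include <pybind11/pybind11.h>

namespace py = pybind11;

// Hypothetical callback taking a py::handle, as described in the commit.
void HandleEvent(py::handle event);

std::function<void()> MakeCallback(py::object obj)
{
  // Rather than std::bind(&HandleEvent, obj), capture the py::object and
  // forward it explicitly; the capture keeps the reference alive, and the
  // object converts to py::handle only when the callback is actually invoked.
  return [obj]() {HandleEvent(obj);};
}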
bmartin427 changed the title from Introduce EventsExecutor implementation (#1389) to Introduce EventsExecutor implementation on Jan 7, 2025
Brad Martin added 2 commits January 7, 2025 17:01
Brad Martin added 9 commits January 7, 2025 18:09
clang-format fixes things that uncrustify tends to leave alone, but then uncrustify seems slightly
unhappy with that result.  Also reflow comments at 100 columns.

This uses the .clang-format file from the ament-clang-format package, with the exception of
SortIncludes being set to false, because I didn't like what it tried to do with includes without
that override.

Signed-off-by: Brad Martin <[email protected]>
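
For reference, the override described would look roughly like this (a sketch only; the remaining settings would come from ament-clang-format's configuration file):

# .clang-format
SortIncludes: false  # keep include order as written instead of sorting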
Tests that currently work are enabled, and those that don't are annotated with what needs to be done before they can be enabled.

Signed-off-by: Brad Martin <[email protected]>
Signed-off-by: Brad Martin <[email protected]>
Signed-off-by: Brad Martin <[email protected]>
if (next_call_ns <= 0) {
// This just notifies RCL that we're considering the timer triggered, for the purposes of
// updating the next trigger time.
const auto ret = rcl_timer_call(timer_cb_pair.first);


This happens directly before the execution of the timer in the C++ version. I wonder if doing it here introduces some unwanted behavior...

bmartin427 (Author):


None that I've encountered yet? It just seemed important to get started on the next interval as quickly as possible to keep the timer rate from drifting due to the latency in dispatching the timer. Are there any specific issues you suspect with this?


I remember some test failing around this detail in my implementation; sadly, I don't remember the details. I guess the behavior around cancel is broken now. E.g.:

1. A callback taking some time is running.
2. The timer becomes ready and is added to the ready queue.
3. Some other callback cancels the timer.
4. The timer is executed anyway.

Is there a check that the timer was canceled directly before the execution, which I missed?

The interval should not drift if this call is issued later, because the next trigger time is computed as:

while (nextTime < now)
{
   nextTime = nextTime + period;
}

I think there is another bug lurking here: if a timer takes longer than its period, your queue will fill up. You need to block the timer from recomputing the next execution time until it has been executed.
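
A hedged sketch of one way to do that blocking, assuming an asio-based dispatch loop (the class and member names below are hypothetical, not taken from the PR): a timer that has been posted but not yet executed is never enqueued a second time.

#include <functional>
#include <unordered_map>

#include <boost/asio.hpp>
#include <rcl/timer.h>

// Hypothetical guard: only one dispatch of a given timer may be in flight at
// once, so a timer slower than its period can't flood the queue.
class TimerDispatchGuard
{
public:
  explicit TimerDispatchGuard(boost::asio::io_context & io) : io_(io) {}

  void EnqueueIfNotPending(rcl_timer_t * timer, std::function<void()> user_cb)
  {
    if (dispatch_pending_[timer]) {
      return;  // an earlier dispatch hasn't run yet; don't queue another
    }
    dispatch_pending_[timer] = true;
    boost::asio::post(io_, [this, timer, user_cb]() {
      (void)rcl_timer_call(timer);       // recompute the next trigger time
      user_cb();                         // run the user callback
      dispatch_pending_[timer] = false;  // the timer may now be enqueued again
    });
  }

private:
  boost::asio::io_context & io_;
  std::unordered_map<rcl_timer_t *, bool> dispatch_pending_;
};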

bmartin427 (Author) commented Jan 22, 2025


Ah, gotcha. Yeah, it won't drift, but it will potentially miss whole intervals; perhaps that was intended, per your concern about queuing multiple calls.

As far as canceling, I'm not sure even SingleThreadedExecutor is safe from races on that front:

  • I think a cancel between the is_timer_ready() check in _wait_for_ready_callbacks() and the call_timer_with_info() call in _take_timer() will cause an uncaught RCLError exception to be thrown:
  File "/home/brad/rclpy_ws/install/rclpy/lib/python3.12/site-packages/rclpy/executors.py", line 380, in _take_timer
    info = tmr.handle.call_timer_with_info()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rclpy._rclpy_pybind11.RCLError: failed to call timer: timer is canceled, at ./src/rcl/timer.c:268
  • A cancel any time after call_timer_with_info() will not prevent the callback already in motion from eventually being delivered

I agree that spacing out the rcl_timer_call() from the callback dispatch does make this race more likely to hit, though. I'll go ahead and rearrange that per the other concerns above.

bmartin427 (Author):


I think this should be resolved now. Both the rcl_timer_call() and the user callback now occur from the same asio::post() event.
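
Roughly the shape that implies, sketched here with a cancellation check folded in (io_ and user_callbacks_ are hypothetical members; rcl_timer_is_canceled() is a real rcl call, but its use here is illustrative, not the PR's exact code):

boost::asio::post(io_, [this, timer]() {
  bool canceled = false;
  if ((rcl_timer_is_canceled(timer, &canceled) != RCL_RET_OK) || canceled) {
    return;  // a cancel landing before dispatch now suppresses the callback
  }
  (void)rcl_timer_call(timer);  // update the next trigger time first...
  user_callbacks_[timer]();     // ...then run the user callback immediately
});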

}
it->second->RemoveTimer(timer);
if (it->second->empty()) {
clock_managers_.erase(it);


Hm, isn't there a race here where the asio callback can still fire into the now-deleted clock_manager?

bmartin427 (Author):


I think the timer's own callback is safe, because it calls a user callback directly without capturing the clock manager itself. However, it may be possible for HandleJump() and UpdateTimers() to get called with a deleted this pointer, if you remove the last timer from a clock immediately after a callback was posted but before it is dispatched. I'll make a note to look at this more closely later.
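
One common way to guard against that, sketched here under the assumption that the clock manager can be owned by a shared_ptr (illustrative only; class and method names are hypothetical):

#include <memory>

#include <boost/asio.hpp>

// Posted callbacks hold a weak_ptr and skip execution if the manager is gone,
// so a late dispatch can never dereference a dangling `this`.
class ClockManager : public std::enable_shared_from_this<ClockManager>
{
public:
  void PostUpdate(boost::asio::io_context & io)
  {
    std::weak_ptr<ClockManager> weak_self = weak_from_this();
    boost::asio::post(io, [weak_self]() {
      if (auto self = weak_self.lock()) {
        self->UpdateTimers();  // only runs while the manager is still alive
      }
    });
  }

private:
  void UpdateTimers() {/* hypothetical; stands in for the real update logic */}
};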

bmartin427 (Author):


I think this is resolved now.

rclpy/src/rclpy/events_executor/timers_manager.cpp (outdated; thread resolved)
// objects you're working with. We'll try to figure out what kind of stuff is hiding behind the
// abstraction by having the Waitable add itself to a wait set, then take stock of what all ended
// up there. We'll also have to hope that no Waitable implementations ever change their component
// entities over their lifetimes.


Hm, the rclcpp layer explicitly forces waitables to implement event handling. I don't know much about the rclpy waitable internals, but the approach taken here would break the rclcpp waitables, so we should check this... A good candidate to have a look at would be the action interfaces.

bmartin427 (Author):


There are test cases in test_events_executor.py specifically for actions, so I know the rclpy Waitable type works, unless you think there's something specific missing from that test coverage.

Can you clarify what it is you're looking for here? Are you wanting me to modify the Waitable interface to better support callbacks directly? That's probably a bit more ambitious and invasive than I was hoping to be, but I can look into it if you think it's necessary.


Brad Martin added 5 commits January 16, 2025 21:21
It was being used only inconsistently, and on reflection it wasn't adding any protection beyond what
the GIL already provides.

Signed-off-by: Brad Martin <[email protected]>
Needed to release the GIL while blocking on another lock.

Signed-off-by: Brad Martin <[email protected]>
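
A minimal sketch of the pattern described (the lock name is hypothetical): release the GIL with pybind11's scoped helper before blocking on a non-Python lock, so a thread that holds that lock while waiting for the GIL can't deadlock against this one.

#include <mutex>

#include <pybind11/pybind11.h>

namespace py = pybind11;

std::mutex mutex_;  // hypothetical lock also taken by non-Python threads

void BlockingSection()
{
  // Drop the GIL before blocking on the other lock; otherwise a thread
  // holding mutex_ while waiting for the GIL deadlocks against this one.
  py::gil_scoped_release released;
  std::lock_guard<std::mutex> lock(mutex_);
  // ... must not touch Python objects here without reacquiring the GIL ...
}  // the GIL is reacquired when `released` goes out of scope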
Never bool-test a py::object::attr() return value without an explicit py::cast<bool>.

Signed-off-by: Brad Martin <[email protected]>
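
For illustration (the attribute name is hypothetical): a handle's boolean conversion only checks that the underlying PyObject* is non-null, so it reports true even when the Python value is False.

#include <pybind11/pybind11.h>

namespace py = pybind11;

void Example(py::object obj)
{
  py::object enabled = obj.attr("enabled");  // hypothetical attribute

  // Wrong: this tests handle validity, not Python truthiness, so the branch
  // is taken for any valid attribute value, including Python False.
  if (enabled) {
    // taken even when enabled == False
  }

  // Right: explicitly convert the Python value to a C++ bool.
  if (py::cast<bool>(enabled)) {
    // taken only when the value is truthy
  }
}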
Brad Martin added 2 commits January 21, 2025 02:51
EventsExecutor was exploding because the test was leaving destroyed entities in the node by using an
incorrect API to destroy them.

Signed-off-by: Brad Martin <[email protected]>
…om it

* Test methods need to be prefixed with 'test_' in order to function.  This had been entirely dead
  code, and when I enabled it, EventsExecutor deadlocked.

* On reflection, it seems like a deadlock is exactly what should be expected from what this test is
  doing.  Remove EventsExecutor from the test and document the problem.

Signed-off-by: Brad Martin <[email protected]>
bmartin427 and others added 6 commits January 22, 2025 16:46
Co-authored-by: Janosch Machowinski <[email protected]>
Signed-off-by: Brad Martin <[email protected]>
This both prevents an issue where user callbacks could potentially queue up if they didn't dispatch
fast enough, and ensures that a timer that has passed its expiration without being dispatched yet
can still be canceled without the user callback being delivered later.

This commit also makes use of the new rcl API for querying the absolute timer expiration time,
instead of the relative number of nanoseconds remaining until it expires.  This should both make
things more accurate, and potentially reduce overhead as we don't have to re-query the current time
for each outstanding timer.

Signed-off-by: Brad Martin <[email protected]>
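
A hedged sketch of the pattern this describes; the commit doesn't name the rcl function, so rcl_timer_get_next_call_time() below is an assumption to be checked against rcl/timer.h.

#include <cstdint>

#include <rcl/timer.h>

// Assumed API: fills in the absolute time, on the timer's clock, at which
// the timer next expires.
void NoteNextCallTime(const rcl_timer_t * timer)
{
  int64_t next_call_ns = 0;
  if (rcl_timer_get_next_call_time(timer, &next_call_ns) == RCL_RET_OK) {
    // With absolute expiration times, a single query of "now" can be compared
    // against every outstanding timer, instead of asking each timer for its
    // remaining relative time.
  }
}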
This both reduces duplicate code and simplifies the asio interface used, which would need replacing.
replacing.

Signed-off-by: Brad Martin <[email protected]>
This method can't be allowed to leave its Future done callback outstanding in case the method is
returning for a reason other than the Future being done.

Signed-off-by: Brad Martin <[email protected]>

Successfully merging this pull request may close these issues.

Feature Addition: Port EventsExecutor concept to Python