
Execution structure (spin-off of #427) #428

Open · mxgrey wants to merge 15 commits into main from execution_structure
Conversation

@mxgrey (Collaborator) commented Nov 20, 2024

This PR is spun out of #427 and focuses entirely on changes to the execution structure (the Context -> Executor -> Node relationship).

This is the most crucial change needed to make async execution feasible. Up until now, rclrs has had a loose coupling between the executor and nodes, meaning nodes could be added to or removed from the executor freely. This comes with the following disadvantages:

  • It's possible to add the same node to multiple executors and spin them all at the same time. We have protections in place so this won't violate any contract with rcl, but it would be a confusing arrangement.
  • We need many layers of mutexes, arcs, and weaks to manage the loose ownership.
  • Users don't have a clear order in which to invoke these three top-level objects.
  • For async execution, nodes will need a way of issuing jobs to their executor at any time, which isn't possible if they can be orphaned or bounce around between executors.

Due to the last bullet point in particular, I'm proposing a rigid procedure in which:

  • A user creates a Context
  • The Context can create one or more Executors
  • Each Executor can create one or more Nodes

The minimal publisher example gives a good demonstration of this new procedure.
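For reference, the flow looks roughly like the sketch below. This is not copied from the example: `create_node` and `SpinOptions::default()` are assumed names based on the discussion in this PR, while `Context::default_from_env`, `create_basic_executor`, and `SpinOptions` all appear elsewhere in this review.

```rust
use rclrs::*;

fn main() -> Result<(), RclrsError> {
    // 1. The user creates a Context.
    let context = Context::default_from_env()?;
    // 2. The Context creates an Executor.
    let mut executor = context.create_basic_executor();
    // 3. The Executor creates one or more Nodes. (`create_node` is an assumed name.)
    let _node = executor.create_node("minimal_publisher")?;
    // Spin until shutdown (assuming `SpinOptions::default()` means "spin indefinitely").
    executor.spin(SpinOptions::default());
    Ok(())
}
```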

In this structure, Nodes are no longer allowed to be removed from an Executor or move between Executors. That's a very unlikely use case to begin with, so I don't think there's a downside to this restriction. After #421 we'll be able to introduce custom flavors of Executors, so if anyone really cares about moving their node around between executors, it would be possible to introduce a custom hierarchical executor that can move nodes between other executors.

mxgrey and others added 4 commits November 16, 2024 16:05
Comment on lines -42 to +54
-pub fn spin_once(&self, timeout: Option<Duration>) -> Result<(), RclrsError> {
+fn spin_once(&self, timeout: Option<Duration>) -> Result<(), RclrsError> {
A Collaborator commented:
Why was spin_once made private, and is there a way to have a call to spin that doesn't run continuously? I also noticed in the implementation that this function executes all the available work, so it's more of a spin_some than a spin_once, but that's out of scope here.

@mxgrey (Collaborator Author) replied:
> Why was spin_once made private, and is there a way to have a call to spin that doesn't run continuously?

The idea is to make this fn spin_once an implementation detail. The public API would now be

executor.spin(SpinOptions::spin_once());

It's certainly more verbose, but I'd like to start the API transition by narrowing down the public API for Executor to be based entirely on SpinOptions. After we've tick-tocked this version of spin_once I wouldn't mind revisiting how to make the API more ergonomic. But I don't think we'll need to worry about making "spin once" ergonomic since the vast majority of downstream users won't have any use for it. Once async execution is available they will almost certainly want to use until_promise_resolved in places where they would have previously used spin_once in a loop with a condition. But that's a discussion that doesn't make sense to have until we're reviewing the async execution PR.
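As a hedged sketch of the "spin_once in a loop with a condition" pattern mentioned above, written against the new SpinOptions-based API (`keep_going` is a placeholder for whatever condition the caller checks; whether SpinOptions composes with a loop exactly like this is an assumption):

```rust
fn spin_while(executor: &mut rclrs::Executor, keep_going: impl Fn() -> bool) {
    while keep_going() {
        // Each pass handles the currently available work (rclcpp-style spin_once semantics).
        executor.spin(rclrs::SpinOptions::spin_once());
    }
}
```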

> I also noticed in the implementation that this function executes all the available work, so it's more of a spin_some than a spin_once, but that's out of scope here.

I double-checked and if I'm reading the ROS 1 and rclcpp docs correctly, spin_once from ROS 1 would do all available work without waiting for any new work. In rclcpp this was changed so spin_once will wait until the next time new work becomes available if there isn't any work already available. In rclcpp spin_some behaves the way spin_once used to in ROS 1.

In this case our behavior matches rclcpp's spin_once so I think we should keep the name as-is.

@esteve (Collaborator) commented Dec 16, 2024

@mxgrey could you revert the version bump? I prefer doing it before a new release, separately from other PRs.

@mxgrey (Collaborator Author) commented Dec 16, 2024

@esteve I can certainly do that, but I should caution that without the version bump cargo will not recognize that it needs to update its lockfile or its build cache. I ran into an issue on my machine where compiling the examples failed because cargo insisted on using an outdated version of rclrs that it had cached. I had to delete my colcon build and install directories and the target directories in the cargo packages before it would compile.

Bumping the version number across all the Cargo.tomls as I did in the last commit should spare others from needing to manually purge their build artifacts. Do you still want me to revert it?

@mxgrey (Collaborator Author) commented Dec 16, 2024

FWIW another way to avoid the problem that I just described is for us to put a workspace Cargo.toml in the root of the git repo. That's something I've been wanting to do for a long time now, but I've been holding off on it in case it's controversial for any reason. colcon-cargo should be able to support it by now, so that shouldn't be a blocker at least.

If you'd like, I could revert the version bump and add a workspace Cargo.toml. This would mean that all the Cargo.toml files in the examples directory specify relative paths to rclrs and rosidl_runtime_rs, so they might not be as useful for folks who want to just copy a whole example crate directory when starting a project.
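For concreteness, a root workspace manifest could look roughly like the sketch below; the member list is an assumption, not a final proposal.

```toml
# Hypothetical <repo-root>/Cargo.toml
[workspace]
resolver = "2"
members = [
    "rclrs",
    "rosidl_runtime_rs",
    "examples/*",
]
```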

use anyhow::{Error, Result};

#[tokio::main]
async fn main() -> Result<(), Error> {
-    let context = rclrs::Context::new(env::args())?;
+    let mut executor = rclrs::Context::default_from_env()?.create_basic_executor();
A Collaborator commented:
@mxgrey how would a custom executor work here? It'd take a Context as an argument, right? I'd remove the create_basic_executor function and be slightly more verbose (SingleThreadedExecutor::new(&context)) to show how the API works, and basically to not make any distinction between the executors.

@mxgrey (Collaborator Author) commented Dec 16, 2024:

If we look several PRs into the future we can see how the basic executor will be created once we have the ability to support custom executors.

The hook for making a custom executor will be the runtime trait which downstream users can implement for their custom struct. Once they've implemented that trait, anyone can use the custom runtime to create an opaque Executor instance. The Executor hides away what kind of runtime is being used so the rest of the application doesn't have to care about it.

I've gone ahead and made a repo to demonstrate what this would look like. In that repo I've made two different custom executors: DoNothingSimple and DoNothingAdvanced. Both are custom executor runtimes which, as their names imply, do nothing. They're not useful as far as executors go, but they can show us how the custom executor API will work. If you clone that repo into a colcon workspace and check out the worker branch of my fork, you'll be able to compile and run the demo.

Simple

The simplest way to create a custom executor is to use the Context::create_executor method as demonstrated here. This will work perfectly fine as long as it's okay for downstream users to directly initialize your custom runtime.
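In other words, something like the following hedged sketch, where DoNothingSimple comes from the demo repo, and both the create_executor signature and the Default construction are assumptions:

```rust
use rclrs::*;

// Assumes Context::create_executor accepts any type implementing the future
// executor runtime trait, and that DoNothingSimple can be built with Default.
fn make_demo_executor(context: &Context) -> Executor {
    context.create_executor(DoNothingSimple::default())
}
```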

Advanced

There are two reasons you might want to use a more advanced approach for a custom runtime:

  1. You don't want users to be able to directly instantiate your runtime. Maybe your runtime is a singleton or relies on some dependencies that you want to hide as implementation details.
  2. Your runtime needs access to the Context right away during instantiation, and you want to guarantee that the user can't pass the wrong context to you.

To achieve this you can use the extension trait pattern. The repo demonstrates an implementation of this pattern here. Then the pattern gets used here.

This pattern allows you to add new methods to the Context struct from downstream crates. In this case we're adding Context::create_do_nothing_executor which bears an uncanny resemblance to our built-in method Context::create_basic_executor.

All that is to say, the API for custom downstream executors can be fully consistent with this out-of-the-box upstream API of Context::create_basic_executor.
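To make the pattern concrete without the repo at hand, here is a hedged sketch of the "advanced" approach. Apart from Context, Executor, create_executor, and create_do_nothing_executor, the names below are assumptions, and the runtime trait implementation for DoNothingAdvanced is elided:

```rust
use rclrs::*;

/// Hypothetical custom runtime. Reason 2 above: it holds the Context from the start.
struct DoNothingAdvanced {
    context: Context,
}

/// Extension trait: a downstream crate can add new methods to rclrs::Context.
pub trait CreateDoNothingExecutor {
    fn create_do_nothing_executor(&self) -> Executor;
}

impl CreateDoNothingExecutor for Context {
    fn create_do_nothing_executor(&self) -> Executor {
        // Reason 1 above: DoNothingAdvanced is only ever constructed in here,
        // so users can't instantiate it directly or pass in the wrong Context.
        // Assumes Context is cheaply clonable (it wraps an Arc'd handle).
        self.create_executor(DoNothingAdvanced { context: self.clone() })
    }
}
```

A caller then brings the extension trait into scope with a `use` and calls context.create_do_nothing_executor(), exactly mirroring create_basic_executor.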

@mxgrey (Collaborator Author) commented:

As discussed in the Working Group meeting, I've changed this PR to use an extension trait to provide the Context::create_basic_executor method: a75cd24

At the same time I've simplified all the tests and demos to just do use rclrs::*; since that's probably how most users will want to use rclrs anyway.

@mxgrey force-pushed the execution_structure branch from e3a7436 to 0918476 on December 16, 2024 14:51
@mxgrey (Collaborator Author) commented Dec 16, 2024

@esteve After giving it more thought I've reverted the version bump. It would fix the problem for people pulling this PR, but then we'd need to keep bumping the version number with each of the subsequent PRs that are lined up right now or else the problem will keep happening.

I think gathering all the crates up into one cargo workspace is the right solution here. How would you feel if I open up a separate PR for that?

@mxgrey (Collaborator Author) commented Dec 16, 2024

I should bring attention to a last minute change that I just introduced: b465f6f

While developing the workers PR #446 I realized that a multi-threaded executor could accumulate multiple RclrsErrors across its different threads within one spin. I don't want to discard any of those errors just for the sake of returning a single Result<(), RclrsError> from Executor::spin, so in #446 I changed Executor::spin to return a Vec<RclrsError> which will be empty if no errors occurred.

To allow this to funnel into a single Result<(), RclrsError> instance when users want it to, I introduced the trait method RclrsErrorFilter::first_error, which can be applied to a Vec<RclrsError> to get a Result<(), RclrsError> that reflects whether the vector contained any errors. You can see an example usage here.
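A hedged sketch of what that looks like at a call site; the SpinOptions constructor used here is a placeholder:

```rust
use rclrs::*;

fn spin_and_report(executor: &mut Executor) -> Result<(), RclrsError> {
    executor
        .spin(SpinOptions::default()) // now returns Vec<RclrsError>, empty on success
        .first_error()                // RclrsErrorFilter: Ok(()) unless something went wrong
}
```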

Since the point of this PR is to introduce all the user-facing API changes that will be needed in order to support async executors (which IMO includes multi-threaded executors), I figured I should push that API change up to this PR.

@maspe36 mentioned this pull request Dec 17, 2024
@maspe36 previously approved these changes Dec 17, 2024
/// A helper trait to disregard timeouts as not an error.
pub trait RclrsErrorFilter {
    /// Get the first error available, or Ok(()) if there are no errors.
    fn first_error(self) -> Result<(), RclrsError>;

@mxgrey (Collaborator Author) commented Dec 18, 2024:

That's both interesting and surprising to me. I knew that you could use collect to go from Vec<Result<T, E>> to Result<Vec<T>, E> but I didn't know Vec<Result<(), E>> to Result<(), E> was possible. This seems to be a special case for () because I get a compilation error when I try to use it with Vec<Result<i32, E>>.

I can change the implementation of fn first_error to take advantage of this, although I think the code would be a bit less clear about what it's doing.
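For reference, the special case being discussed is that () implements FromIterator<()>, so a collection of Result<(), E> can collapse straight into a single Result:

```rust
fn main() {
    let results: Vec<Result<(), String>> = vec![Ok(()), Err("boom".into()), Ok(())];
    // Collecting stops at (and returns) the first Err it encounters.
    let collapsed: Result<(), String> = results.into_iter().collect();
    assert_eq!(collapsed, Err("boom".to_string()));
}
```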

A Collaborator commented:

We could also remove this function from the public API and let the users figure out what to do with a Vec<Result<(), RclrsError>>.

@mxgrey (Collaborator Author) replied:

We're already using it in the unit tests. I'd rather not replace it with .into_iter().collect() because I feel like .into_iter().collect() is much more abstruse than it needs to be for expressing something as simple as "give me the first error in this collection".

@esteve (Collaborator) commented Dec 18, 2024

@mxgrey

> I think gathering all the crates up into one cargo workspace is the right solution here. How would you feel if I open up a separate PR for that?

Not sure, what would be the implications? Eventually we'd like to move the crates to separate repositories so that they can be released independently into the ROS buildfarm. We're still far from getting there, but would putting all the crates into one cargo workspace prevent that in the future?

@mxgrey (Collaborator Author) commented Dec 18, 2024

> Not sure, what would be the implications?

The only implication is that you can do $ cargo build, $ cargo run, and $ cargo test from the root of the workspace, and the crates can all share one cache and one lockfile. There are no downsides as far as I can figure.

> would putting all the crates into one cargo workspace prevent that in the future?

No, there's no significant commitment in setting up a workspace. IIRC we were mostly planning on moving out rosidl_generator_rs, which is a Python package anyway. But even if we wanted to move rosidl_runtime_rs out into its own repo, when it comes time to do that we would just revert the Cargo.toml of rclrs to specify a version for rosidl_runtime_rs (like it does right now) instead of specifying a path. But I assume the example crates will always stay in the same repo as rclrs, so that should continue to be useful.
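To illustrate the two dependency styles being contrasted (the version number below is just a placeholder):

```toml
# In a workspace layout, rclrs/Cargo.toml points at the sibling crate by path:
[dependencies]
rosidl_runtime_rs = { path = "../rosidl_runtime_rs" }

# If rosidl_runtime_rs later moved to its own repo, this would revert to a version:
# rosidl_runtime_rs = "0.4"
```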

@esteve (Collaborator) commented Jan 15, 2025

@mxgrey let us know when this is ready for one final review, though my only feedback was about the create_basic_executor function, but that's been addressed. @nwn is that change ok with you?

@mxgrey (Collaborator Author) commented Jan 15, 2025

@esteve I believe we've closed the loop on all feedback, so I think it's a good time to announce a final call for feedback to the working group.

@nwn (Contributor) commented Jan 20, 2025

> @mxgrey let us know when this is ready for one final review, though my only feedback was about the create_basic_executor function, but that's been addressed. @nwn is that change ok with you?

Yes, thanks for making the change. It just makes it less likely that we'll accidentally give special treatment to the basic executor.

Comment on lines +81 to +88
impl Default for Context {
    fn default() -> Self {
        // SAFETY: It should always be valid to instantiate a context with no
        // arguments, no parameters, no options, etc.
        Self::new([], InitOptions::default()).expect("Failed to instantiate a default context")
    }
}

A Contributor commented:
It might be worth adding a doc comment for this impl to warn people that this isn't the same thing as the "default context" they're used to in rclcpp, which is a global shared context.
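For example, something along these lines (wording is only a suggestion):

```rust
/// Note: unlike the "default context" in rclcpp, this is *not* a globally
/// shared instance. Every call to `Context::default()` creates a brand new,
/// independent Context.
impl Default for Context {
    fn default() -> Self {
        Self::new([], InitOptions::default()).expect("Failed to instantiate a default context")
    }
}
```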

use std::{
    sync::{Arc, Mutex, Weak},
    time::Duration,
};

/// Single-threaded executor implementation.
-pub struct SingleThreadedExecutor {
+pub struct Executor {
    context: Arc<ContextHandle>,
    nodes_mtx: Mutex<Vec<Weak<Node>>>,
A Contributor commented:
With the removal of remove_node(), it should now be the responsibility of the Executor to clean up old Nodes. So to avoid leaking the Node allocations, we should filter those out, possibly within spin or spin_once.
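A hedged sketch of that cleanup, assuming the nodes_mtx: Mutex<Vec<Weak<Node>>> field shown in the diff above:

```rust
use std::sync::{Mutex, Weak};

/// Drop entries whose Node has already been dropped by the user so the Vec
/// doesn't grow without bound; this could run at the start of each spin.
fn prune_dropped_nodes(nodes_mtx: &Mutex<Vec<Weak<rclrs::Node>>>) {
    let mut nodes = nodes_mtx.lock().unwrap();
    nodes.retain(|node| node.strong_count() > 0);
}
```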

Comment on lines -53 to +55
-/// Sending messages does not require calling [`spin`][1] on the publisher's node.
+/// Sending messages does not require the node's executor to [spin][2].
 ///
-/// [1]: crate::spin
+/// [2]: crate::Executor::spin
A Contributor commented:
Nit-pick, but this could just be [1] instead of [2].
