Execution structure (spin-off of #427) #428
base: main
Conversation
```diff
-pub fn spin_once(&self, timeout: Option<Duration>) -> Result<(), RclrsError> {
+fn spin_once(&self, timeout: Option<Duration>) -> Result<(), RclrsError> {
```
Why was `spin_once` made private, and is there a way to call spin so that it doesn't run continuously? I also noticed in the implementation that this function executes all of the available work, so it's more of a `spin_some` than a `spin_once`, but that's out of scope here.
> Why was spin_once made private and is there a way to have a call to spin that doesn't run continuously?

The idea is to make this `spin_once` function an implementation detail. The public API would now be `executor.spin(SpinOptions::spin_once());`

It's certainly more verbose, but I'd like to start the API transition by narrowing down the public API for `Executor` to be based entirely on `SpinOptions`. After we've tick-tocked this version of `spin_once`, I wouldn't mind revisiting how to make the API more ergonomic. But I don't think we'll need to worry about making "spin once" ergonomic, since the vast majority of downstream users won't have any use for it. Once async execution is available, they will almost certainly want to use `until_promise_resolved` in places where they would have previously used `spin_once` in a loop with a condition. But that's a discussion that doesn't make sense to have until we're reviewing the async execution PR.
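For context, a rough sketch of how the old "spin_once in a loop with a condition" pattern would look under this narrowed API. Only `SpinOptions::spin_once()` is quoted from the discussion above; the exact `Executor::spin` signature and the `done` closure are assumptions for illustration.

```rust
// Rough sketch only: assumes executor.spin(SpinOptions) as the narrowed API.
fn wait_until_done(executor: &mut rclrs::Executor, done: impl Fn() -> bool) {
    // The old pattern of calling spin_once() in a loop with a condition
    // would now be expressed through SpinOptions instead.
    while !done() {
        let _ = executor.spin(rclrs::SpinOptions::spin_once());
    }
}
```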
> I also noticed in the implementation that this function executes all the available work so it's more of a spin_some than spin_once but that's out of the scope here

I double-checked, and if I'm reading the ROS 1 and rclcpp docs correctly, `spin_once` from ROS 1 would do all available work without waiting for any new work. In rclcpp this was changed so that `spin_once` will wait until the next time new work becomes available if there isn't any work already available; rclcpp's `spin_some` behaves the way `spin_once` used to in ROS 1.
In this case our behavior matches rclcpp's `spin_once`, so I think we should keep the name as-is.
@mxgrey could you revert the version bump? I prefer doing it right before a new release, separate from other PRs.
@esteve I can certainly do that, but I should caution that without the version bump cargo will not recognize that it needs to update its lockfile or its build cache. I ran into an issue on my machine where compiling the examples was failing because cargo insisted on using an outdated version of rclrs that it had cached. I had to delete my colcon build and install directories and delete the target directories in the cargo packages before it would compile. Bumping the version number across all the …
FWIW, another way to avoid the problem that I just described is for us to put a workspace … If you'd like, I could revert the version bump and add in a …
```diff
 use anyhow::{Error, Result};

 #[tokio::main]
 async fn main() -> Result<(), Error> {
-    let context = rclrs::Context::new(env::args())?;
+    let mut executor = rclrs::Context::default_from_env()?.create_basic_executor();
```
@mxgrey how would a custom executor work here? It'd take a `Context` as an argument, right? I'd remove the `create_basic_executor` function and be slightly more verbose (`SingleThreadedExecutor::new(&context)`) to show how the API works, and basically to not make any distinction between the executors.
If we look several PRs into the future, we can see how the basic executor will be created once we have the ability to support custom executors.

The hook for making a custom executor will be the runtime trait, which downstream users can implement for their custom struct. Once they've implemented that trait, anyone can use the custom runtime to create an opaque `Executor` instance. The `Executor` hides away what kind of runtime is being used, so the rest of the application doesn't have to care about it.

I've gone ahead and made a repo to demonstrate what this would look like. In that repo I've made two different custom executors: `DoNothingSimple` and `DoNothingAdvanced`. Both are custom executor runtimes which, as their names imply, do nothing. They're not useful as far as executors go, but they can show us how the custom executor API will work. If you clone that repo into a colcon workspace and check out the worker branch of my fork, you'll be able to compile and run the demo.

Simple

The simplest way to create a custom executor is to use the `Context::create_executor` method, as demonstrated here. This will work perfectly fine as long as it's okay for downstream users to directly initialize your custom runtime.

Advanced

There are two reasons you might want to use a more advanced approach for a custom runtime:
- You don't want users to be able to directly instantiate your runtime. Maybe your runtime is a singleton or relies on some dependencies that you want to hide as implementation details.
- Your runtime needs access to the `Context` right away during instantiation, and you want to guarantee that the user can't pass the wrong context to you.

To achieve this you can use the extension trait pattern. The repo demonstrates an implementation of this pattern here, and the pattern gets used here. This pattern allows you to add new methods to the `Context` struct from downstream crates. In this case we're adding `Context::create_do_nothing_executor`, which bears an uncanny resemblance to our built-in method `Context::create_basic_executor`.

All that is to say, the API for custom downstream executors can be fully consistent with this out-of-the-box upstream API of `Context::create_basic_executor`.
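Since the extension trait pattern is the crux of the advanced approach, here is a minimal, self-contained sketch of it. The `Context`, `Executor`, and runtime types below are stand-ins for illustration; only the general shape and the `create_do_nothing_executor` name come from the description above, not from the actual rclrs API.

```rust
// Minimal sketch of the extension trait pattern; all types are stand-ins.
struct Context;
struct Executor;

impl Executor {
    // Stand-in for an upstream constructor that wraps any runtime into an
    // opaque Executor (the real rclrs hook would be the runtime trait).
    fn with_runtime<R>(_runtime: R) -> Self {
        Executor
    }
}

// A do-nothing runtime kept private to the downstream crate, so users
// cannot instantiate it directly.
struct DoNothingRuntime;

// The extension trait adds a new method to Context from a downstream crate.
trait CreateDoNothingExecutor {
    fn create_do_nothing_executor(&self) -> Executor;
}

impl CreateDoNothingExecutor for Context {
    fn create_do_nothing_executor(&self) -> Executor {
        // The runtime is constructed here, so it can be handed the right
        // Context immediately and the user can never pass the wrong one.
        Executor::with_runtime(DoNothingRuntime)
    }
}

fn main() {
    let context = Context;
    // Reads just like the built-in Context::create_basic_executor().
    let _executor = context.create_do_nothing_executor();
}
```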
As discussed in the Working Group meeting, I've changed this PR to use an extension trait to provide the `Context::create_basic_executor` method: a75cd24
At the same time I've simplified all the tests and demos to just do `use rclrs::*;`, since that's probably how most users will want to use rclrs anyway.
@esteve After giving it more thought, I've reverted the version bump. It would fix the problem for people pulling this PR, but then we'd need to keep bumping the version number with each of the subsequent PRs that are lined up right now, or else the problem will keep happening. I think gathering all the crates up into one cargo workspace is the right solution here. How would you feel if I opened up a separate PR for that?
I should bring attention to a last-minute change that I just introduced: b465f6f

While developing the workers PR #446, I realized that a multi-threaded executor could accumulate multiple … To allow this to funnel into a single …

Since the point of this PR is to introduce all the user-facing API changes that will be needed in order to support async executors (which IMO includes multi-threaded executors), I figured I should push that API change up to this PR.
```rust
/// A helper trait to disregard timeouts as not an error.
pub trait RclrsErrorFilter {
    /// Get the first error available, or Ok(()) if there are no errors.
    fn first_error(self) -> Result<(), RclrsError>;
```
This can basically be done with `into_iter().collect()`.
That's both interesting and surprising to me. I knew that you could use collect to go from `Vec<Result<T, E>>` to `Result<Vec<T>, E>`, but I didn't know `Vec<Result<(), E>>` to `Result<(), E>` was possible. This seems to be a special case for `()`, because I get a compilation error when I try to use it with `Vec<Result<i32, E>>`.
I can change the implementation of `fn first_error` to take advantage of this, although I think the code would be a bit less clear about what it's doing.
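For anyone following along, here is a standalone snippet (plain std, not rclrs code) showing the collect behavior being discussed:

```rust
// Standalone illustration of collecting Vec<Result<(), E>> into Result<(), E>.
fn main() {
    let results: Vec<Result<(), String>> = vec![
        Ok(()),
        Err("first error".to_string()),
        Err("second error".to_string()),
    ];

    // Collecting stops at the first Err and returns it; this works because
    // () implements FromIterator<()>.
    let first: Result<(), String> = results.into_iter().collect();
    assert_eq!(first, Err("first error".to_string()));

    // The general form collects the Ok values instead, so the target type
    // has to be Result<Vec<i32>, E> rather than Result<(), E>.
    let numbers: Vec<Result<i32, String>> = vec![Ok(1), Ok(2)];
    let collected: Result<Vec<i32>, String> = numbers.into_iter().collect();
    assert_eq!(collected, Ok(vec![1, 2]));
}
```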
We could also remove this function from the public API and let the users figure out what to do with a `Vec<Result<(), RclrsError>>`.
We're already using it in the unit tests. I'd rather not replace it with `.into_iter().collect()`, because I feel like `.into_iter().collect()` is much more abstruse than it needs to be for expressing something as simple as "give me the first error in this collection".
Not sure, what would be the implications? Eventually we'd like to move the crates to separate repositories so that they can be released independently into the ROS buildfarm. Though we're still far from getting there, would putting all the crates into one cargo workspace prevent that in the future?
The only implication is that you can do …
No, there's no significant commitment to setting up a workspace. IIRC we were mostly planning on moving out …
@esteve I believe we've closed the loop on all feedback, so I think it's a good time to announce a final call for feedback to the working group.
Yes, thanks for making the change. It just makes it less likely that we'll accidentally give special treatment to the basic executor.
```rust
impl Default for Context {
    fn default() -> Self {
        // SAFETY: It should always be valid to instantiate a context with no
        // arguments, no parameters, no options, etc.
        Self::new([], InitOptions::default()).expect("Failed to instantiate a default context")
    }
}
```
It might be worth adding a doc comment for this `impl` to warn people that this isn't the same thing as the "default context" they're used to in rclcpp, which is a global shared context.
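For illustration, the suggested warning might look something like this; the wording is only a sketch, not something that is in the PR:

```rust
/// Creates a new, independent Context.
///
/// Note: unlike the "default context" in rclcpp, this is not a global
/// shared context. Every call creates a fresh, standalone Context.
impl Default for Context {
    fn default() -> Self {
        Self::new([], InitOptions::default()).expect("Failed to instantiate a default context")
    }
}
```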
```diff
 use std::{
     sync::{Arc, Mutex, Weak},
     time::Duration,
 };

-/// Single-threaded executor implementation.
-pub struct SingleThreadedExecutor {
+pub struct Executor {
     context: Arc<ContextHandle>,
     nodes_mtx: Mutex<Vec<Weak<Node>>>,
```
With the removal of `remove_node()`, it should now be the responsibility of the `Executor` to clean up old `Node`s. So to avoid leaking the `Node` allocations, we should filter those out, possibly within `spin` or `spin_once`.
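A sketch of what that cleanup could look like, assuming the `nodes_mtx: Mutex<Vec<Weak<Node>>>` field shown above; the method name and stand-in types are placeholders, not the actual rclrs implementation:

```rust
use std::sync::{Mutex, Weak};

// Placeholder stand-in for the rclrs Node type.
struct Node;

struct Executor {
    nodes_mtx: Mutex<Vec<Weak<Node>>>,
}

impl Executor {
    // Drop Weak entries whose Node has already been deallocated so the Vec
    // doesn't grow without bound; this could run at the start of each spin.
    fn prune_dead_nodes(&self) {
        self.nodes_mtx
            .lock()
            .unwrap()
            .retain(|node| node.upgrade().is_some());
    }
}

fn main() {
    let executor = Executor {
        nodes_mtx: Mutex::new(Vec::new()),
    };
    executor.prune_dead_nodes();
}
```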
```diff
-/// Sending messages does not require calling [`spin`][1] on the publisher's node.
+/// Sending messages does not require the node's executor to [spin][2].
 ///
-/// [1]: crate::spin
+/// [2]: crate::Executor::spin
```
Nit-pick, but this could just be `[1]` instead of `[2]`.
This PR is spun out of #427 and focuses entirely on changes to the execution structure (the Context -> Executor -> Node relationship).
This is the most crucial change for the async execution to be feasible. Up until now, rclrs has had a loose coupling between the executor and nodes. That means nodes could be added to or removed from the executor freely. This comes with the following disadvantages:
Due to the last bullet point in particular, I'm proposing a rigid procedure in which:
The minimal publisher example gives a good demonstration of this new procedure.
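For readers who haven't opened that example, a rough sketch of the new procedure looks something like the following. The `create_node`, `create_publisher`, and error-handling details are assumptions based on the snippets quoted in this review, not verbatim from the PR:

```rust
use rclrs::*;

fn main() -> Result<(), RclrsError> {
    // The Context is created once and immediately consumed to build an Executor.
    let mut executor = Context::default_from_env()?.create_basic_executor();

    // Nodes are created through the Executor, tying their lifetime to it
    // (assumed API shape; exact signatures may differ).
    let node = executor.create_node("minimal_publisher")?;
    let publisher = node.create_publisher::<std_msgs::msg::String>("topic")?;
    publisher.publish(&std_msgs::msg::String {
        data: "Hello, world!".to_string(),
    })?;

    // All spinning goes through SpinOptions on the Executor; here we assume
    // spin reports its errors as a collection, per the RclrsErrorFilter
    // discussion above, and surface only the first one.
    executor.spin(SpinOptions::default()).first_error()
}
```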
In this structure, Nodes are no longer allowed to be removed from an Executor or move between Executors. That's a very unlikely use case to begin with, so I don't think there's a downside to this restriction. After #421 we'll be able to introduce custom flavors of Executors, so if anyone really cares about moving their node around between executors, it would be possible to introduce a custom hierarchical executor that can move nodes between other executors.