Calling the Tokio runtime via CGO/FFI from multiple goroutines #5840

ibeckermayer · 2023-06-30T01:36:27Z

ibeckermayer
Jun 30, 2023

Background

I have an unusual use case which I've struggled to draw concrete conclusions about, and whose answer I think will help elucidate Tokio internals more broadly. I'm working on an application where I'm calling Rust from Go via the FFI. We're integrating a library which makes async calls (https://github.com/Devolutions/IronRDP) and I'm attempting to get it to run via a Tokio runtime.

For the purposes of this discussion I will create a simplified example that illustrates what I'm working with and what different approaches I've tried. In Rust I have a struct like the following:

pub struct Client {
    tokio_rt: tokio::runtime::Runtime,
    iron_rdp_client: IronRDPClient, // calls async library
}

impl Client {
    fn new(tcp: tokio::net::TcpStream, tokio_rt: tokio::runtime::Runtime) -> Self {
        Self {
            tokio_rt,
            iron_rdp_client: IronRDPClient::new(tcp),
        }
    }
}

At a high level, the system works as follows

Initialize the Client by calling init_client, which creates a Tokio runtime (default settings via Runtime::new()), connects a tokio::net::TcpStream, and passes those into Client::new(tcp, tokio_rt). That Client is then passed as a pointer back over the FFI for later use.
From one goroutine, call loop_and_read(client: *mut Client) which loops to read from the client.iron_rdp_client.tcp stream asynchronously, occasionally calling back into Go.
From another goroutine, occasionally call write(client: *mut Client, data: *mut u8,) which asynchronously writes data to the client.iron_rdp_client.tcp

With this high level structure I've tried several different approaches, with varying results, and I'm seeking guidance as to which is ideal:

Try 1: a new `Runtime` for each step

The first thing I tried slightly breaks the three step description above, in that I created a new Runtime and called block_on for each of 1, 2, and 3. This led to some inscrutable mio error, which I eventually decided probably was related to trying to use a single tokio::net::TcpStream on multiple Runtimes, which led me to the attempts described below.

Try 2: `block_on` in each step

Motivation

Once I decided to try everything on a single `Runtime`, my first "apparently working" attempt was to call `block_on` in each of the above steps. That looked something like

1. `init_client` with `block_on`

(Called before either 2 or 3 below)

Code

#[no_mangle]
pub unsafe extern "C" fn init_client() -> *mut Client {
    let tokio_rt = tokio::runtime::Runtime::new().unwrap();
    tokio_rt.block_on(async {
        let tcp = tokio::net::TcpStream::connect().await;
        // Do some other async/await stuff
        let client = Client::new(tcp, tokio_rt);
        client
    })
}

2. `loop_and_read` with `block_on`

(Called concurrently via goroutine with 3 below)

Code

#[no_mangle]
pub unsafe extern "C" fn loop_and_read(client: *mut Client) {
    client.tokio_rt.handle().clone().block_on(async {
        loop {
            let bytes = client.iron_rdp_client.tcp.read().await;
            call_back_into_go(bytes);
        }
    });
}

3. `write` with `block_on`

(Called occasionally from Go via goroutine, concurrent with 2 above)

Code

#[no_mangle]
pub unsafe extern "C" fn write(client: *mut Client, data: *mut u8) {
    client.tokio_rt.handle().clone().block_on(async {
        client.iron_rdp_client.tcp.write(data).await;
    });
}

Discussion

This approach seemed to be working on my machine, however after further research I began doubting whether it would work consistently across platforms. My understanding is that `block_on` blocks on whatever OS thread it's called from. This is fine for (1), which is never called concurrent with another `block_on`, however (2) and (3) _are_ called concurrently (via two different goroutines). While goroutines can and often are called on different threads, there's no semantic guarantee that they are. If the Go scheduler were to happen to schedule a call to (3) on the thread that (2) was already blocking on, then that would mean (3) never gets a chance to execute.

However now that I've written that out, I don't think I'm making sense. When a function is called from CGO, it gets "locked" to a particular thread, which is taken out of the scheduler pool (source). Ergo there's no way for that to happen, and I'm therefore just calling handle().clone().block_on() from different threads, which as far as I can tell, is defined behavior (see the second example here).

All that said, in my exploration I still discovered something interesting which I'm seeking an answer for, so I will continue below.

Try 3: `block_on` in (1), `spawn` in (2) and (3)

Motivation

I noticed that the [`block_on` documentation](https://docs.rs/tokio/latest/tokio/runtime/struct.Runtime.html#method.block_on) states

Note that the future required by this function does not run as a worker. The expectation is that other tasks are spawned by the future here. Awaiting on other futures from the future provided here will not perform as fast as those spawned as workers.

So in order to get better performance I decided to switch (2) and (3) to use spawn

1. `init_client` with `block_on`

See "Try 2", heading "1. init_client with block_on" above

2. `loop_and_read` with `spawn`

(Called concurrently via goroutine with 3 below)

Code

#[no_mangle]
pub unsafe extern "C" fn loop_and_read(client: *mut Client) {
    // Create a channel for sending the result from the async task
    let (tx, rx) = mpsc::channel();

    client.tokio_rt.handle().clone().spawn(async {
        loop {
            let bytes = client.iron_rdp_client.tcp.read().await;
            let err = call_back_into_go(bytes);
            if err != SUCCESS {
                tx.send(err).unwrap();
                break;
            }
        }
    });

    // Wait for the result from the async task
    match rx.recv() {
        Ok(_) => (),
        Err(e) => (),
    }
}

3. `write` with `spawn`

(Called occasionally from Go via goroutine, concurrent with 2 above)

Code

#[no_mangle]
pub unsafe extern "C" fn write(client: *mut Client, data: *mut u8) {
    // Create a channel for sending the result from the async task
    let (tx, rx) = mpsc::channel();

    client.tokio_rt.handle().clone().spawn(async {
        let written = client.iron_rdp_client.tcp.write(data).await;
        tx.send(written).unwrap();
    });

    // Wait for the result from the async task
    match rx.recv() {
      Ok(_) => (),
      Err(e) => (),
  }
}

Discussion

This approach also appears to work for me, and is in a sense more in line with standard tokio usage examples. For example, the tutorial gives the example

#[tokio::main]
async fn main() {
    let handle = tokio::spawn(async {
        // Do some async work
        "return value"
    });

    // Do some other work

    let out = handle.await.unwrap();
    println!("GOT {}", out);
}

Given that #[tokio::main] is syntactic-sugar for creating a default Runtime and calling block_on, my approach here in Try 3 has the same sequence of events of block_on --> spawn. The primary difference is that in the examples, spawn is always called downstream of a higher level call to block_on, whereas in my case, block_on is called, then completes, and only then, later, is spawn called.

Try 4: `spawn` in each step

Motivation

This success got me thinking -- if `spawn` is more performant (and I already have this little channel trick worked out), why not just use `spawn` for all of these steps?

1. `init_client` with `spawn`

(Called before either 2 or 3 below)

Code

#[no_mangle]
pub unsafe extern "C" fn init_client() -> *mut Client {
    // Create a channel for sending the result from the async task
    let (tx, rx) = mpsc::channel();

    let tokio_rt = tokio::runtime::Runtime::new().unwrap();
    tokio_rt.spawn(async {
        let tcp = tokio::net::TcpStream::connect().await;
        // Do some other async/await stuff
        let client = Client::new(tcp, tokio_rt);
        tx.send(client).unwrap();
    })

    // Wait for the result from the async task
    match rx.recv() {
      Ok(client) => client,
      Err(e) => panic!("Error: {}", e),
  }
}

2. `loop_and_read` with `spawn`

See "Try 3", heading "2. loop_and_read with spawn" above

3. `write` with `spawn`

See "Try 3", heading "3. write with spawn" above

Discussion

In this case, the system fails. By adding some logging, I found that I would essentially get "stuck" in init_client, somewhere in the // Do some other async/await stuff. I really don't know what's going on -- why would spawn allow it to execute some distance, including across a few await boundaries, before getting stuck?

Discussion and Questions

So it appears to me that the "Try 2" approach should work, and isn't doing anything particularly undefined. However it's relatively inefficient compared to what's possible using spawn. "Try 3" uses spawn and appears to work, but it's done in an unconventional way for which I couldn't find any documentation on, and so I'm concerned that it may cause issues down the road. And "Try 4" may offer a hint as to whether "Try 3" makes ultimate sense.

The main questions that pop out at me from these results are -- is calling block_on doing something special? For example, is it initializing some aspect of the Runtime's internals that allows later spawns to execute successfully? And if so, do those later spawns somehow rely on being further down the callstack of the block_on (like in the examples)? Or is block_on having been called previously and then returned from enough to do the trick (like in my "Try 3")?

ibeckermayer · 2023-06-30T02:10:07Z

ibeckermayer
Jun 30, 2023
Author

After posting all this, I did just notice that there's a Runtime.enter method which is apparently required for constructing certain types of objects including TcpStream, and which "will also allow you to call methods such as tokio::spawn".

It appears that block_on calls enter internally, so I thought maybe that's what explains the discrepancy between "Try 3" and "Try 4". However when I pre-empt the call to spawn with a call to enter in "Try 4", it still fails in the same way.

0 replies

Darksonn · 2023-07-03T09:04:51Z

Darksonn
Jul 3, 2023
Maintainer

One thing to be careful about with FFI is that you can end up with several copies of Tokio if you link in several different Rust projects using Tokio. They will have different globals for the Tokio state, so they wont work together.

1 reply

ibeckermayer Jul 3, 2023
Author

@Darksonn thanks for the heads up. I'm only linking in a single project, which passes a single pointer to Rust object holding the tokio Runtime back and forth across the FFI boundary, so I don't think that's a problem in this case.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Calling the Tokio runtime via CGO/FFI from multiple goroutines #5840

{{title}}

{{editor}}'s edit

{{editor}}'s edit

1. `init_client` with `block_on`

2. `loop_and_read` with `block_on`

3. `write` with `block_on`

1. `init_client` with `block_on`

2. `loop_and_read` with `spawn`

3. `write` with `spawn`

1. `init_client` with `spawn`

2. `loop_and_read` with `spawn`

3. `write` with `spawn`

Replies: 2 comments 1 reply

{{title}}

{{title}}

{{title}}

Select a reply

Calling the Tokio runtime via CGO/FFI from multiple goroutines #5840

ibeckermayer Jun 30, 2023

Background

Try 1: a new Runtime for each step

Try 2: block_on in each step

1. init_client with block_on

2. loop_and_read with block_on

3. write with block_on

Try 3: block_on in (1), spawn in (2) and (3)

1. init_client with block_on

2. loop_and_read with spawn

3. write with spawn

Try 4: spawn in each step

1. init_client with spawn

2. loop_and_read with spawn

3. write with spawn

Discussion and Questions

Replies: 2 comments · 1 reply

ibeckermayer Jun 30, 2023 Author

Darksonn Jul 3, 2023 Maintainer

ibeckermayer Jul 3, 2023 Author

ibeckermayer
Jun 30, 2023

Try 1: a new `Runtime` for each step

Try 2: `block_on` in each step

1. `init_client` with `block_on`

2. `loop_and_read` with `block_on`

3. `write` with `block_on`

Try 3: `block_on` in (1), `spawn` in (2) and (3)

1. `init_client` with `block_on`

2. `loop_and_read` with `spawn`

3. `write` with `spawn`

Try 4: `spawn` in each step

1. `init_client` with `spawn`

2. `loop_and_read` with `spawn`

3. `write` with `spawn`

Replies: 2 comments 1 reply

ibeckermayer
Jun 30, 2023
Author

Darksonn
Jul 3, 2023
Maintainer

ibeckermayer Jul 3, 2023
Author