Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic with follows from span in multithreaded runtime. #183

Open
mTsBucy1 opened this issue Nov 7, 2024 · 0 comments
Open

Panic with follows from span in multithreaded runtime. #183

mTsBucy1 opened this issue Nov 7, 2024 · 0 comments

Comments

@mTsBucy1
Copy link
Contributor

mTsBucy1 commented Nov 7, 2024

Bug Report

Version

tracing-opentelemetry: 0.25.0 / 0.27.0 (elaborated below),
tracing: 0.1.40

Platform

Linux mtstestclient03 6.8.0-40-generic #40~22.04.3-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 30 17:30:19 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Description

I have this scenario in my code:

  1. I spawn (multiple) task, which loops and call async instrumented function.
  2. this function gets its tracing::Span::current().id() and sends it through a tokio mpsc channel.
  3. Another task reads this, spawns a instrumented async function, which follows from the received id.

Suddenly my code paniced on startup with

thread 'tokio-runtime-worker' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tracing-opentelemetry-0.25.0/src/layer.rs:953:18:
Missing otel data span extensions
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'tokio-runtime-worker' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tracing-subscriber-0.3.18/src/registry/sharded.rs:437:58:
Mutex poisoned: PoisonError { .. }
[stack trace...]
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.

This happened here in the code https://github.com/tokio-rs/tracing-opentelemetry/blob/v0.25.0/src/layer.rs#L953.
My hourly job crashed 4 times this way in the past week. I.e. approximately every 40th time.

I wrote a test, now on the current branch (0.27.0), to try to replicate this behaviour, but it also didn't show the same panic. However, I noticed another odd behaviour: The asserted count fails for a tokio new_multi_thread runtime, but succeeds with the tokio new_current_thread one. I am not sure whether these are indeed related, but it seems to be a bug in its own right.

// in tests/follows_from.rs

#[test]
fn trace_follows_from_diff_thread() {
    use tokio::sync::mpsc;

    #[tracing::instrument]
    async fn span_follows_this(chan: &mpsc::Sender<Option<tracing::Id>>) {
        chan.send(tracing::Span::current().id())
            .await
            .expect("Channel should be open");
    }

    #[tracing::instrument(follows_from = [cause])]
    async fn span(cause: Option<tracing::Id>) {}

    async fn receiver(mut chan: mpsc::Receiver<Option<tracing::Id>>) {
        while let Some(cause) = chan.recv().await {
            span(cause).await;
        }
    }

    let (_tracer, provider, exporter, subscriber) = test_tracer();

    tracing::subscriber::with_default(subscriber, || {
        tokio::runtime::Builder::new_multi_thread()
            .enable_all()
            .build()
            .unwrap()
            .block_on(async {
                let (tx, rx) = mpsc::channel(4);
                tokio::spawn(receiver(rx));
                span_follows_this(&tx).await;
                tokio::time::sleep(std::time::Duration::from_millis(10)).await;
                drop(tx);
            })
    });

    drop(provider); // flush all spans
    let spans = exporter.0.lock().unwrap();
    // The follower and followee
    assert_eq!(spans.len(), 2);
}
@mTsBucy1 mTsBucy1 changed the title Follows from span not tracked in multithreaded runtime. Panic with follows from span in multithreaded runtime. Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant