Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SuspendApp] JVM SupendApp never terminates #3583

Open
serras opened this issue Feb 11, 2025 · 1 comment
Open

[SuspendApp] JVM SupendApp never terminates #3583

serras opened this issue Feb 11, 2025 · 1 comment

Comments

@serras
Copy link
Member

serras commented Feb 11, 2025

Ported from arrow-kt/suspendapp#140, issue by @weronkagolonka

Hello!

After upgrading to 0.5.0, we've noticed a strange behaviour where our Kafka consumer, launched in a separate coroutine, was abruptly stopped few seconds after startup. What's worth noticing, the application itself was not terminated or crashed. The kafka coroutine is running within the same scope of the suspended app and it is polling for messages as long as the jobs is active. Initially, it seemed like simply wrapping it in a new scope would solve the issue - and it did, as the consumer was successfully polling and handling the messages. However, after a deeper investigation to find the root cause of the issue we noticed that the application would never stop, regardless of what was put in the SuspendApp lambda. Given this extremely simple example:

fun main() = SuspendApp { logger.info { "The app is running!" } }

We discovered a significant change in the top-level coroutine launched in the SuspendApp, where after executing the lambda block, the process is being exited with status code 0:

@OptIn(ExperimentalStdlibApi::class)
fun SuspendApp(
  context: CoroutineContext = Dispatchers.Default,
  uncaught: (Throwable) -> Unit = Throwable::printStackTrace,
  timeout: Duration = Duration.INFINITE,
  process: Process = process(),
  block: suspend CoroutineScope.() -> Unit,
): Unit =
  process.use { env ->
    env.runScope(context) {
      val job =
        launch(start = CoroutineStart.LAZY) {
          try {
            block()
            env.exit(0) // <- here
          } catch (_: SuspendAppShutdown) {} catch (e: Throwable) {
            uncaught(e)
            env.exit(-1)
          }
        }
      val unregister =
        env.onShutdown {
          withTimeout(timeout) {
            job.cancel(SuspendAppShutdown)
            job.join()
          }
        }
      job.start()
      job.join()
      unregister()
    }
  }

This call triggers the sequence of JVM shutdown hooks, which then executes env.OnShutdown lambda defined in unregister variable, where the job is being cancelled (and that explains why our kafka consumer was suddenly stopped and why running it in a new scope allowed the consumer to proceed). The process then waits for the cancelled job to complete, but that never happens - the application basically hangs, unless a finite timeout duration is specified, then it simply terminates with timeout exception.

My question is - what is the rationale behind exiting the process after lambda execution instead of letting the job complete and eventually calling unregister() as it was done in version 0.4.0? Right now it doesn't seem like the unregistering logic is ever executed, as the exit(0) method is called first.

If this behaviour is expected, how should the logic of SuspendApp be defined to make sure that the application actually terminates at some point?

Thanks in advance for your help, for now we're skipping the upgrade of the library until we get some more insight 😄

@serras
Copy link
Member Author

serras commented Feb 11, 2025

Ported from arrow-kt/suspendapp#140 (comment), answer by @nomisRev

There was a small refactor in the codebase to include system.exit at the right moments. This was needed for some cloud providers that required explicit system.exit to close pods. To guarantee the same implementation for all platforms, a change was made in the KMP structure.

However, this should absolutely not happen! I am investigating atm, and will report back asap.

Also, any ideas or tips to properly test this automatically would be awesome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant