Fix CDI request context activation race when Quarkus REST and Reactive Routes are used during the same request #51299

michalvavrik · 2025-11-28T18:23:41Z

closes ContextNotActiveException in the middle of a request handler #51230

michalvavrik · 2025-11-28T18:38:46Z

@geoand I have run the reproducer and the test I added couple of times and this fixes the issue. I tried to run few related modules like RR and Quarkus REST and didn't see anything failing. Let's see what CI does. @mkouba may also be interested.

geoand · 2025-11-28T20:00:21Z

Cool!

I'll have a look soon

geoand

Thanks a lot!

mkouba · 2025-12-01T12:27:29Z

extensions/reactive-routes/runtime/src/main/java/io/quarkus/vertx/web/runtime/RouteHandler.java

            } finally {
-                // Deactivate the context, i.e. cleanup the thread locals
-                requestContext.deactivate();
+                if (RoutingUtils.isCdiRequestContextOwner(context, REACTIVE_ROUTES_KEY)) {


Isn't this check unnecessary? I mean, at this point the owner must be quarkus-reactive-routes (unless something replaced the CURRENT_CDI_REQUEST_CTX_OWNER)...

Isn't this check unnecessary? I mean, at this point the owner must be quarkus-reactive-routes (unless something replaced the CURRENT_CDI_REQUEST_CTX_OWNER)...

That is not how I read the situation. When routingContext.next() happens, other route handlers are called synchronously until some code is scheduled (like run blocking), at which point this can be invoked. So it can easily be Quarkus REST.

Sorry, I meant "synchronously".

Isn't this check unnecessary? I mean, at this point the owner must be quarkus-reactive-routes (unless something replaced the CURRENT_CDI_REQUEST_CTX_OWNER)...

That is not how I read the situation. When routingContext.next() happens, other route handlers are called synchronously until some code is scheduled (like run blocking), at which point this can be invoked. So it can easily be Quarkus REST.

Yes, but all those handlers share the same RoutingContext...

The point is that finally is run when Quarkus REST is processing because routingContext.next() is run before.

Yes, but all those handlers share the same RoutingContext...

I don't know how to answer that :-) You will need to elaborate.

The point is that finally is run when Quarkus REST is processing because routingContext.next() is run before.

Ok, so we don't want to deactivate the context either. Small note - if Vertx is used then requestContext.deactivate() is a NOOP if duplicated context is used: https://github.com/quarkusio/quarkus/blob/main/extensions/vertx/runtime/src/main/java/io/quarkus/vertx/runtime/VertxCurrentContextFactory.java#L99

The point is that finally is run when Quarkus REST is processing because routingContext.next() is run before.

Ok, so we don't want to deactivate the context either. Small note - if Vertx is used then requestContext.deactivate() is a NOOP if duplicated context is used: https://github.com/quarkusio/quarkus/blob/main/extensions/vertx/runtime/src/main/java/io/quarkus/vertx/runtime/VertxCurrentContextFactory.java#L99

I didn't know, thanks. I thought that the bug is about the endHandler where is destroy invoked, but during the debugging I saw my breakpoint on deactivate called after Quarkus REST started doing stuff, so I'd keep it here just in case you change how deactivate works in the future.

mkouba · 2025-12-01T12:27:57Z

extensions/vertx-http/runtime/src/main/java/io/quarkus/vertx/http/runtime/RoutingUtils.java

+     * @param owner typically a route handler that needs CDI request context active
+     */
+    public static void assumeCdiRequestContext(RoutingContext ctx, String owner) {
+        ctx.put(CURRENT_CDI_REQUEST_CTX_OWNER, Objects.requireNonNull(owner));


Shouldn't we throw an exception if the owner is already set?

Shouldn't we throw an exception if the owner is already set?

The idea is to recognize which route handler is handling request if and only if it is reusing the CDI request context. So what happens right now, for example also due to this PR #40408 is that we are reusing already active CDI request context (I think Martin solved some issue that way).

I'm sorry but I don't undertand how it's supposed to work.

Well, at this point I'll only repeat myself. I am happy to explain it differently, but I don't know what part you don't understand. assumeCdiRequestContext method is called by Quarkus REST before Reactive Routes tries to destroy our state in the endHandler. Whether finally itself runs depends on what next route handlers do.

I'm sorry but I don't undertand how it's supposed to work.

I have added tests. Maybe you can run them and see what they solve?

assumeCdiRequestContext method is called by Quarkus REST before Reactive Routes tries to destroy our state in the endHandler

So Quarkus REST can intentionally replace the original owner (Reactive Routes)? If that's expected then it might make sense to add some debug logging in assumeCdiRequestContext()..

assumeCdiRequestContext method is called by Quarkus REST before Reactive Routes tries to destroy our state in the endHandler

So Quarkus REST can intentionally replace the original owner (Reactive Routes)?

Exactly. The way it works (sorry if I am repeating myself):

Reactive Routes activate CDI request context for the callback with user code (e.g. @RouteFilter)

user code calls routingContext.next()

some other handlers run depending on the order until Quarkus REST handler runs, it detects active CDI request context and reuses it

next depends on what is done:

the finally block in the Reactive Routes may run in some scenarios like when blocking code is scheduled and so on; but you already explained it is NOOP, funny I saw context not active in stacktrace for CDI interceptors where it failed; anyway, this fixed it

the endHandler may run before all code in Quarkus REST is finished, as shown by the user reproducer with socket timeout on the client side; we need Quarkus REST to take responsibility on when it is destroys CDI context state

If that's expected then it might make sense to add some debug logging in assumeCdiRequestContext()..

I'll add it, so that we can keep track of it. Thanks.

I have added a debug logging to track the ownership.

mkouba · 2025-12-01T12:43:06Z

CC @manovotn

michalvavrik · 2025-12-01T22:06:40Z

I don't think that failures are related, I only added a debug log message and CIs fail over OOM in unmodified modules.

geoand · 2025-12-02T07:13:10Z

Very odd indeed... I restarted those jobs - let's see what happens

Ladicek · 2025-12-02T14:00:53Z

I don't have the full context and I think I never really looked into how the CDI request context is activated/deactivated during HTTP request processing, but I would hope that we have exactly 1 Vert.x Web router handler with suitably low priority that would activate/deactivate the CDI request context and all other frameworks built on top of that would just not touch the CDI request context at all costs. I'm sure there's a reason for why it's done the way it's done, but is there a reason for why it's not done the way I described?

michalvavrik · 2025-12-02T14:28:41Z

I don't have the full context and I think I never really looked into how the CDI request context is activated/deactivated during HTTP request processing, but I would hope that we have exactly 1 Vert.x Web router handler with suitably low priority that would activate/deactivate the CDI request context and all other frameworks built on top of that would just not touch the CDI request context at all costs.

CDI request context is only activated when needed. There is no such a handler ATM.

I'm sure there's a reason for why it's done the way it's done, but is there a reason for why it's not done the way I described?

I think that is a question for each stack owner. I remember few arguments mentioned in the past (like you may never need CDI request context, or that you may need your own like WS Next OnMessage callbacks), but I don't want to advocate either.

Unless we make general change as you describe, this PR is pretty uninvasive, it only does something if Reactive Routes activated CDI request context, so for 99 % use cases, it should be NOOP.

michalvavrik · 2025-12-02T14:30:18Z

@geoand that CI error is strange and I don't think it is related, but I'll rebase on the current main.

mkouba · 2025-12-02T15:36:52Z

I don't have the full context and I think I never really looked into how the CDI request context is activated/deactivated during HTTP request processing, but I would hope that we have exactly 1 Vert.x Web router handler with suitably low priority that would activate/deactivate the CDI request context and all other frameworks built on top of that would just not touch the CDI request context at all costs.

CDI request context is only activated when needed. There is no such a handler ATM.

I'm sure there's a reason for why it's done the way it's done, but is there a reason for why it's not done the way I described?

I think that is a question for each stack owner. I remember few arguments mentioned in the past (like you may never need CDI request context, or that you may need your own like WS Next OnMessage callbacks), but I don't want to advocate either.

I believe that there are cases where you don't want to activate the CDI request context (think about static resources or high performance endpoints). But I also think that the main reason is historical development. There were times when Vertx duplicated context was not a thing and threadlocals were used instead, stacks used different thread pool strategies, etc. It was easier to manage the activation separately. Also keep in mind that it's not a common use case to combine the stacks 🤷.

geoand · 2025-12-02T15:38:09Z

Also keep in mind that it's not a common use case to combine the stacks

Sure, but it doesn't mean we can't make it work, at the end of day, all the HTTP stuff is backed by Vert.x (and one of the reasons was to be able to mix such kinds of things)

mkouba · 2025-12-02T15:41:07Z

Also keep in mind that it's not a common use case to combine the stacks

Sure, but it doesn't mean we can't make it work, at the end of day, all the HTTP stuff is backed by Vert.x (and one of the reasons was to be able to mix such kinds of things)

I'm not saying we can't... I was talking about why we don't have a common activation handler.

manovotn · 2025-12-02T16:04:13Z

extensions/vertx-http/runtime/src/main/java/io/quarkus/vertx/http/runtime/RoutingUtils.java

+     * @param ctx RoutingContext
+     * @param newOwner typically a route handler that needs CDI request context active
+     */
+    public static void assumeCdiRequestContext(RoutingContext ctx, String newOwner) {


@michalvavrik is there any guarantee that the last handler to claim the context as their own is always going to be the last to actually attempt to use it?

I am not familiar enough with the code to be able to tell so maybe I am missing something obvious.
However, my thinking was that I would capture a collection of all handlers when they attempt to activate it. Later on, when a handler attempts to deactivate context, remove the handler from the collection and only the last to be removed actually performs the cleanup.
Then again, I am probably just missing something, so take it with a pinch of salt please :)

@michalvavrik is there any guarantee that the last handler to claim the context as their own is always going to be the last to actually attempt to use it?

It is for the scenario I am trying to solve, but if we treat this method as SPI that theoretically anyone could reuse, then probably no. IMO there should be always only one route handler actively handling the incoming request, because if there were multiple handlers using the same request, we would be in for whole lot of issues like this one #49468.

So what I meant to solve is:

avoid code executed after user callback runs routingContext.next(), such as

the finally block running requestContext.deactivate()

the endHandler code destroying CDI context state

And if in the future (or in quarkiverse) someone does things like these, we cannot guarantee we will be the last ones.

However, my thinking was that I would capture a collection of all handlers when they attempt to activate it. Later on, when a handler attempts to deactivate context, remove the handler from the collection and only the last to be removed actually performs the cleanup.

I think that would be a solution covering all eventualities, but the only way to keep it safe (be sure it is cleaned) would be synchronization (or other locking) which could have performance impact if it was hot path? I think there is a reason why RoutingContext data are not kept in the synchronized map and not in volatile field, and that is that we don't expect things to happen concurrently. But I can be wrong, not an expert on this matter.

IMO current code in Quarkus REST doesn't expect that someone else is handling active CDI request context and I have seen the same thing in gRPC. I'd keep it simple (as is, or just rename method / adjust javadoc if you have some suggestion) @manovotn , but I can do what you want if you deem it important.

First off, I don't think this should be just a local discussion, all relevant people need to be involved.

I think we've established that there may be multiple users (as in, Quarkus components) that want to access the same request context. We've also established that we want the request context created on demand, not eagerly. (Thanks for those remarks! It also feels like déjà vu to me. And I'm sure it will happen again.)

I'm gonna claim a simple int (or AtomicInteger) stored on a shared location (the Vert.x context probably? What about non-Vert.x threads though?) would be enough. If someone wants to create the request context, two things may happen.

If they don't see the int on the shared location, they are the first to activate the request context. They do so and set the value to 1.

If they do see the int on the shared location, the request context already exist and they just increment the value.

Conversely, if someone wants to destroy the request context, two (or three, depending how thorough we are) things may happen:

They see the int on the shared location, they decrement the value and the result is > 0. They don't do anything more.

They see the int on the shared location, they decrement the value and the result is 0. They destroy the request context.

They don't see the int on the shared location. The request context wasn't activated, so there's no point in destroying it.

I think the above should be enough to ensure that the request context can be activated on-demand and destroyed only after all possible users are done with it.

I think either this could be put into a utility class, probably in ArC, or perhaps this could be done in the ArC implementation of request context itself?

They see the int on the shared location, they decrement the value and the result is > 0. They don't do anything more.

AFAIU this would not help with the use case @michalvavrik is trying to fix. This sentence from #51299 (comment) is important: "IMO current code in Quarkus REST doesn't expect that someone else is handling active CDI request context and I have seen the same thing in gRPC"

AFAIU this would not help with the use case @michalvavrik is trying to fix. This sentence from #51299 (comment) is important: "IMO current code in Quarkus REST doesn't expect that someone else is handling active CDI request context and I have seen the same thing in gRPC"

Yeah, but that assumption is wrong, otherwise the issue and this PR wouldn't exist. We clearly have multiple components assuming they own the request context, which is wrong.

Personally, I am fine with this change and its local scope.

I know I'm not the one to decide here, but I'm personally not. This PR is a result of great investigation, but the solution presented is a band-aid applied on a wrong abstraction level.

If we actually want to support multiple "owners" of the same request context, I think something like I describe above is what we should do, and the more I think about it, the more I'm convinced the request context implementation itself should do it. (This is obviously performance sensitive...)

Now, there's a difference between deactivation and destruction and I didn't think through this distinction, so there would be more work to do. Still, it feels better to do it properly once, rather than doing it improperly (at least once, but possibly multiple times) and then having to do it properly and undoing the hacks.

Frankly, ATM I don't even have time to carefully read all your comments, but I got a gist that @Ladicek have in mind solid solution and @manovotn agree in principle.

I'd be fine to close this PR if you will handle the linked issue @Ladicek ? You can reuse that test, I wrote a lot of comments about re-running it, but it actually reproduced the issue pretty well from my workstation. I just don't know how stable reproducing will be from your laptop.

Anyway, if noone else handles the issue, I can go back to it by the end of this week, read all your comments and try to come up with something.

I just talked to @mkouba and @manovotn and we concluded that concurrency in Quarkus is pretty much a mess (in how DuplicatedContext is passed around and duplicated/triplicated/whatever, in how SmallRye Context Propagation works/is integrated, in how Vert.x and non-Vert.x thread are different). I certainly don't aim to fix that.

We also briefly discussed the difference between context deactivation and destruction and concluded that deactivation itself shouldn't "unregister" the owner, only destruction should. We didn't discuss what should constitute the "registration" of the request context owner, but it seems logical to me that it's the call to activate() with no initial state (that is, zero-param activate() or one-param activate() with null argument).

Looking at the implementation of CurrentManagedContext, it seems relatively straightforward to support multiple owners. It should only affect these places:

method ContextState activate(ContextState initialState)

method void destroy(ContextState state)

member class CurrentContextState (we need to track the number of current owners, but that should always be small, so it should fit into the state field -- it seems safe to me to assume that there will never be more than 32 owners, in fact I think we can assume there will never more than 16 owners to leave 1 bit reserved for future use)

I'll try to figure something out.

In the meantime, I'm certainly not blocking the merge of this PR, because both @mkouba and @manovotn are fine with it.

Thanks for all your comments and expertise everyone.

We didn't discuss what should constitute the "registration" of the request context owner, but it seems logical to me that it's the call to activate() with no initial state (that is, zero-param activate() or one-param activate() with null argument).

OK, so that is clearly wrong. (I knew there was a hurdle!) Activating a request context with no initial state is creating a new request context. That only happens once. Registration of an additional owner may only happen when activating with non-null initial state, and at that point, there's no way to know if that means a newly registered owner, or an already registered owner. So my nice and simple idea makes no sense :-)

manovotn · 2025-12-02T16:10:03Z

CC @manovotn

I do not know how exactly do routing handlers juggle with CDI contexts but what Michal proposes here seems sensible to me particularly because of his earlier comment:

the endHandler may run before all code in Quarkus REST is finished, as shown by the user reproducer with socket timeout on the client side; we need Quarkus REST to take responsibility on when it is destroys CDI context state

The only thing I added as a comment is related my limited knowledge of whether this solution is "generic enough".

I don't have the full context and I think I never really looked into how the CDI request context is activated/deactivated during HTTP request processing, but I would hope that we have exactly 1 Vert.x Web router handler with suitably low priority that would activate/deactivate the CDI request context and all other frameworks built on top of that would just not touch the CDI request context at all costs. I'm sure there's a reason for why it's done the way it's done, but is there a reason for why it's not done the way I described?

This gives me a bit of a déjà vu feeling, I think we discussed this at some point although I only recall there were some cases where that would be undesirable 🤷
That was a long time ago and much has changed since though (such as using Vert.x duplicated context for CP).

michalvavrik · 2025-12-02T18:00:26Z

@geoand CI failures in #51314 looks same as the failures we saw in this one.

geoand · 2025-12-02T18:04:04Z

Agreed, I'll have a closer look tomorrow

geoand · 2025-12-03T08:12:56Z

The OpenAPI failure is something that has been happening on CI the last few days, it has nothing to do with this PR

michalvavrik · 2025-12-03T19:25:15Z

I think this PR is backportable while if I think most of the suggestions, if implemented, should only stay in the main. Considering 3.27 will be around for 5 years, I think this (in some form) should be initial fix and improvements can come later.

geoand · 2025-12-05T07:59:15Z

Unless anyone objects, I will merge this soon

quarkus-bot · 2025-12-05T10:41:20Z

Status for workflow `Quarkus CI`

This is the status report for running Quarkus CI on commit 2afbd09.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

Ladicek · 2025-12-05T10:55:51Z

Even after the 2 or 3 days I spent looking into this, I don't have a satisfactory solution, so I'm retracting any possible objections I might have raised before :-)

I experimented with a single-owner policy and multiple-owners policy, but my implementations are not suitable to implementing what this PR does, which is essentially a handover of the CDI request context ownership from Reactive Routes to RESTEasy Reactive. In order for them to be suitable, I would have to:

either introduce the notion of an identity of an owner, which would entail:
- either more overhead in a performance sensitive piece of code,
- or at least shared constants (whose number would be severely limited),
or implement a single-purpose solution, which would hardly be any better than this PR and would pollute the API.

Too bad :-/

geoand · 2025-12-05T11:05:43Z

That's great information to have for posterity

FroMage · 2025-12-05T11:15:59Z

If I had a euro for every time we had a context issue…

quarkus-bot bot added area/rest area/vertx labels Nov 28, 2025

michalvavrik changed the title ~~Fix CDI request context deactivation race when Quarkus REST and Reactive Routes are used during the same request~~ Fix CDI request context activation race when Quarkus REST and Reactive Routes are used during the same request Nov 28, 2025

michalvavrik force-pushed the feature/fix-rest-cdi-req-ctx-active-after-timeout branch 2 times, most recently from bbb997a to 640e603 Compare November 28, 2025 18:36

This comment has been minimized.

Sign in to view

quarkus-bot bot added the triage/flaky-test label Nov 28, 2025

This comment has been minimized.

Sign in to view

geoand approved these changes Nov 30, 2025

View reviewed changes

geoand requested a review from mkouba November 30, 2025 06:13

mkouba reviewed Dec 1, 2025

View reviewed changes

michalvavrik force-pushed the feature/fix-rest-cdi-req-ctx-active-after-timeout branch from 640e603 to a368091 Compare December 1, 2025 16:21

This comment has been minimized.

Sign in to view

feat(reactive-routes,rest): fix CDI request ctx deactivation race

2afbd09

michalvavrik force-pushed the feature/fix-rest-cdi-req-ctx-active-after-timeout branch from a368091 to 2afbd09 Compare December 2, 2025 14:30

manovotn reviewed Dec 2, 2025

View reviewed changes

This comment has been minimized.

Sign in to view

michalvavrik mentioned this pull request Dec 3, 2025

Support security annotations on Jakarta Data repositories #50987

Open

geoand merged commit f96127f into quarkusio:main Dec 5, 2025
124 of 125 checks passed

quarkus-bot bot added this to the 3.31 - main milestone Dec 5, 2025

quarkus-bot bot added the kind/bugfix label Dec 5, 2025

michalvavrik deleted the feature/fix-rest-cdi-req-ctx-active-after-timeout branch December 5, 2025 11:45

Fix CDI request context activation race when Quarkus REST and Reactive Routes are used during the same request #51299

Fix CDI request context activation race when Quarkus REST and Reactive Routes are used during the same request #51299

Conversation

michalvavrik commented Nov 28, 2025

Uh oh!

michalvavrik commented Nov 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

geoand commented Nov 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment has been minimized.

This comment has been minimized.

geoand left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

michalvavrik Dec 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mkouba Dec 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mkouba commented Dec 1, 2025

Uh oh!

This comment has been minimized.

michalvavrik commented Dec 1, 2025

Uh oh!

geoand commented Dec 2, 2025

Uh oh!

This comment has been minimized.

Ladicek commented Dec 2, 2025

Uh oh!

michalvavrik commented Dec 2, 2025

Uh oh!

michalvavrik commented Dec 2, 2025

Uh oh!

mkouba commented Dec 2, 2025

Uh oh!

geoand commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mkouba commented Dec 2, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

michalvavrik commented Nov 28, 2025 •

edited

Loading

geoand commented Nov 28, 2025 •

edited

Loading

michalvavrik Dec 1, 2025 •

edited

Loading

mkouba Dec 1, 2025 •

edited

Loading

geoand commented Dec 2, 2025 •

edited

Loading

Status for workflow `Quarkus CI`