-
Notifications
You must be signed in to change notification settings - Fork 3k
Fix CDI request context activation race when Quarkus REST and Reactive Routes are used during the same request #51299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix CDI request context activation race when Quarkus REST and Reactive Routes are used during the same request #51299
Conversation
michalvavrik
commented
Nov 28, 2025
- closes ContextNotActiveException in the middle of a request handler #51230
bbb997a to
640e603
Compare
|
Cool! I'll have a look soon |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
geoand
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot!
| } finally { | ||
| // Deactivate the context, i.e. cleanup the thread locals | ||
| requestContext.deactivate(); | ||
| if (RoutingUtils.isCdiRequestContextOwner(context, REACTIVE_ROUTES_KEY)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this check unnecessary? I mean, at this point the owner must be quarkus-reactive-routes (unless something replaced the CURRENT_CDI_REQUEST_CTX_OWNER)...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this check unnecessary? I mean, at this point the owner must be
quarkus-reactive-routes(unless something replaced the CURRENT_CDI_REQUEST_CTX_OWNER)...
That is not how I read the situation. When routingContext.next() happens, other route handlers are called synchronously until some code is scheduled (like run blocking), at which point this can be invoked. So it can easily be Quarkus REST.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant "synchronously".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this check unnecessary? I mean, at this point the owner must be
quarkus-reactive-routes(unless something replaced the CURRENT_CDI_REQUEST_CTX_OWNER)...That is not how I read the situation. When
routingContext.next()happens, other route handlers are called synchronously until some code is scheduled (like run blocking), at which point this can be invoked. So it can easily be Quarkus REST.
Yes, but all those handlers share the same RoutingContext...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that finally is run when Quarkus REST is processing because routingContext.next() is run before.
Yes, but all those handlers share the same RoutingContext...
I don't know how to answer that :-) You will need to elaborate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that
finallyis run when Quarkus REST is processing becauseroutingContext.next()is run before.
Ok, so we don't want to deactivate the context either. Small note - if Vertx is used then requestContext.deactivate() is a NOOP if duplicated context is used: https://github.com/quarkusio/quarkus/blob/main/extensions/vertx/runtime/src/main/java/io/quarkus/vertx/runtime/VertxCurrentContextFactory.java#L99
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that
finallyis run when Quarkus REST is processing becauseroutingContext.next()is run before.Ok, so we don't want to deactivate the context either. Small note - if Vertx is used then
requestContext.deactivate()is a NOOP if duplicated context is used: https://github.com/quarkusio/quarkus/blob/main/extensions/vertx/runtime/src/main/java/io/quarkus/vertx/runtime/VertxCurrentContextFactory.java#L99
I didn't know, thanks. I thought that the bug is about the endHandler where is destroy invoked, but during the debugging I saw my breakpoint on deactivate called after Quarkus REST started doing stuff, so I'd keep it here just in case you change how deactivate works in the future.
| * @param owner typically a route handler that needs CDI request context active | ||
| */ | ||
| public static void assumeCdiRequestContext(RoutingContext ctx, String owner) { | ||
| ctx.put(CURRENT_CDI_REQUEST_CTX_OWNER, Objects.requireNonNull(owner)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we throw an exception if the owner is already set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we throw an exception if the owner is already set?
The idea is to recognize which route handler is handling request if and only if it is reusing the CDI request context. So what happens right now, for example also due to this PR #40408 is that we are reusing already active CDI request context (I think Martin solved some issue that way).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry but I don't undertand how it's supposed to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, at this point I'll only repeat myself. I am happy to explain it differently, but I don't know what part you don't understand. assumeCdiRequestContext method is called by Quarkus REST before Reactive Routes tries to destroy our state in the endHandler. Whether finally itself runs depends on what next route handlers do.
I'm sorry but I don't undertand how it's supposed to work.
I have added tests. Maybe you can run them and see what they solve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assumeCdiRequestContextmethod is called by Quarkus REST before Reactive Routes tries to destroy our state in theendHandler
So Quarkus REST can intentionally replace the original owner (Reactive Routes)? If that's expected then it might make sense to add some debug logging in assumeCdiRequestContext()..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assumeCdiRequestContextmethod is called by Quarkus REST before Reactive Routes tries to destroy our state in theendHandlerSo Quarkus REST can intentionally replace the original owner (Reactive Routes)?
Exactly. The way it works (sorry if I am repeating myself):
- Reactive Routes activate CDI request context for the callback with user code (e.g.
@RouteFilter) - user code calls
routingContext.next() - some other handlers run depending on the order until Quarkus REST handler runs, it detects active CDI request context and reuses it
- next depends on what is done:
- the
finallyblock in the Reactive Routes may run in some scenarios like when blocking code is scheduled and so on; but you already explained it is NOOP, funny I saw context not active in stacktrace for CDI interceptors where it failed; anyway, this fixed it - the
endHandlermay run before all code in Quarkus REST is finished, as shown by the user reproducer with socket timeout on the client side; we need Quarkus REST to take responsibility on when it is destroys CDI context state
If that's expected then it might make sense to add some debug logging in
assumeCdiRequestContext()..
I'll add it, so that we can keep track of it. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a debug logging to track the ownership.
|
CC @manovotn |
640e603 to
a368091
Compare
This comment has been minimized.
This comment has been minimized.
|
I don't think that failures are related, I only added a debug log message and CIs fail over OOM in unmodified modules. |
|
Very odd indeed... I restarted those jobs - let's see what happens |
This comment has been minimized.
This comment has been minimized.
|
I don't have the full context and I think I never really looked into how the CDI request context is activated/deactivated during HTTP request processing, but I would hope that we have exactly 1 Vert.x Web router handler with suitably low priority that would activate/deactivate the CDI request context and all other frameworks built on top of that would just not touch the CDI request context at all costs. I'm sure there's a reason for why it's done the way it's done, but is there a reason for why it's not done the way I described? |
CDI request context is only activated when needed. There is no such a handler ATM.
I think that is a question for each stack owner. I remember few arguments mentioned in the past (like you may never need CDI request context, or that you may need your own like WS Next OnMessage callbacks), but I don't want to advocate either. Unless we make general change as you describe, this PR is pretty uninvasive, it only does something if Reactive Routes activated CDI request context, so for 99 % use cases, it should be NOOP. |
|
@geoand that CI error is strange and I don't think it is related, but I'll rebase on the current main. |
a368091 to
2afbd09
Compare
I believe that there are cases where you don't want to activate the CDI request context (think about static resources or high performance endpoints). But I also think that the main reason is historical development. There were times when Vertx duplicated context was not a thing and threadlocals were used instead, stacks used different thread pool strategies, etc. It was easier to manage the activation separately. Also keep in mind that it's not a common use case to combine the stacks 🤷. |
Sure, but it doesn't mean we can't make it work, at the end of day, all the HTTP stuff is backed by Vert.x (and one of the reasons was to be able to mix such kinds of things) |
I'm not saying we can't... I was talking about why we don't have a common activation handler. |
| * @param ctx RoutingContext | ||
| * @param newOwner typically a route handler that needs CDI request context active | ||
| */ | ||
| public static void assumeCdiRequestContext(RoutingContext ctx, String newOwner) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michalvavrik is there any guarantee that the last handler to claim the context as their own is always going to be the last to actually attempt to use it?
I am not familiar enough with the code to be able to tell so maybe I am missing something obvious.
However, my thinking was that I would capture a collection of all handlers when they attempt to activate it. Later on, when a handler attempts to deactivate context, remove the handler from the collection and only the last to be removed actually performs the cleanup.
Then again, I am probably just missing something, so take it with a pinch of salt please :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michalvavrik is there any guarantee that the last handler to claim the context as their own is always going to be the last to actually attempt to use it?
It is for the scenario I am trying to solve, but if we treat this method as SPI that theoretically anyone could reuse, then probably no. IMO there should be always only one route handler actively handling the incoming request, because if there were multiple handlers using the same request, we would be in for whole lot of issues like this one #49468.
So what I meant to solve is:
- avoid code executed after user callback runs
routingContext.next(), such as- the
finallyblock runningrequestContext.deactivate() - the
endHandlercode destroying CDI context state
- the
And if in the future (or in quarkiverse) someone does things like these, we cannot guarantee we will be the last ones.
However, my thinking was that I would capture a collection of all handlers when they attempt to activate it. Later on, when a handler attempts to deactivate context, remove the handler from the collection and only the last to be removed actually performs the cleanup.
I think that would be a solution covering all eventualities, but the only way to keep it safe (be sure it is cleaned) would be synchronization (or other locking) which could have performance impact if it was hot path? I think there is a reason why RoutingContext data are not kept in the synchronized map and not in volatile field, and that is that we don't expect things to happen concurrently. But I can be wrong, not an expert on this matter.
IMO current code in Quarkus REST doesn't expect that someone else is handling active CDI request context and I have seen the same thing in gRPC. I'd keep it simple (as is, or just rename method / adjust javadoc if you have some suggestion) @manovotn , but I can do what you want if you deem it important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First off, I don't think this should be just a local discussion, all relevant people need to be involved.
I think we've established that there may be multiple users (as in, Quarkus components) that want to access the same request context. We've also established that we want the request context created on demand, not eagerly. (Thanks for those remarks! It also feels like déjà vu to me. And I'm sure it will happen again.)
I'm gonna claim a simple int (or AtomicInteger) stored on a shared location (the Vert.x context probably? What about non-Vert.x threads though?) would be enough. If someone wants to create the request context, two things may happen.
- If they don't see the
inton the shared location, they are the first to activate the request context. They do so and set the value to1. - If they do see the
inton the shared location, the request context already exist and they just increment the value.
Conversely, if someone wants to destroy the request context, two (or three, depending how thorough we are) things may happen:
- They see the
inton the shared location, they decrement the value and the result is> 0. They don't do anything more. - They see the
inton the shared location, they decrement the value and the result is0. They destroy the request context. - They don't see the
inton the shared location. The request context wasn't activated, so there's no point in destroying it.
I think the above should be enough to ensure that the request context can be activated on-demand and destroyed only after all possible users are done with it.
I think either this could be put into a utility class, probably in ArC, or perhaps this could be done in the ArC implementation of request context itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- They see the
inton the shared location, they decrement the value and the result is> 0. They don't do anything more.
AFAIU this would not help with the use case @michalvavrik is trying to fix. This sentence from #51299 (comment) is important: "IMO current code in Quarkus REST doesn't expect that someone else is handling active CDI request context and I have seen the same thing in gRPC"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIU this would not help with the use case @michalvavrik is trying to fix. This sentence from #51299 (comment) is important: "IMO current code in Quarkus REST doesn't expect that someone else is handling active CDI request context and I have seen the same thing in gRPC"
Yeah, but that assumption is wrong, otherwise the issue and this PR wouldn't exist. We clearly have multiple components assuming they own the request context, which is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I am fine with this change and its local scope.
I know I'm not the one to decide here, but I'm personally not. This PR is a result of great investigation, but the solution presented is a band-aid applied on a wrong abstraction level.
If we actually want to support multiple "owners" of the same request context, I think something like I describe above is what we should do, and the more I think about it, the more I'm convinced the request context implementation itself should do it. (This is obviously performance sensitive...)
Now, there's a difference between deactivation and destruction and I didn't think through this distinction, so there would be more work to do. Still, it feels better to do it properly once, rather than doing it improperly (at least once, but possibly multiple times) and then having to do it properly and undoing the hacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frankly, ATM I don't even have time to carefully read all your comments, but I got a gist that @Ladicek have in mind solid solution and @manovotn agree in principle.
I'd be fine to close this PR if you will handle the linked issue @Ladicek ? You can reuse that test, I wrote a lot of comments about re-running it, but it actually reproduced the issue pretty well from my workstation. I just don't know how stable reproducing will be from your laptop.
Anyway, if noone else handles the issue, I can go back to it by the end of this week, read all your comments and try to come up with something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just talked to @mkouba and @manovotn and we concluded that concurrency in Quarkus is pretty much a mess (in how DuplicatedContext is passed around and duplicated/triplicated/whatever, in how SmallRye Context Propagation works/is integrated, in how Vert.x and non-Vert.x thread are different). I certainly don't aim to fix that.
We also briefly discussed the difference between context deactivation and destruction and concluded that deactivation itself shouldn't "unregister" the owner, only destruction should. We didn't discuss what should constitute the "registration" of the request context owner, but it seems logical to me that it's the call to activate() with no initial state (that is, zero-param activate() or one-param activate() with null argument).
Looking at the implementation of CurrentManagedContext, it seems relatively straightforward to support multiple owners. It should only affect these places:
- method
ContextState activate(ContextState initialState) - method
void destroy(ContextState state) - member class
CurrentContextState(we need to track the number of current owners, but that should always be small, so it should fit into thestatefield -- it seems safe to me to assume that there will never be more than 32 owners, in fact I think we can assume there will never more than 16 owners to leave 1 bit reserved for future use)
I'll try to figure something out.
In the meantime, I'm certainly not blocking the merge of this PR, because both @mkouba and @manovotn are fine with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all your comments and expertise everyone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We didn't discuss what should constitute the "registration" of the request context owner, but it seems logical to me that it's the call to
activate()with no initial state (that is, zero-paramactivate()or one-paramactivate()withnullargument).
OK, so that is clearly wrong. (I knew there was a hurdle!) Activating a request context with no initial state is creating a new request context. That only happens once. Registration of an additional owner may only happen when activating with non-null initial state, and at that point, there's no way to know if that means a newly registered owner, or an already registered owner. So my nice and simple idea makes no sense :-)
I do not know how exactly do routing handlers juggle with CDI contexts but what Michal proposes here seems sensible to me particularly because of his earlier comment:
The only thing I added as a comment is related my limited knowledge of whether this solution is "generic enough".
This gives me a bit of a déjà vu feeling, I think we discussed this at some point although I only recall there were some cases where that would be undesirable 🤷 |
|
Agreed, I'll have a closer look tomorrow |
This comment has been minimized.
This comment has been minimized.
|
The OpenAPI failure is something that has been happening on CI the last few days, it has nothing to do with this PR |
|
I think this PR is backportable while if I think most of the suggestions, if implemented, should only stay in the main. Considering 3.27 will be around for 5 years, I think this (in some form) should be initial fix and improvements can come later. |
|
Unless anyone objects, I will merge this soon |
Status for workflow
|
|
Even after the 2 or 3 days I spent looking into this, I don't have a satisfactory solution, so I'm retracting any possible objections I might have raised before :-) I experimented with a single-owner policy and multiple-owners policy, but my implementations are not suitable to implementing what this PR does, which is essentially a handover of the CDI request context ownership from Reactive Routes to RESTEasy Reactive. In order for them to be suitable, I would have to:
Too bad :-/ |
|
That's great information to have for posterity |
|
If I had a euro for every time we had a context issue… |