Skip to content
This repository has been archived by the owner on May 23, 2023. It is now read-only.

ThreadLocalScopManager and parallelStream #353

Open
williamokano opened this issue Jun 17, 2019 · 7 comments
Open

ThreadLocalScopManager and parallelStream #353

williamokano opened this issue Jun 17, 2019 · 7 comments

Comments

@williamokano
Copy link

williamokano commented Jun 17, 2019

Hey guys, I was just testing some ways to implement tracing using a POC application I created.

One of the services (product service) is called by the order service, but the products services don't expose an endpoint that return all the products at once, so I have to call it several times.

I know it's not good, not the best approach, but it's just a simulation that can occur in the real world.

The API is not well formed, so the items is represented by a Map<String, Long>, which is the item UUID and the item quantity.

My code run something like this (Kotlin):

val items = order.items.keys.parallelStream()
  .map(catalogAPI::getProduct)
  .collect(Collectors.toList())

My problem: as the parallelStream runs on another Thread(Pool), and the default implementation of the ScopeManager is ThreadLocalScopeManager, this doesn't work, as it loses the reference to the original span.

Question is, why it's not implemented using InheritableThreadLocal?

Second question is: there's some easy approach to solve this problem?

Thanks in advance and sorry about the bad english.

@yurishkuro
Copy link
Member

You can use custom thread pool (https://www.baeldung.com/java-8-parallel-streams-custom-threadpool) that will transfer context to worker threads.

@williamokano
Copy link
Author

williamokano commented Jun 18, 2019

Hi, I agree that I can use a custom thread pool, but the code can become much more verbose and relying on that can be error-prone, since one can simply forget to change the thread pool.

I created a "new" ScopeManager and Scope and it seems it solved my problem, although since I don't know the whole architecture of the opentracing I don't know to what kind of errors it could lead. With the following, I can use the parallelStream() without requiring to manually change the thread pool every time. Tried fixing the problem using InheritableThreadLocal.

@yurishkuro
Copy link
Member

I don't see how inheritable TL would help when using a thread pool, since the worker threads creation is not correlated with the caller thread.

I agree custom executor leads to uglier code, but that's a defect in ForkJoinPool design, since it does not allow reconfiguring its default "common" pool. To my knowledge, overriding it via byte code manipulation is the only way.

@williamokano
Copy link
Author

williamokano commented Jun 18, 2019

It actually didn't work.

@yurishkuro well, on my testing it worked just fine, at the least on the tests I did, which of course may have a lot of errors.

Do you have any examples of doing it through byte code manipulation? Not gonna lie that I was reading about ByteBuddy to create an agent and do some "magic stuff", but running the jar without the javaagent com lead to bizarre errors, so I stepped back a little bit.

@yurishkuro
Copy link
Member

I haven't seen an example anywhere, opened a ticket opentracing-contrib/java-specialagent#102

@williamokano
Copy link
Author

williamokano commented Jun 18, 2019

Nice, thanks. Gonna take a look at this specialagent, I didn't know about him. Thanks a lot.

@ edit: about the InheritableThreadLocal, I couldn't force a way to break my tests.

Not gonna lie, the first scenarios I was creating the executorService inside the controller method, so the context was kept, but now I wrote a scenario where the executor service is created by spring inside a configuration, so it shouldn't have any correlation with the thread create by tomcat to handle the request, and yet, I can keep the context with InheritableThreadLocal (which I can't with ThreadLocal).

@yurishkuro if you could check my sample, here's the link: https://github.com/williamokano/spring-distributed-tracing-demo

As you can see, I defined the ExecutorService, with a new ThreadPool, as a bean, so it has nothing to do with the Thread that is handling the request (should be completely uncorrelated as you said).

To test I used just a POC (not with opentracing actually).

I set a header on an InheritableThreadLocal<Context> inside a servletFilter so I can get it back on the controller.

When I print the value on the ExecutorService (created on the application startup), if I'm using InheritableThreadLocal it prints the right header, so it kept tracking somehow. If I use a regular ThreadLocal it doesn't work.

Also, using my implementation of ScopeManager using InheritableThreadLocal, although the parallelStream is running on the ForkJoinPool.commonPool(), it does work in fact.

I'm really confused right now. Perhaps it could be some bizarre OpenJDK implementation?

@ edit: you're right, executor service under stress don't work as I told above.

@williamokano
Copy link
Author

Update: you're correct @yurishkuro , without stress to the VM the behavior indeed work, but when under load it behaves as it should be, it loses the reference to the point that the behavior is non-deterministic, leading to undesired/undefined behaviors.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants