Is PyCall thread safe? #96

Closed
gsdean opened this issue Sep 4, 2019 · 17 comments

@gsdean

gsdean commented Sep 4, 2019

It seems that whenever we use PyCall from multiple threads, we get segmentation faults.

I'm happy to provide more context; I just wanted to check first whether PyCall was designed to be used in a multi-threaded environment.

@mrkn
Owner

mrkn commented Sep 5, 2019

No, the current pycall isn't thread-safe on either the Ruby side or the Python side.

@ziaulrehman40

Does this mean that if I queue a lot of background jobs in Ruby, which can obviously run concurrently, those jobs cannot call pycall's functions at the same time?
And will I have to reduce the concurrency of such jobs to 1 for pycall to work? (That won't be practical for me, but I'm confirming so I can decide my course of action.)

@mrkn
Owner

mrkn commented Jan 13, 2020

@ziaulrehman40 No, they can't call Python via pycall at the same time.
It is better to dedicate a single thread to making all calls into Python.

@ziaulrehman40

Well, that's a deal breaker for my use case. Most workloads are web-related in today's world and require concurrency. I wonder what makes it fail in those scenarios, and whether there is any way the community can help fix it.

For my use case, I had to use the https://github.com/camelot-dev/camelot Python library in my Ruby code. We have now decided to just use its CLI and invoke it with Ruby's system method. Since the CLI writes its output files to disk, we can process them later, which gives us our concurrency. It is obviously not the best solution, but it solves our issue.
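
A minimal sketch of this shell-out workaround, under stated assumptions: `run_cli` is a hypothetical helper, not part of PyCall or camelot, and a child Ruby process stands in for the Python CLI so the example stays self-contained. It uses `Open3.capture3` rather than `system` so that the output and exit status can be checked.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: runs an external command without a shell and
# returns its stdout. Raises if the command exits with a non-zero status.
def run_cli(*argv)
  stdout, stderr, status = Open3.capture3(*argv)
  raise "command failed (#{status.exitstatus}): #{stderr}" unless status.success?
  stdout
end

# Stand-in for a Python CLI: a child Ruby process that prints one line.
# In a real job, argv would be the Python tool's command line instead.
output = run_cli(RbConfig.ruby, '-e', 'puts "table extracted"')
puts output
```

Because each call runs in its own operating-system process, concurrent jobs never share a Python interpreter; the cost is process start-up time and passing data through files or pipes.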

Thanks for the great effort in PyCall though, and I really hope we can enjoy this in concurrent scenarios soon.

@simonfranzen

Any news on this topic? We want to call python libraries within a background job (sidekiq+rails).

@nhorton

nhorton commented Aug 11, 2023

Is there any info on where the pycall gaps are on this so that others can contribute? Like @simonfranzen, we need to use the same stack.

@mrkn mrkn added this to the Version 2.0.0 milestone Aug 25, 2023
@davidsbailey

> Any news on this topic? We want to call python libraries within a background job (sidekiq+rails).

@simonfranzen you could consider rails + resque, which uses a separate process for each job.

@jeremyhaile

jeremyhaile commented Feb 13, 2024

It definitely feels like this should be thread-safe, given how common threaded usage is these days, with web servers and background jobs.

But if it's not thread-safe, it also feels like the main README should have a big warning about that. We've spent a lot of time trying to figure out why our application is seg faulting!

Has anyone worked around this by wrapping pycall usage in a semaphore? (that's what we are trying now)
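
For illustration, a semaphore-style wrapper could look like the sketch below. `PYTHON_LOCK` and `with_python_lock` are hypothetical names, and the demonstration uses `sleep` in place of a real PyCall call. Nothing in this thread confirms that serializing calls is sufficient: the calls would still originate from different threads, and the maintainer's advice above is to use a single thread.

```ruby
# One process-wide lock shared by every thread that wants to call Python.
PYTHON_LOCK = Mutex.new

# Hypothetical helper: runs the block while holding the lock, so at most
# one thread is inside the guarded section at any moment.
def with_python_lock(&block)
  PYTHON_LOCK.synchronize(&block)
end

# Demonstration without PyCall: track how many threads are inside the
# guarded section at once.
inside = 0
peak = 0
threads = 8.times.map do
  Thread.new do
    with_python_lock do
      inside += 1
      peak = [peak, inside].max
      sleep 0.01 # stands in for a PyCall call
      inside -= 1
    end
  end
end
threads.each(&:join)
puts "peak concurrency: #{peak}"
```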

(we are also calling this from sidekiq background jobs)

@mrkn
Owner

mrkn commented Feb 13, 2024

@jeremyhaile I would accept your pull request if you can make pycall thread-safe, as long as it doesn't introduce any overhead for single-threaded applications.

By the way, I'm working on streamlit-julia-call in my job nowadays. I've succeeded in bridging a multithreaded Python application and Julia in that project, and I believe we can employ a similar approach in this project. When I have more time to tackle this issue again in the future, I want to try this approach. Unfortunately, I am currently very busy and cannot afford to dedicate time to this project, so I hope someone eager to resolve this issue quickly can take over for me.

@snickell

snickell commented Aug 8, 2024

Note that if you are using Puma or Rails, even if you set thread=1 it may not be safe to use PyCall from a web request handler, because the request thread is STILL a different thread from the main thread. If you have thread>1, it is always unsafe to use PyCall from a web request handler.

@snickell

snickell commented Aug 8, 2024

Related but different issue: even if you use PyCall from only one thread, if that thread is not the main thread, the process will not exit when the main thread exits (even though the side thread exited): #186

@snickell

snickell commented Aug 8, 2024

Workaround for PyCall safety in Rails / Puma

I made a helper gem called pycall_thread (rubygems) that helps work around PyCall's lack of thread-safety.
It makes it easy to use PyCall even from multiple threads, like Ruby on Rails or Puma requests.

You use it like:

require 'json' # needed for JSON.parse below

data = PyCallThread.run do
  # pycall is safe to use here even if you're in a thread, e.g. a puma request
  pd = PyCall.import_module('pandas')
  # use the `pd` handle imported above to read the CSV and serialize it
  pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv', sep: ';').to_json
end

puts JSON.parse(data).inspect

All it does is initialize PyCall on an inner thread, and pass blocks to the inner thread for execution. This keeps PyCall happy and thread-safe, and lets you use it without too much issue from Rails or Puma.
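
The general technique described here, one inner thread that executes blocks handed to it through a queue, can be sketched in plain Ruby as follows. This is not the pycall_thread implementation; `SingleThreadExecutor` is a hypothetical name, and the blocks below do not touch PyCall at all.

```ruby
# A minimal single-worker executor: every block is run on one dedicated
# thread, and the caller waits for the result.
class SingleThreadExecutor
  attr_reader :worker

  def initialize
    @jobs = Queue.new
    @worker = Thread.new do
      # The worker pops jobs one at a time, so blocks never overlap.
      while (job = @jobs.pop)
        block, reply = job
        begin
          reply << [:ok, block.call]
        rescue StandardError => e
          reply << [:error, e]
        end
      end
    end
  end

  # Runs the block on the worker thread and returns its value,
  # re-raising any StandardError in the calling thread.
  def run(&block)
    reply = Queue.new
    @jobs << [block, reply]
    status, value = reply.pop
    raise value if status == :error
    value
  end
end

executor = SingleThreadExecutor.new
callers = 4.times.map { |i| Thread.new { executor.run { i * i } } }
puts callers.map(&:value).inspect
```

Each caller blocks until its own block has finished, so callers in the same process are served one after another.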

It has a few guard-rails:

  1. it'll warn you if you try to return a python object from a thread-safe block
  2. it'll warn you if PyCall was already initialized on another thread

GitHub: https://github.com/snickell/pycall_thread

@snickell

snickell commented Aug 8, 2024

@mrkn would you be open to a thread-safety helper like this being included in PyCall.rb? I would be happy to make a PR that adds this, for example as PyCall::Thread, with any modifications you might suggest. Let me know if this would be helpful.

@mrkn
Owner

mrkn commented Aug 8, 2024

@snickell Is the queue-based approach what you really need? In this approach, each pycall-dependent thread is blocked by others. If you want to use pycall on Puma threads, you need to use process-based multi-tasking to call Python via PyCall to avoid such thread blocking.
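
A rough sketch of what process-based isolation could look like on a platform that supports `fork`. `run_in_child` is a hypothetical helper, the result must be serializable with `Marshal`, and the sketch has not been tested with PyCall; it only shows the mechanism of running a block in a separate process.

```ruby
# Hypothetical helper: runs the block in a forked child process and
# returns its result to the parent through a pipe.
def run_in_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    code = 0
    begin
      writer.write(Marshal.dump(yield))
    rescue StandardError
      code = 1
    end
    writer.close # flush buffered output before the hard exit
    exit!(code)  # skip at_exit hooks inherited from the parent
  end
  writer.close
  payload = reader.read # read before waiting so a large result cannot stall the child
  reader.close
  _, status = Process.wait2(pid)
  raise "child exited with status #{status.exitstatus}" unless status.success?
  Marshal.load(payload)
end

puts run_in_child { (1..10).sum }
```

Only the calling thread waits on its child, so other threads in the parent are not blocked by the work done there.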

@snickell

snickell commented Aug 8, 2024

@mrkn, thank you for your thoughts. Please let me know if I misunderstood.

I will be using this on a large existing Ruby on Rails deployment (code.org / https://github.com/code-dot-org/code-dot-org) with more than 1000 puma processes (and 5-threads-per-process). I wish it was only 1-thread-per-process, but because it is a large application with a 10 year history, I cannot change this configuration easily.

Because our multitasking is mostly process-based (~1000 processes in parallel), it is acceptable (but not perfect) that Python threads will block each other inside each process. Our Python code will mostly be CPU-heavy, not IO-heavy. The GIL will block CPU-heavy Python threads anyway, so I believe the difference will be small?

I like perfect 😹: is there a better solution? I would like to understand if there is.

@mrkn
Owner

mrkn commented Oct 22, 2024

@snickell, Could you please release these features as an external gem? I don't want to officially support multi-threading in PyCall.

CPython developers are currently working on dropping the GIL from CPython. In the near future, we will no longer be able to handle multi-threading with the pair of PyGILState_Ensure and PyGILState_Release. I don't want to incur the cost of supporting such use cases now.

@mrkn
Owner

mrkn commented Oct 22, 2024

I have explicitly described the multi-threading limitation in the README at 60c6656.

I appreciate your cooperation.

@mrkn mrkn closed this as completed Oct 22, 2024
@mrkn mrkn closed this as not planned Oct 22, 2024