Independent SubInterpreters are still not concurrent with python 3.12+ #593
Comments
Just to be clear, what SubInterpreterOptions did you set on your JepConfig when creating the SubInterpreters? Jep is just passing the options to the CPython interpreter so if there really is a problem I don't know if there's much we can do about it. |
Oh, sorry, I totally missed the sub-interpreter options.
produces the same result: the JVM just dies on the first Interpreter.set(name, value) call that sets a non-primitive Java value. That is, in the following sequence (cntx is an instance of Interpreter)
the first two sets work fine and the JVM dies on the third (a Java lambda value). It also dies on setting the ML.class value if I flip the last two lines. Everything works fine with legacy options, so I doubt that it's a buffer overflow issue. Any ideas how to trace the reason? |
Thank you so much for the code examples you included. I initially tried to replicate the problem by adding some calls to set() in the isolated interpreter test but I could not get it to crash. When I took exactly what you have and pasted it into a Java main it crashed immediately. It took some digging to find the problem but eventually I found that setting
The problem is that PyJObject (the Python class that represents java.lang.Object) is a statically allocated type which is shared between sub-interpreters. I had thought sharing would be safe because immutable objects are allowed to be shared, but I failed to take into account that creating a subclass mutates the superclass. The failure occurs when a subclass of PyJObject is created in an isolated sub-interpreter. We create a subclass every time a new Java class is used in Python. Not every subclass causes a crash; it seems to happen only when the dict holding subclasses reaches a capacity threshold where it needs to grow. I assume the reason I couldn't crash it in our test case is that the tests do a lot of testing in SharedInterpreters and the subclass map has already grown large enough that further growth is infrequent. With an isolated main program it crashes after only a handful of subclasses. PEP-684 specifically discusses how subclasses can cause problems, but until you found this crash I didn't realize it was going to impact us.
Unfortunately I don't think we can fix this in Jep 4.2, mostly because it is going to be a pretty big change but also because we don't want to drop support for older Python versions in a point release. We use the Python buffer protocol in subclasses of PyJObject, and before Python 3.8 classes using the buffer protocol had to be statically defined. On the dev_4.3 branch I've already made changes so the buffer type is allocated on the heap, but I was not planning on changing the way PyJObject and PyJClass are allocated. 
Your discovery definitely puts changing the way those are allocated on the agenda for the 4.3 release. If you want to experiment with isolated interpreters I found that setting PYTHONMALLOC to a different allocator would prevent the crash for me. I think there is still a possibility of problems with other allocators so I do not recommend doing this in production but if you want to test how isolated interpreters perform it might give you some idea. |
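For anyone who wants to try the allocator experiment mentioned above, here is a minimal sketch of preparing the environment. It assumes PYTHONMALLOC has to be present in the environment of the process that hosts the interpreter (for Jep, the JVM) before Python is initialized. The helper name `allocator_env` and the choice of `malloc` are illustrative assumptions; the comment above does not say which allocator was used.

```python
import os

# Allocator names documented for the PYTHONMALLOC environment variable.
# "mimalloc" and "mimalloc_debug" are only accepted by Python 3.13+.
KNOWN_ALLOCATORS = {
    "default", "malloc", "pymalloc", "mimalloc",
    "debug", "malloc_debug", "pymalloc_debug", "mimalloc_debug",
}


def allocator_env(allocator, base_env=None):
    """Return a copy of base_env with PYTHONMALLOC set to allocator.

    Hypothetical helper: pass the result as the environment of the
    process that will embed Python (for example via subprocess.Popen).
    """
    if allocator not in KNOWN_ALLOCATORS:
        raise ValueError(f"unknown PYTHONMALLOC value: {allocator!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONMALLOC"] = allocator
    return env


# "malloc" (the C library allocator) is an assumed choice for illustration.
env = allocator_env("malloc", base_env={"PATH": "/usr/bin"})
print(env["PYTHONMALLOC"])
```

As noted in the comment above, this is only meant for experiments, not for production use.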
Thanks a lot for the quick response on this! From your comment about the dict crashing when it needs to grow, there are some quick fix ideas:
Please let me know if I can help in any way. P.S. It just occurred to me: can I fix the problem by first creating a shared interpreter with all initializations (i.e. allowing the dict to grow to the correct size) and then working with sub-interpreters? |
I just tried creating a shared interpreter first and then working with sub-interpreters. It works! :-) Kinda :-(
Can you please take a look and tell me what else I can do to make it work? P.S. The first log might not be totally useful because I was using VisualVM, which instruments the code and potentially changes classes. hs_err_pid8900.log I'm not an expert, but it seems both of them are trying to write something to a null pointer (freed memory?). The last one took about 10 test runs before dying. A pause of a few seconds between runs seems to help keep it running, so it might be related to GC, I guess. |
The changes currently on the dev_4.3 branch are only a small step in the right direction. In Jep 4.2 I think there are 4 statically allocated types that need to be moved to heap-allocated types to be safe in sub-interpreters. Those types are PyJObject, PyJClass, PyJArray, and PyJBuffer. The existing changes handle PyJBuffer, leaving only 3 more. Unfortunately PyJBuffer is by far the easiest of the 4 since the type is only referenced during initialization. Since the other 3 are referenced more often, we will have to save off the types in an interpreter-specific data structure. I am still trying to understand what needs to change myself, but from what I have seen I think PEP-630 describes all the things we need to do.
I am not aware of any API for pre-sizing dicts. Also, the crash while resizing is just a symptom of a larger problem: it is not safe to concurrently access a dict, and even if pre-sizing prevents crashes it will not prevent interpreters from concurrently modifying the dict and potentially creating invalid state. I suspect this is why you still see crashes in your tests where you use a shared interpreter.
The problem is occurring in the tp_subclasses dict, which is in the CPython code. It is marked as internal to CPython. Jep does not allocate, modify, or even access it. You would have to change the CPython code to use anything other than a dict.
Every sub-interpreter creates a hierarchy of Java types that mirrors the Java class hierarchy of any Java class used from Python. Right now the java.lang.Object type is shared in all sub-interpreters. When sub-interpreters create their own types for subclasses of java.lang.Object, everything breaks. If you could create a type hierarchy beforehand for every Java class that will ever be used from Python and ensure all of those types are immutable, then those types could be shared between sub-interpreters and the problems should go away. I don't think that is a very practical solution for most use cases, but it might be something you could get working if you happen to know every Java class needed in Python beforehand. |
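To make the shared-base-type problem described above more concrete, here is a rough pure-Python sketch. It is not Jep's implementation: `TypeMirror`, `SharedObject`, and the `JAVA_SUPERCLASS` table are hypothetical stand-ins, and each `TypeMirror` instance only plays the role of one interpreter's private type cache. What it shows is that mirroring a class hierarchy on top of one shared root registers new subclasses on that root, so the root is mutated by every "interpreter".

```python
# Hypothetical description of a Java class hierarchy: class -> superclass.
JAVA_SUPERCLASS = {
    "java.lang.Object": None,
    "java.util.AbstractCollection": "java.lang.Object",
    "java.util.AbstractList": "java.util.AbstractCollection",
    "java.util.ArrayList": "java.util.AbstractList",
    "java.lang.Number": "java.lang.Object",
    "java.lang.Integer": "java.lang.Number",
}


class TypeMirror:
    """Builds Python types mirroring Java classes, one cache per instance.

    Each instance stands in for one interpreter's private type cache.
    """

    def __init__(self, root_type):
        # root_type plays the role of the shared java.lang.Object type.
        self._types = {"java.lang.Object": root_type}

    def type_for(self, java_name):
        """Return the mirrored type, creating missing ancestors first."""
        if java_name not in self._types:
            parent = self.type_for(JAVA_SUPERCLASS[java_name])
            simple_name = java_name.rsplit(".", 1)[-1]
            # Creating the subclass registers it on the parent type.
            self._types[java_name] = type(simple_name, (parent,), {})
        return self._types[java_name]


class SharedObject:
    """Stand-in for the statically allocated, shared base type."""


interp_a = TypeMirror(SharedObject)
interp_b = TypeMirror(SharedObject)
interp_a.type_for("java.util.ArrayList")
interp_b.type_for("java.lang.Integer")

# Both "interpreters" have now mutated the same shared base type.
print([cls.__name__ for cls in SharedObject.__subclasses__()])
```

In this sketch everything runs in one interpreter, so nothing crashes; the point is only that the root type's subclass registry ends up containing entries created on behalf of both caches.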
@novos40 Since you have mentioned on other issues that you are using shared modules and that compatibility with other Python modules is important to you, I also want to make sure you are aware that there are no plans to support shared modules in isolated sub-interpreters. While I can't speak authoritatively for the entire Python ecosystem, I suspect that a vast majority of Python modules that include native code will not work with isolated sub-interpreters. I know that numpy is a popular extension module that has been hesitant to support sub-interpreters in the past, so I was curious whether they are moving to support isolated sub-interpreters. I found this issue saying they do not currently support it and are not actively working to support it, and also this post, which shares my opinion that most extension modules do not work with isolated sub-interpreters. |
#594 has been merged into the dev_4.3 branch which resolves the problems with isolated sub-interpreters when calling |
Thanks! I'll definitely check it out. Please keep me posted about further developments. |
I did some testing with dev_4.3 branch:
Here are some other problems I did not mention before
|
I have not done any testing with isolated sub-interpreters on Windows, so I am glad that is working. I have done all my testing on an ARM Mac and I am no longer encountering crashes on dev_4.3 after the recent changes. If you could provide a test case that is crashing for you then I can test it on my machine. It should not be necessary to preload code in a shared interpreter. Do you see more crashes if you don't do that?
This looks like the idea presented in #589. I think it would take substantial effort to get Jep working that way, but I would prefer to keep that discussion in #589.
We have code to link libjep.jnilib to the .so that is generated during the build. I just tested it on my Mac using
In my experience, setting PYTHONHOME when using a virtual environment never works. To use Jep in a virtual environment you need to activate the environment before starting Java by sourcing the activate script. Jep works fine for me on Linux and Mac as long as the venv is active before starting Java. |
Ah.. I guess I'm spoiled by Java/OSGi, where you just drop a bundle jar into the auto-launch folder and everything works. It never occurred to me that I need to do a formal install to get the libraries. I only did
OK, I see how it would work from the command line. How do I properly activate a Python environment if I run Java from Eclipse, or any other IDE for that matter? Finally, since we're getting close to a working system, the next-level question: how do I debug Python code with Jep? Graal provides Google Chrome and VS Code debug hooks. I can't testify about VS Code since I'm not using it, but the Chrome browser interface, while not stellar, is rather decent. It provides basic source-level breakpoints and inspections. Can I debug the Python part in Eclipse somehow? I remember debugging Python with print statements in 2001, but I don't think anybody would agree to this in 2025 :-) If you prefer to keep the debugging discussion separate I can start another thread for that. |
That is the only officially supported way of installing jep. I know many people have tried to package the python code and libraries in jars for easier distribution in Java but as far as I know none of them have made their work open source. You might be able to find more details on what others have done by searching through the jep issues.
You could probably activate the venv before starting the IDE, and subprocesses should inherit the settings. I think the activate script just sets up some env vars, so you could also try to set the same env vars in your run configuration. I don't really use venv and have never tried from an IDE, so I can't offer much more help than that. In general, using venv from an embedded interpreter is not as easy as it should be, but I don't think it is within the scope of Jep to fix that, although we will try to expose new options as they become available in CPython.
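As a rough illustration of the "set the same env vars" suggestion above, this sketch computes the variables a standard venv activate script changes: it sets VIRTUAL_ENV, puts the venv's executable directory first on PATH, and unsets PYTHONHOME. The helper name `venv_environment` and the path `/opt/app/venv` are made up for the example, and whether these variables alone are sufficient for Jep inside an IDE is untested here.

```python
import os
from pathlib import Path


def venv_environment(venv_dir, base_env=None):
    """Approximate the variables a venv `activate` script sets.

    Hypothetical helper: returns a new dict and leaves base_env untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    venv_path = Path(venv_dir)
    # Executables live in "Scripts" on Windows and "bin" elsewhere.
    bin_dir = venv_path / ("Scripts" if os.name == "nt" else "bin")

    env["VIRTUAL_ENV"] = str(venv_path)
    # Put the venv's executables ahead of everything else on PATH.
    old_path = env.get("PATH", "")
    env["PATH"] = str(bin_dir) + (os.pathsep + old_path if old_path else "")
    # The activate script unsets PYTHONHOME so it cannot override the venv.
    env.pop("PYTHONHOME", None)
    return env


# "/opt/app/venv" is a made-up location used only for illustration.
env = venv_environment(
    "/opt/app/venv",
    base_env={"PATH": "/usr/bin", "PYTHONHOME": "/usr"},
)
for name in ("VIRTUAL_ENV", "PATH"):
    print(f"{name}={env[name]}")
```

The printed values could be copied into an IDE run configuration, or the dict could be passed as `env=` to `subprocess.Popen` when launching Java from a script.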
I have never done that. Here is a link to a thread on our old mailing list which describes a process that has worked for some others. |
Describe the bug
I'm running the latest Python 3.13.1 on an 8-core VM. I'm starting 8 Java threads. Each thread creates its own SubInterpreter and runs a CPU-only function like
I've removed all function calls, even built-ins like sum(). It seems like SubInterpreter creation and even function starts are concurrent, but overall execution is still synchronized on something (despite the fact that Python 3.12+ should support a per-interpreter GIL). The output looks like this (threads are started with consecutive numbers so we can match start and stop)
which tells me that all functions do start concurrently, but then they wait for each other in some random order. Across multiple runs the order of starts and stops can vary, but the first thread always finishes only after the last thread has started. CPU utilization never exceeds 12% (except for a very short startup period for code compilation, I guess), i.e. exactly like single-threaded execution. For that matter, you can start Python threads and get exactly the same performance, or rather lack thereof.
What am I doing wrong here? How do I make sub-interpreters execute concurrently?
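The comparison with plain Python threads can be reproduced without Jep. The sketch below is a stand-in, not the function from the report (which is not shown): `cpu_work` is an assumed pure-Python loop, and the worker count and loop size are arbitrary. On a standard GIL-enabled CPython build the threaded timing is typically close to the sequential one, which matches the single-core utilization described above.

```python
import threading
import time


def cpu_work(n):
    """Pure-Python CPU-bound loop with no I/O and no library calls."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(workers, n):
    """Run the workload `workers` times, one after another."""
    return [cpu_work(n) for _ in range(workers)]


def run_threaded(workers, n):
    """Run the workload once per thread and collect each result."""
    results = [None] * workers

    def task(slot):
        results[slot] = cpu_work(n)

    threads = [threading.Thread(target=task, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Arbitrary sizes chosen so the example finishes quickly.
seq_results, seq_seconds = timed(run_sequential, 4, 200_000)
thr_results, thr_seconds = timed(run_threaded, 4, 200_000)
print(f"sequential: {seq_seconds:.3f}s, threaded: {thr_seconds:.3f}s")
```

If isolated sub-interpreters with their own GIL were running truly in parallel, the equivalent Jep test would be expected to finish in noticeably less wall-clock time than this threaded baseline on a multi-core machine.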
To Reproduce
Expected behavior
Fully independent SubInterpreters should run concurrently allowing for 100% CPU utilization.
Environment (please complete the following information):
Additional context
Add any other context about the problem here.