
Q: when the result of d_to_python_numpy_ndarray is passed to Python, is the array also copied by value? i.e., does Python receive a copy? #165

Open
mw66 opened this issue Nov 9, 2022 · 9 comments


@mw66
Contributor

mw66 commented Nov 9, 2022

According to the doc:

Numpy arrays implement the [buffer protocol](https://docs.python.org/3/c-api/buffer.html), which PyD can efficiently convert to D arrays.

To convert a D array to a numpy array, use pyd.extra.d_to_python_numpy_ndarray.

My question is: when the result of d_to_python_numpy_ndarray is passed to Python, is the array also copied by value, i.e., does Python receive a copy?

If yes, is there any way to avoid the copy?

Thanks.

@ariovistus
Owner

It has been quite a while since I have touched that code, but my knee-jerk answer is that it copies by reference, and a quick close read supports that: it is copying pointers in a rather unreadable way to support arbitrary-dimension numpy arrays.

@mw66
Contributor Author

mw66 commented Nov 10, 2022

What I have in mind is this: since D code and Python code run in the same memory space, is it possible for them to directly share the same raw pointer to the beginning of an array (let's consider only a 1D array of float) without needing to copy the array contents?

@ariovistus
Owner

yes

@mw66
Contributor Author

mw66 commented Nov 10, 2022

So, how? Can you show a small example 😀?

I tried passing a D array using d_to_python_numpy_ndarray and changed its contents on the Python side, but when I print the array on the D side, it is unchanged.
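
Roughly, my test looked like this (a sketch from memory; in particular the PydObject wiring into the context is illustrative and may not be exactly what pyd expects):

    // Sketch of the failed sharing test.
    import std.stdio;
    import pyd.pyd, pyd.embedded;
    import pyd.extra : d_to_python_numpy_ndarray;

    void main() {
        py_init();
        float[] data = [1.0f, 2.0f, 3.0f];
        auto context = new InterpContext();
        // Convert the D slice to a numpy ndarray and hand it to Python.
        context.np_array = new PydObject(d_to_python_numpy_ndarray(data));
        context.py_stmts("np_array[0] = 42.0");  // mutate on the Python side
        writeln(data);  // still [1, 2, 3]: the ndarray apparently holds a copy
    }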

@ariovistus
Owner

Uh oh, maybe I'm a liar... let me try.

@ariovistus
Owner

Ya, it's doing a value copy. Dang.
So I'm guessing the reason it is that way is that numpy doesn't have a way to be fed a pointer (or I couldn't find one). If you're OK with numpy doing the memory allocations, it is possible for D code to get numpy's pointers:

    // Body of a function in an embedded-Python program; assumes py_init()
    // has been called and pyd.pyd / pyd.embedded are imported.
    import std.stdio;
    auto context = new InterpContext();
    // Let numpy own the allocation on the Python side.
    context.py_stmts("
        import numpy
        np_array = numpy.ones((10,), dtype='int32')
        print(np_array)
    ");
    // Then grab a raw pointer into numpy's buffer from D.
    PydObject pyd_array = context.np_array;
    void* raw_ptr = pyd_array.buffer_view().item_ptr([0]);
    int* d_ptr = cast(int*) raw_ptr;
    int[] d_array = d_ptr[0 .. 10];  // slices the numpy buffer; no copy
    writeln("d array: ", d_array);
    d_array[5] = 400;  // write through the shared memory
    writeln("d array: ", d_array);

    // The write is visible on the Python side, so the memory really is shared.
    context.py_stmts("
        print(np_array)
    ");

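One caveat: the D slice just borrows the buffer that numpy owns, so np_array has to stay referenced on the Python side for as long as d_array is in use; if the numpy array gets garbage collected, the pointer dangles.
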
@mw66
Contributor Author

mw66 commented Nov 10, 2022

Thanks for the example. Yes, this way the raw pointer can be shared by both sides.

@mw66 mw66 closed this as completed Nov 10, 2022
@mw66
Contributor Author

mw66 commented Nov 12, 2022

I'm not sure if this is related, but I'm experiencing a problem: a Python function call hangs somewhere in a multi-threaded program:

    (gdb) where
    #0  0x00007ffff3e08fb9 in futex_reltimed_wait_cancelable (private=<optimized out>, reltime=0x7fffba7ec910, expected=0, futex_word=0x7ffff57d3da8 <_PyRuntime+424>) at ../sysdeps/unix/sysv/linux/futex-internal.h:142
    #1  __pthread_cond_wait_common (abstime=0x7fffba7ec9f0, mutex=0x7ffff57d3db0 <_PyRuntime+432>, cond=0x7ffff57d3d80 <_PyRuntime+384>) at pthread_cond_wait.c:533
    #2  __pthread_cond_timedwait (cond=0x7ffff57d3d80 <_PyRuntime+384>, mutex=0x7ffff57d3db0 <_PyRuntime+432>, abstime=0x7fffba7ec9f0) at pthread_cond_wait.c:667
    #3  0x00007ffff556deed in PyCOND_TIMEDWAIT (us=<optimized out>, mut=<optimized out>, cond=0x7ffff57d3d80 <_PyRuntime+384>) at /tmp/build/80754af9/python-split_1607696593712/work/Python/condvar.h:73
    #4  take_gil () at /tmp/build/80754af9/python-split_1607696593712/work/Python/ceval_gil.h:247
    #5  0x00007ffff556e042 in PyEval_RestoreThread () at /tmp/build/80754af9/python-split_1607696593712/work/Python/ceval.c:467
    #6  0x00007ffff5644421 in PyGILState_Ensure () at /tmp/build/80754af9/python-split_1607696593712/work/Python/pystate.c:1378
    #7  0x00007fff63cb139d in tensorflow::(anonymous namespace)::StackTraceWrapper::~StackTraceWrapper() ()

This looks like a Python GIL locking issue.

So before I check other things, I want to ask the following question: does the approach in #165 (comment) work in a multi-threaded situation?

Specifically: suppose that in D's main thread, at the init() stage, all these d_arrays are allocated and some data is written into them; then later, one of the worker threads calls a Python function (via pyd) that accesses the previously allocated np_array memory on the Python side. Will this cause a Python GIL locking issue?

If yes, is there any pyd function call that releases the lock (on d_array, pyd_array, ... all the way up to context.np_array) in the main thread and lets the worker thread go through?

I found these two functions, but did not see how they are used:

    $ grep -Iir release .dub/packages/pyd-0.14.3/ | grep -i lock
    .dub/packages/pyd-0.14.3/pyd/infrastructure/deimos/python/ceval.d:void PyEval_ReleaseLock();
    .dub/packages/pyd-0.14.3/pyd/infrastructure/deimos/python/pythread.d:void PyThread_release_lock(PyThread_type_lock);
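
Based on the CPython docs, the pattern I would expect is PyEval_SaveThread in the thread that owns the interpreter, plus PyGILState_Ensure / PyGILState_Release around Python calls in the workers. An untested sketch (the deimos module names are inferred from the paths above):

    import core.thread;
    import deimos.python.ceval;    // PyEval_SaveThread / PyEval_RestoreThread
    import deimos.python.pystate;  // PyGILState_Ensure / PyGILState_Release
    import pyd.pyd, pyd.embedded;

    void main() {
        py_init();
        // ... init() stage: allocate d_array, set up np_array, etc. ...

        auto save = PyEval_SaveThread();  // main thread releases the GIL

        auto worker = new Thread({
            auto gstate = PyGILState_Ensure();  // worker acquires the GIL
            // ... call the Python func via pyd here ...
            PyGILState_Release(gstate);         // and releases it again
        });
        worker.start();
        worker.join();

        PyEval_RestoreThread(save);  // main thread re-acquires the GIL
    }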

Thanks.

@mw66 mw66 reopened this Nov 12, 2022
@mw66
Contributor Author

mw66 commented Nov 15, 2022

Just an update: playing with the Python GIL is no fun. I think the best practice with PyD / Python is to keep it all running in a single D thread, which makes things easier. Even with this setup, I sometimes run into problems with libraries that Python loads dynamically.
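
Concretely, that means funneling every Python call through one dedicated D thread that owns the interpreter, roughly like this (a sketch using std.concurrency; the message protocol and py_init() placement are illustrative):

    import std.concurrency;
    import pyd.pyd, pyd.embedded;

    // The worker thread owns the interpreter; everyone else just sends
    // it messages instead of calling Python directly.
    void pythonWorker() {
        py_init();
        auto context = new InterpContext();
        bool done = false;
        while (!done) {
            receive(
                (string stmts) { context.py_stmts(stmts); },
                (bool quit)    { done = quit; }
            );
        }
    }

    void main() {
        auto py = spawn(&pythonWorker);
        py.send("print('hello from the Python thread')");
        py.send(true);  // shut the worker down
    }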
