Intro example's intro_demo.py causes Segmentation fault #5

Open
ghost opened this issue Jan 31, 2017 · 1 comment

ghost commented Jan 31, 2017

The log shows that the gpot data sent from lpu_0 to lpu_1 (specifically, the 'initV' values from n_dict_0) contains newline characters ("\n") inside an array of floats. I also ran 2 LPUs on 1 GPU.

2017-01-31T18:16:04Z:INFO:man       |connecting modules lpu_0 and lpu_1
2017-01-31T18:16:04Z:INFO:man       |updating routing table with pattern
2017-01-31T18:16:04Z:INFO:man       |connected modules lpu_0 and lpu_1
2017-01-31T18:16:05Z:INFO:man       |sending steps message (10000)
2017-01-31T18:16:05Z:INFO:man       |sending start message
2017-01-31T18:16:05Z:INFO:prc 1     |GPU 0 initialized
2017-01-31T18:16:05Z:INFO:prc 0     |GPU 0 initialized
2017-01-31T18:16:05Z:INFO:mod lpu_0 |running code before body of worker 0
2017-01-31T18:16:05Z:INFO:mod lpu_0 |extracting output ports for lpu_1
2017-01-31T18:16:05Z:INFO:mod lpu_1 |running code before body of worker 1
2017-01-31T18:16:05Z:INFO:mod lpu_1 |extracting input ports for lpu_0
2017-01-31T18:16:06Z:INFO:mod lpu_0 |running body of worker 0
2017-01-31T18:16:06Z:INFO:mod lpu_0 |maximum number of steps changed: inf -> 10000
2017-01-31T18:16:06Z:INFO:mod lpu_0 |setting maximum steps to 10000
2017-01-31T18:16:06Z:INFO:mod lpu_0 |starting
2017-01-31T18:16:06Z:INFO:mod lpu_0 |running execution step
2017-01-31T18:16:06Z:INFO:mod lpu_1 |running body of worker 1
2017-01-31T18:16:06Z:INFO:mod lpu_1 |maximum number of steps changed: inf -> 10000
2017-01-31T18:16:06Z:INFO:mod lpu_1 |setting maximum steps to 10000
2017-01-31T18:16:06Z:INFO:mod lpu_1 |starting
2017-01-31T18:16:06Z:INFO:mod lpu_1 |running execution step
2017-01-31T18:16:06Z:INFO:mod lpu_0 |gpot data sent to lpu_1: [-0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214\n -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214\n -0.05214 -0.05214 -0.05214 -0.05214 -0.05214]
[archiso:15797] *** Process received signal ***
[archiso:15797] Signal: Segmentation fault (11)
[archiso:15797] Signal code: Invalid permissions (2)
[archiso:15797] Failing at address: 0xb016e0a00
[archiso:15797] [ 0] /usr/lib/libpthread.so.0(+0x11080)[0x7f2488111080]
[archiso:15797] [ 1] /usr/lib/libc.so.6(+0x128855)[0x7f2487e8a855]
[archiso:15797] [ 2] /usr/lib/openmpi/openmpi/mca_btl_vader.so(mca_btl_vader_sendi+0x186)[0x7f247adddba6]
[archiso:15797] [ 3] /usr/lib/openmpi/openmpi/mca_pml_ob1.so(+0x80f6)[0x7f247a52a0f6]
[archiso:15797] [ 4] /usr/lib/openmpi/openmpi/mca_pml_ob1.so(mca_pml_ob1_isend+0x3fd)[0x7f247a52a95d]
[archiso:15797] [ 5] /usr/lib/openmpi/libmpi.so.12(MPI_Isend+0x2ba)[0x7f248459d28a]
[archiso:15797] [ 6] /usr/lib/python2.7/site-packages/mpi4py/MPI.so(+0xcc861)[0x7f24848e5861]
[archiso:15797] [ 7] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5f30)[0x7f2488407c60]
[archiso:15797] [ 8] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8dc)[0x7f248840b8dc]
[archiso:15797] [ 9] /usr/lib/libpython2.7.so.1.0(+0x7329d)[0x7f248839029d]
[archiso:15797] [10] /usr/lib/libpython2.7.so.1.0(PyObject_Call+0x52)[0x7f2488369692]
[archiso:15797] [11] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x3509)[0x7f2488405239]
[archiso:15797] [12] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8dc)[0x7f248840b8dc]
[archiso:15797] [13] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5fd2)[0x7f2488407d02]
[archiso:15797] [14] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6108)[0x7f2488407e38]
[archiso:15797] [15] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6108)[0x7f2488407e38]
[archiso:15797] [16] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6108)[0x7f2488407e38]
[archiso:15797] [17] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8dc)[0x7f248840b8dc]
[archiso:15797] [18] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x28)[0x7f248840b9e8]
[archiso:15797] [19] /usr/lib/libpython2.7.so.1.0(+0x108efe)[0x7f2488425efe]
[archiso:15797] [20] /usr/lib/libpython2.7.so.1.0(PyRun_FileExFlags+0x81)[0x7f24884271c1]
[archiso:15797] [21] /usr/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xf4)[0x7f24884284e4]
[archiso:15797] [22] /usr/lib/libpython2.7.so.1.0(Py_Main+0xce0)[0x7f248843aca0]
[archiso:15797] [23] /usr/lib/libc.so.6(__libc_start_main+0xf1)[0x7f2487d82291]
[archiso:15797] [24] /usr/bin/python2(_start+0x2a)[0x55baed2517ea]
[archiso:15797] *** End of error message ***
2017-01-31T18:16:06Z:INFO:mod lpu_1 |sent all data from lpu_1
2017-01-31T18:16:06Z:INFO:mod lpu_1 |receiving from lpu_0
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 15797 on node archiso exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
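
A note on the "\n" in the logged array: it may come only from how NumPy renders an array as text, not from the data itself. The sketch below is a guess at what happens, assuming the logged value is a plain NumPy float array and that the logger prints its str() form; `init_v` is a hypothetical stand-in for the 21 'initV' values, not code taken from intro_demo.py.

```python
import numpy as np

# Hypothetical stand-in for the 21 'initV' values shown in the log.
init_v = np.full(21, -0.05214)

# str() wraps long arrays at NumPy's default line width (75 characters),
# so the printed text contains real newline characters between rows.
text = str(init_v)
print(repr(text))

# The array itself still holds only floats; the newlines exist
# in the string representation alone.
print(init_v.dtype, init_v.shape)
```

If that reading is right, the newlines are a formatting detail of the log message and are probably unrelated to the crash inside MPI_Isend.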
@nikulukani
Member

This seems to be an issue with your OpenMPI installation. Can you try the hello_world example from OpenMPI (https://github.com/open-mpi/ompi/tree/master/examples) and see whether it results in a similar error?
