Intro example's intro_demo.py causes Segmentation fault #5

Open
@ghost

Description

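The log shows that the gpot data lpu_0 sends to lpu_1 (specifically, the 'initV' values from n_dict_0) contains newline characters ("\n") inside an array of floats. Both LPUs were run on a single GPU. The segmentation fault occurs in rank 0 inside MPI_Isend (called through mpi4py) right after that data is logged.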

2017-01-31T18:16:04Z:INFO:man       |connecting modules lpu_0 and lpu_1
2017-01-31T18:16:04Z:INFO:man       |updating routing table with pattern
2017-01-31T18:16:04Z:INFO:man       |connected modules lpu_0 and lpu_1
2017-01-31T18:16:05Z:INFO:man       |sending steps message (10000)
2017-01-31T18:16:05Z:INFO:man       |sending start message
2017-01-31T18:16:05Z:INFO:prc 1     |GPU 0 initialized
2017-01-31T18:16:05Z:INFO:prc 0     |GPU 0 initialized
2017-01-31T18:16:05Z:INFO:mod lpu_0 |running code before body of worker 0
2017-01-31T18:16:05Z:INFO:mod lpu_0 |extracting output ports for lpu_1
2017-01-31T18:16:05Z:INFO:mod lpu_1 |running code before body of worker 1
2017-01-31T18:16:05Z:INFO:mod lpu_1 |extracting input ports for lpu_0
2017-01-31T18:16:06Z:INFO:mod lpu_0 |running body of worker 0
2017-01-31T18:16:06Z:INFO:mod lpu_0 |maximum number of steps changed: inf -> 10000
2017-01-31T18:16:06Z:INFO:mod lpu_0 |setting maximum steps to 10000
2017-01-31T18:16:06Z:INFO:mod lpu_0 |starting
2017-01-31T18:16:06Z:INFO:mod lpu_0 |running execution step
2017-01-31T18:16:06Z:INFO:mod lpu_1 |running body of worker 1
2017-01-31T18:16:06Z:INFO:mod lpu_1 |maximum number of steps changed: inf -> 10000
2017-01-31T18:16:06Z:INFO:mod lpu_1 |setting maximum steps to 10000
2017-01-31T18:16:06Z:INFO:mod lpu_1 |starting
2017-01-31T18:16:06Z:INFO:mod lpu_1 |running execution step
2017-01-31T18:16:06Z:INFO:mod lpu_0 |gpot data sent to lpu_1: [-0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214\n -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214\n -0.05214 -0.05214 -0.05214 -0.05214 -0.05214]
[archiso:15797] *** Process received signal ***
[archiso:15797] Signal: Segmentation fault (11)
[archiso:15797] Signal code: Invalid permissions (2)
[archiso:15797] Failing at address: 0xb016e0a00
[archiso:15797] [ 0] /usr/lib/libpthread.so.0(+0x11080)[0x7f2488111080]
[archiso:15797] [ 1] /usr/lib/libc.so.6(+0x128855)[0x7f2487e8a855]
[archiso:15797] [ 2] /usr/lib/openmpi/openmpi/mca_btl_vader.so(mca_btl_vader_sendi+0x186)[0x7f247adddba6]
[archiso:15797] [ 3] /usr/lib/openmpi/openmpi/mca_pml_ob1.so(+0x80f6)[0x7f247a52a0f6]
[archiso:15797] [ 4] /usr/lib/openmpi/openmpi/mca_pml_ob1.so(mca_pml_ob1_isend+0x3fd)[0x7f247a52a95d]
[archiso:15797] [ 5] /usr/lib/openmpi/libmpi.so.12(MPI_Isend+0x2ba)[0x7f248459d28a]
[archiso:15797] [ 6] /usr/lib/python2.7/site-packages/mpi4py/MPI.so(+0xcc861)[0x7f24848e5861]
[archiso:15797] [ 7] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5f30)[0x7f2488407c60]
[archiso:15797] [ 8] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8dc)[0x7f248840b8dc]
[archiso:15797] [ 9] /usr/lib/libpython2.7.so.1.0(+0x7329d)[0x7f248839029d]
[archiso:15797] [10] /usr/lib/libpython2.7.so.1.0(PyObject_Call+0x52)[0x7f2488369692]
[archiso:15797] [11] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x3509)[0x7f2488405239]
[archiso:15797] [12] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8dc)[0x7f248840b8dc]
[archiso:15797] [13] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5fd2)[0x7f2488407d02]
[archiso:15797] [14] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6108)[0x7f2488407e38]
[archiso:15797] [15] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6108)[0x7f2488407e38]
[archiso:15797] [16] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6108)[0x7f2488407e38]
[archiso:15797] [17] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8dc)[0x7f248840b8dc]
[archiso:15797] [18] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x28)[0x7f248840b9e8]
[archiso:15797] [19] /usr/lib/libpython2.7.so.1.0(+0x108efe)[0x7f2488425efe]
[archiso:15797] [20] /usr/lib/libpython2.7.so.1.0(PyRun_FileExFlags+0x81)[0x7f24884271c1]
[archiso:15797] [21] /usr/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xf4)[0x7f24884284e4]
[archiso:15797] [22] /usr/lib/libpython2.7.so.1.0(Py_Main+0xce0)[0x7f248843aca0]
[archiso:15797] [23] /usr/lib/libc.so.6(__libc_start_main+0xf1)[0x7f2487d82291]
[archiso:15797] [24] /usr/bin/python2(_start+0x2a)[0x55baed2517ea]
[archiso:15797] *** End of error message ***
2017-01-31T18:16:06Z:INFO:mod lpu_1 |sent all data from lpu_1
2017-01-31T18:16:06Z:INFO:mod lpu_1 |receiving from lpu_0
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 15797 on node archiso exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
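
One thing that may be worth ruling out is whether the "\n" characters are really in the data. NumPy wraps the text form of a long array at 75 characters by default, so a float array with 21 elements prints with line breaks even though the array holds only numbers. The sketch below assumes the logged value is a plain NumPy float array. The name `values` is a hypothetical stand-in for the 'initV' values and is not taken from the project's code.

```python
import numpy as np

# Hypothetical stand-in for the 21 'initV' values shown in the log.
values = np.full(21, -0.05214)

# Default string form: NumPy wraps long arrays at 75 characters per line,
# which inserts newline characters into the text (not into the data).
default_text = str(values)

# Same array formatted with a wide line limit, so it stays on one line.
one_line_text = np.array2string(values, max_line_width=1000)

print(repr(default_text))
print(repr(one_line_text))

# The array itself is still a flat array of 21 floats.
print(values.dtype, values.shape)
```

If this is what the logger does, the newlines are only a formatting artifact of the log message, and the cause of the crash in MPI_Isend would have to be looked for elsewhere.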
