Description
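The log shows that the gpot data sent from lpu_0 to lpu_1 (specifically, the 'initV' values from n_dict_0) contains newline characters ("\n") inside an array of floats. Both LPUs were run on a single GPU. Process rank 0 then segfaults inside MPI_Isend (called through mpi4py). Full log: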
2017-01-31T18:16:04Z:INFO:man |connecting modules lpu_0 and lpu_1
2017-01-31T18:16:04Z:INFO:man |updating routing table with pattern
2017-01-31T18:16:04Z:INFO:man |connected modules lpu_0 and lpu_1
2017-01-31T18:16:05Z:INFO:man |sending steps message (10000)
2017-01-31T18:16:05Z:INFO:man |sending start message
2017-01-31T18:16:05Z:INFO:prc 1 |GPU 0 initialized
2017-01-31T18:16:05Z:INFO:prc 0 |GPU 0 initialized
2017-01-31T18:16:05Z:INFO:mod lpu_0 |running code before body of worker 0
2017-01-31T18:16:05Z:INFO:mod lpu_0 |extracting output ports for lpu_1
2017-01-31T18:16:05Z:INFO:mod lpu_1 |running code before body of worker 1
2017-01-31T18:16:05Z:INFO:mod lpu_1 |extracting input ports for lpu_0
2017-01-31T18:16:06Z:INFO:mod lpu_0 |running body of worker 0
2017-01-31T18:16:06Z:INFO:mod lpu_0 |maximum number of steps changed: inf -> 10000
2017-01-31T18:16:06Z:INFO:mod lpu_0 |setting maximum steps to 10000
2017-01-31T18:16:06Z:INFO:mod lpu_0 |starting
2017-01-31T18:16:06Z:INFO:mod lpu_0 |running execution step
2017-01-31T18:16:06Z:INFO:mod lpu_1 |running body of worker 1
2017-01-31T18:16:06Z:INFO:mod lpu_1 |maximum number of steps changed: inf -> 10000
2017-01-31T18:16:06Z:INFO:mod lpu_1 |setting maximum steps to 10000
2017-01-31T18:16:06Z:INFO:mod lpu_1 |starting
2017-01-31T18:16:06Z:INFO:mod lpu_1 |running execution step
2017-01-31T18:16:06Z:INFO:mod lpu_0 |gpot data sent to lpu_1: [-0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214\n -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214 -0.05214\n -0.05214 -0.05214 -0.05214 -0.05214 -0.05214]
[archiso:15797] *** Process received signal ***
[archiso:15797] Signal: Segmentation fault (11)
[archiso:15797] Signal code: Invalid permissions (2)
[archiso:15797] Failing at address: 0xb016e0a00
[archiso:15797] [ 0] /usr/lib/libpthread.so.0(+0x11080)[0x7f2488111080]
[archiso:15797] [ 1] /usr/lib/libc.so.6(+0x128855)[0x7f2487e8a855]
[archiso:15797] [ 2] /usr/lib/openmpi/openmpi/mca_btl_vader.so(mca_btl_vader_sendi+0x186)[0x7f247adddba6]
[archiso:15797] [ 3] /usr/lib/openmpi/openmpi/mca_pml_ob1.so(+0x80f6)[0x7f247a52a0f6]
[archiso:15797] [ 4] /usr/lib/openmpi/openmpi/mca_pml_ob1.so(mca_pml_ob1_isend+0x3fd)[0x7f247a52a95d]
[archiso:15797] [ 5] /usr/lib/openmpi/libmpi.so.12(MPI_Isend+0x2ba)[0x7f248459d28a]
[archiso:15797] [ 6] /usr/lib/python2.7/site-packages/mpi4py/MPI.so(+0xcc861)[0x7f24848e5861]
[archiso:15797] [ 7] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5f30)[0x7f2488407c60]
[archiso:15797] [ 8] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8dc)[0x7f248840b8dc]
[archiso:15797] [ 9] /usr/lib/libpython2.7.so.1.0(+0x7329d)[0x7f248839029d]
[archiso:15797] [10] /usr/lib/libpython2.7.so.1.0(PyObject_Call+0x52)[0x7f2488369692]
[archiso:15797] [11] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x3509)[0x7f2488405239]
[archiso:15797] [12] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8dc)[0x7f248840b8dc]
[archiso:15797] [13] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5fd2)[0x7f2488407d02]
[archiso:15797] [14] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6108)[0x7f2488407e38]
[archiso:15797] [15] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6108)[0x7f2488407e38]
[archiso:15797] [16] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6108)[0x7f2488407e38]
[archiso:15797] [17] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8dc)[0x7f248840b8dc]
[archiso:15797] [18] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x28)[0x7f248840b9e8]
[archiso:15797] [19] /usr/lib/libpython2.7.so.1.0(+0x108efe)[0x7f2488425efe]
[archiso:15797] [20] /usr/lib/libpython2.7.so.1.0(PyRun_FileExFlags+0x81)[0x7f24884271c1]
[archiso:15797] [21] /usr/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xf4)[0x7f24884284e4]
[archiso:15797] [22] /usr/lib/libpython2.7.so.1.0(Py_Main+0xce0)[0x7f248843aca0]
[archiso:15797] [23] /usr/lib/libc.so.6(__libc_start_main+0xf1)[0x7f2487d82291]
[archiso:15797] [24] /usr/bin/python2(_start+0x2a)[0x55baed2517ea]
[archiso:15797] *** End of error message ***
2017-01-31T18:16:06Z:INFO:mod lpu_1 |sent all data from lpu_1
2017-01-31T18:16:06Z:INFO:mod lpu_1 |receiving from lpu_0
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 15797 on node archiso exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
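
One possible reading of the "\n" sequences in the gpot line above, offered as an assumption and not a diagnosis: they may come from how NumPy converts an array to a string for logging, not from the data itself. By default, NumPy's `str()` wraps array output at 75 characters per line, which for values of this width gives 8 per line, matching the log. The sketch below reproduces that with a stand-in array (the variable names and the 21-element array are hypothetical, chosen to mirror the log; this is not Neurokernel code).

```python
import numpy as np

# Stand-in for the logged gpot values (hypothetical; mirrors the 21 entries in the log).
gpot = np.full(21, -0.05214)

# Default string conversion wraps at 75 characters, inserting newlines.
wrapped = str(gpot)

# The same array formatted on a single line; the underlying floats are unchanged.
single_line = np.array2string(gpot, max_line_width=10000)

print(repr(wrapped))
print(single_line)
```

If that is what happens here, the newlines are a display artifact of the log message and are probably unrelated to the segmentation fault, which the trace places in MPI_Isend.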