Describe the bug
Saving a netgen-generated mesh to a checkpoint deadlocks in parallel.
Steps to Reproduce
Steps to reproduce the behavior:
Consider the following code:
# Run with mpiexec -n 2 ...
from firedrake import *
from netgen.geom2d import SplineGeometry

use_netgen = True
if use_netgen:
    geo = SplineGeometry()
    geo.AddRectangle((0, 0), (1, 1))
    ngmesh = geo.GenerateMesh(maxh=1.0)
    mesh = Mesh(ngmesh)
else:
    mesh = UnitSquareMesh(1, 1)

with CheckpointFile("temp.h5", "w") as f:
    print("saving...", flush=True)
    f.save_mesh(mesh)
    print("done", flush=True)
# hangs if use_netgen
Run mpiexec -n 2 python demo.py
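To help isolate whether the parallel HDF5/MPI layer itself is at fault, here is a minimal parallel write through h5py alone (a sketch independent of Firedrake; sanity.h5 is an arbitrary file name), run the same way with mpiexec -n 2:

# Minimal parallel-HDF5 sanity check: each rank writes one entry of a
# shared dataset opened through the MPI-IO ("mpio") driver.
from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD
with h5py.File("sanity.h5", "w", driver="mpio", comm=comm) as f:
    dset = f.create_dataset("x", (comm.size,), dtype="f8")
    dset[comm.rank] = comm.rank  # independent write of one element
print(f"rank {comm.rank}: ok", flush=True)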
Expected behavior
I expected the code to terminate successfully.
Error message
Here are the backtraces for the two processes.
#0 MPID_nem_queue_empty (qhead=0x141412fff180)
at /home/farrellp/git/install-scripts/firedrake/firedrake-dev-20240828-mpich/src/petsc/linux-gnu-c-opt/externalpackages/mpich-4.2.2/src/mpl/include/mpl_atomic_c11.h:104
#1 MPID_nem_mpich_blocking_recv (completions=<optimised out>, in_fbox=<synthetic pointer>, cell=<synthetic pointer>) at ./src/mpid/ch3/channels/nemesis/include/mpid_nem_inline.h:936
#2 MPIDI_CH3I_Progress (progress_state=progress_state@entry=0x7fff170644c4, is_blocking=is_blocking@entry=1) at src/mpid/ch3/channels/nemesis/src/ch3_progress.c:354
#3 0x000014141b43c847 in MPIR_Wait_state (request_ptr=request_ptr@entry=0x14141b6cf4a0 <MPIR_Request_direct>, status=status@entry=0x1, state=state@entry=0x7fff170644c4) at src/mpi/request/request_impl.c:736
#4 0x000014141b43c9cb in MPIR_Wait_impl (status=0x1, request_ptr=0x14141b6cf4a0 <MPIR_Request_direct>) at src/mpi/request/request_impl.c:760
#5 MPID_Wait (status=0x1, request_ptr=0x14141b6cf4a0 <MPIR_Request_direct>) at ./src/mpid/ch3/include/mpidpost.h:267
#6 MPIR_Wait (request_ptr=request_ptr@entry=0x14141b6cf4a0 <MPIR_Request_direct>, status=status@entry=0x1) at src/mpi/request/request_impl.c:779
#7 0x000014141b3f8c6b in MPIC_Wait (request_ptr=0x14141b6cf4a0 <MPIR_Request_direct>) at src/mpi/coll/helper_fns.c:90
#8 0x000014141b3f963e in MPIC_Sendrecv (sendbuf=sendbuf@entry=0x0, sendcount=sendcount@entry=0, sendtype=sendtype@entry=1275068685, dest=dest@entry=1, sendtag=sendtag@entry=1, recvbuf=recvbuf@entry=0x0,
recvcount=0, recvtype=1275068685, source=1, recvtag=1, comm_ptr=0x589aeb176908, status=0x7fff17064560, errflag=MPIR_ERR_NONE) at src/mpi/coll/helper_fns.c:307
#9 0x000014141b37506f in MPIR_Barrier_intra_dissemination (comm_ptr=0x589aeb176908, errflag=errflag@entry=MPIR_ERR_NONE) at src/mpi/coll/barrier/barrier_intra_k_dissemination.c:30
#10 0x000014141b37558c in MPIR_Barrier_intra_k_dissemination (comm=comm@entry=0x589aeb176908, k=<optimised out>, errflag=errflag@entry=MPIR_ERR_NONE) at src/mpi/coll/barrier/barrier_intra_k_dissemination.c:63
#11 0x000014141b3db08e in MPIR_Barrier_allcomm_auto (comm_ptr=comm_ptr@entry=0x589aeb176908, errflag=errflag@entry=MPIR_ERR_NONE) at src/mpi/coll/mpir_coll.c:27
#12 0x000014141b3db18b in MPIR_Barrier_impl (comm_ptr=0x589aeb176908, errflag=errflag@entry=MPIR_ERR_NONE) at src/mpi/coll/mpir_coll.c:85
#13 0x000014141b3db369 in MPID_Barrier (errflag=MPIR_ERR_NONE, comm=<optimised out>) at ./src/mpid/ch3/include/mpid_coll.h:20
#14 0x000014141b3756d5 in MPIR_Barrier_intra_smp (comm_ptr=comm_ptr@entry=0x589aeb176430, errflag=errflag@entry=MPIR_ERR_NONE) at src/mpi/coll/barrier/barrier_intra_smp.c:17
#15 0x000014141b3db07b in MPIR_Barrier_allcomm_auto (comm_ptr=comm_ptr@entry=0x589aeb176430, errflag=errflag@entry=MPIR_ERR_NONE) at src/mpi/coll/mpir_coll.c:39
#16 0x000014141b3db18b in MPIR_Barrier_impl (comm_ptr=comm_ptr@entry=0x589aeb176430, errflag=errflag@entry=MPIR_ERR_NONE) at src/mpi/coll/mpir_coll.c:85
#17 0x000014141b3db369 in MPID_Barrier (errflag=MPIR_ERR_NONE, comm=0x589aeb176430) at ./src/mpid/ch3/include/mpid_coll.h:20
#18 0x000014141b25f123 in internal_Barrier (comm=-1006632938) at src/binding/c/c_binding.c:7439
#19 PMPI_Barrier (comm=-1006632938) at src/binding/c/c_binding.c:7487
#20 0x00001414193073ac in H5AC__rsp__dist_md_write__flush (f=0x589aeb0ee390) at H5ACmpio.c:1702
#21 H5AC__run_sync_point (sync_point_op=<optimised out>, f=0x589aeb0ee390) at H5ACmpio.c:2164
#22 H5AC__run_sync_point (f=0x589aeb0ee390, sync_point_op=<optimised out>) at H5ACmpio.c:2099
#23 0x000014141930855f in H5AC__flush_entries (f=f@entry=0x589aeb0ee390) at H5ACmpio.c:2307
#24 0x000014141906fee8 in H5AC_dest (f=f@entry=0x589aeb0ee390) at H5AC.c:527
#25 0x000014141910d9b0 in H5F__dest (f=f@entry=0x589aeb0ee390, flush=flush@entry=true) at H5Fint.c:1275
#26 0x000014141910e7c3 in H5F_try_close (f=0x589aeb0ee390, was_closed=was_closed@entry=0x0) at H5Fint.c:2180
#27 0x000014141910eafc in H5F__close_cb (f=<optimised out>) at H5Fint.c:2009
#28 0x0000141419186d68 in H5I_dec_ref (id=72057594037927936) at H5I.c:1254
#29 H5I_dec_ref (id=72057594037927936) at H5I.c:1219
#30 0x0000141419186f14 in H5I_dec_app_ref (id=id@entry=72057594037927936) at H5I.c:1299
#31 0x000014141910e532 in H5F__close (file_id=file_id@entry=72057594037927936) at H5Fint.c:1951
#32 0x0000141419103cf2 in H5Fclose (file_id=72057594037927936) at H5F.c:674
#33 0x000014141b9f9d11 in PetscViewerFileClose_HDF5 (viewer=0x589aeb031e80) at /home/farrellp/git/install-scripts/firedrake/firedrake-dev-20240828-mpich/src/petsc/src/sys/classes/viewer/impls/hdf5/hdf5v.c:107
#34 0x000014141b9fa06c in PetscViewerDestroy_HDF5 (viewer=0x589aeb031e80) at /home/farrellp/git/install-scripts/firedrake/firedrake-dev-20240828-mpich/src/petsc/src/sys/classes/viewer/impls/hdf5/hdf5v.c:126
#35 0x000014141ba0a784 in PetscViewerDestroy (viewer=0x1413f79f4078) at /home/farrellp/git/install-scripts/firedrake/firedrake-dev-20240828-mpich/src/petsc/src/sys/classes/viewer/interface/view.c:101
#36 0x000014141df1d664 in __pyx_pf_8petsc4py_5PETSc_6Viewer_6destroy (__pyx_v_self=0x1413f79f4040) at src/petsc4py/PETSc.c:124417
#37 __pyx_pw_8petsc4py_5PETSc_6Viewer_7destroy (__pyx_v_self=0x1413f79f4040, __pyx_args=<optimised out>, __pyx_nargs=<optimised out>, __pyx_kwds=<optimised out>) at src/petsc4py/PETSc.c:58858
#38 0x0000589ae688355e in ?? ()
#39 0x0000589ae684b45c in _PyEval_EvalFrameDefault ()
#40 0x0000589ae68629fc in _PyFunction_Vectorcall ()
#41 0x0000589ae684b45c in _PyEval_EvalFrameDefault ()
#42 0x0000589ae68707f1 in ?? ()
#43 0x0000589ae684b26d in _PyEval_EvalFrameDefault ()
#44 0x0000589ae68479c6 in ?? ()
#45 0x0000589ae693d256 in PyEval_EvalCode ()
#46 0x0000589ae6968108 in ?? ()
#47 0x0000589ae69619cb in ?? ()
#48 0x0000589ae6967e55 in ?? ()
#49 0x0000589ae6967338 in _PyRun_SimpleFileObject ()
#50 0x0000589ae6966f83 in _PyRun_AnyFileObject ()
#51 0x0000589ae6959a5e in Py_RunMain ()
#52 0x0000589ae693002d in Py_BytesMain ()
#53 0x000014141ec29d90 in __libc_start_call_main (main=main@entry=0x589ae692fff0, argc=argc@entry=2, argv=argv@entry=0x7fff17065978) at ../sysdeps/nptl/libc_start_call_main.h:58
#54 0x000014141ec29e40 in __libc_start_main_impl (main=0x589ae692fff0, argc=2, argv=0x7fff17065978, init=<optimised out>, fini=<optimised out>, rtld_fini=<optimised out>, stack_end=0x7fff17065968) at ../csu/libc-start.c:392
#55 0x0000589ae692ff25 in _start ()
and
#0 MPID_nem_queue_empty (qhead=0xa6ce0fff200)
at /home/farrellp/git/install-scripts/firedrake/firedrake-dev-20240828-mpich/src/petsc/linux-gnu-c-opt/externalpackages/mpich-4.2.2/src/mpl/include/mpl_atomic_c11.h:104
#1 MPID_nem_mpich_blocking_recv (completions=<optimised out>, in_fbox=<synthetic pointer>, cell=<synthetic pointer>) at ./src/mpid/ch3/channels/nemesis/include/mpid_nem_inline.h:936
#2 MPIDI_CH3I_Progress (progress_state=progress_state@entry=0x7ffd78cdc0d4, is_blocking=is_blocking@entry=1) at src/mpid/ch3/channels/nemesis/src/ch3_progress.c:354
#3 0x00000a6ce963c847 in MPIR_Wait_state (request_ptr=request_ptr@entry=0xa6ce98cf4a0 <MPIR_Request_direct>, status=status@entry=0x1, state=state@entry=0x7ffd78cdc0d4) at src/mpi/request/request_impl.c:736
#4 0x00000a6ce963c9cb in MPIR_Wait_impl (status=0x1, request_ptr=0xa6ce98cf4a0 <MPIR_Request_direct>) at src/mpi/request/request_impl.c:760
#5 MPID_Wait (status=0x1, request_ptr=0xa6ce98cf4a0 <MPIR_Request_direct>) at ./src/mpid/ch3/include/mpidpost.h:267
#6 MPIR_Wait (request_ptr=request_ptr@entry=0xa6ce98cf4a0 <MPIR_Request_direct>, status=status@entry=0x1) at src/mpi/request/request_impl.c:779
#7 0x00000a6ce95f8c6b in MPIC_Wait (request_ptr=0xa6ce98cf4a0 <MPIR_Request_direct>) at src/mpi/coll/helper_fns.c:90
#8 0x00000a6ce95f90f1 in MPIC_Recv (buf=buf@entry=0x7ffd78cdc3a8, count=count@entry=8, datatype=datatype@entry=1275068685, source=<optimised out>, tag=tag@entry=2, comm_ptr=comm_ptr@entry=0x631691c26a20,
status=0x7ffd78cdc200) at src/mpi/coll/helper_fns.c:198
#9 0x00000a6ce95762c9 in MPIR_Bcast_intra_binomial (buffer=buffer@entry=0x7ffd78cdc3a8, count=count@entry=8, datatype=datatype@entry=1275068685, root=root@entry=0, comm_ptr=comm_ptr@entry=0x631691c26a20,
errflag=MPIR_ERR_NONE) at src/mpi/coll/bcast/bcast_intra_binomial.c:97
#10 0x00000a6ce95dbf2c in MPIR_Bcast_allcomm_auto (buffer=buffer@entry=0x7ffd78cdc3a8, count=count@entry=8, datatype=datatype@entry=1275068685, root=root@entry=0, comm_ptr=0x631691c26a20, errflag=MPIR_ERR_NONE)
at src/mpi/coll/mpir_coll.c:324
#11 0x00000a6ce95dc061 in MPIR_Bcast_impl (buffer=buffer@entry=0x7ffd78cdc3a8, count=count@entry=8, datatype=datatype@entry=1275068685, root=root@entry=0, comm_ptr=comm_ptr@entry=0x631691c26a20,
errflag=errflag@entry=MPIR_ERR_NONE) at src/mpi/coll/mpir_coll.c:421
#12 0x00000a6ce95dc2f9 in MPID_Bcast (errflag=MPIR_ERR_NONE, comm=0x631691c26a20, root=0, datatype=1275068685, count=8, buffer=0x7ffd78cdc3a8) at ./src/mpid/ch3/include/mpid_coll.h:30
#13 0x00000a6ce945ff6f in internal_Bcast (comm=-1006632948, root=0, datatype=1275068685, count=8, buffer=<optimised out>) at src/binding/c/c_binding.c:7708
#14 PMPI_Bcast (buffer=buffer@entry=0x7ffd78cdc3a8, count=count@entry=8, datatype=datatype@entry=1275068685, root=root@entry=0, comm=-1006632948) at src/binding/c/c_binding.c:7759
#15 0x00000a6ce7314bef in H5FD_mpio_truncate (dxpl_id=<optimised out>, closing=128, _file=0x631691814f80) at H5FDmpio.c:2023
#16 H5FD_mpio_truncate (_file=_file@entry=0x631691814f80, dxpl_id=<optimised out>, closing=closing@entry=false) at H5FDmpio.c:1979
#17 0x00000a6ce7124b61 in H5FD_truncate (file=0x631691814f80, closing=closing@entry=false) at H5FD.c:1580
#18 0x00000a6ce710bd5c in H5F__flush_phase2 (f=f@entry=0x631691c10e00, closing=closing@entry=false) at H5Fint.c:1846
#19 0x00000a6ce710e37a in H5F__flush_phase2 (closing=false, f=0x631691c10e00) at H5Fint.c:1825
#20 H5F__flush (f=f@entry=0x631691c10e00) at H5Fint.c:1904
#21 0x00000a6ce7103a84 in H5Fflush (object_id=object_id@entry=72057594037927936, scope=scope@entry=H5F_SCOPE_LOCAL) at H5F.c:638
#22 0x00000a6cce378a9c in __pyx_f_4h5py_4defs_H5Fflush (__pyx_v_object_id=72057594037927936, __pyx_v_scope=H5F_SCOPE_LOCAL)
at /home/farrellp/git/install-scripts/firedrake/firedrake-dev-20240828-mpich/src/h5py/h5py/defs.c:14175
#23 0x00000a6ccba7f202 in __pyx_pf_4h5py_3h5f_6flush (__pyx_v_obj=0xa6cc56348b0, __pyx_v_obj=0xa6cc56348b0, __pyx_self=<optimised out>, __pyx_v_scope=0)
at /home/farrellp/git/install-scripts/firedrake/firedrake-dev-20240828-mpich/src/h5py/h5py/h5f.c:7587
#24 __pyx_pw_4h5py_3h5f_7flush (__pyx_self=<optimised out>, __pyx_args=<optimised out>, __pyx_nargs=1, __pyx_kwds=<optimised out>)
at /home/farrellp/git/install-scripts/firedrake/firedrake-dev-20240828-mpich/src/h5py/h5py/h5f.c:7554
#25 0x00000a6ccd7ac8c1 in __Pyx_PyObject_Call (kw=0xa6cc57decc0, arg=0xa6cc5609840, func=0xa6cca6b4a00)
at /home/farrellp/git/install-scripts/firedrake/firedrake-dev-20240828-mpich/src/h5py/h5py/_objects.c:14294
#26 __pyx_pf_4h5py_8_objects_9with_phil_wrapper (__pyx_v_kwds=0xa6cc56277c0, __pyx_v_args=0xa6cc5609840, __pyx_self=<optimised out>)
at /home/farrellp/git/install-scripts/firedrake/firedrake-dev-20240828-mpich/src/h5py/h5py/_objects.c:6419
#27 __pyx_pw_4h5py_8_objects_9with_phil_1wrapper (__pyx_self=<optimised out>, __pyx_args=0xa6cc5609840, __pyx_kwds=<optimised out>)
at /home/farrellp/git/install-scripts/firedrake/firedrake-dev-20240828-mpich/src/h5py/h5py/_objects.c:6330
#28 0x000063168db13a7b in _PyObject_MakeTpCall ()
#29 0x000063168db0c629 in _PyEval_EvalFrameDefault ()
#30 0x000063168db1d9fc in _PyFunction_Vectorcall ()
#31 0x000063168db0645c in _PyEval_EvalFrameDefault ()
#32 0x000063168db1d9fc in _PyFunction_Vectorcall ()
#33 0x000063168db0645c in _PyEval_EvalFrameDefault ()
#34 0x000063168db2b7f1 in ?? ()
#35 0x000063168db0626d in _PyEval_EvalFrameDefault ()
#36 0x000063168db029c6 in ?? ()
#37 0x000063168dbf8256 in PyEval_EvalCode ()
#38 0x000063168dc23108 in ?? ()
#39 0x000063168dc1c9cb in ?? ()
#40 0x000063168dc22e55 in ?? ()
#41 0x000063168dc22338 in _PyRun_SimpleFileObject ()
#42 0x000063168dc21f83 in _PyRun_AnyFileObject ()
#43 0x000063168dc14a5e in Py_RunMain ()
#44 0x000063168dbeb02d in Py_BytesMain ()
#45 0x00000a6cece29d90 in __libc_start_call_main (main=main@entry=0x63168dbeaff0, argc=argc@entry=2, argv=argv@entry=0x7ffd78cdd278) at ../sysdeps/nptl/libc_start_call_main.h:58
#46 0x00000a6cece29e40 in __libc_start_main_impl (main=0x63168dbeaff0, argc=2, argv=0x7ffd78cdd278, init=<optimised out>, fini=<optimised out>, rtld_fini=<optimised out>, stack_end=0x7ffd78cdd268) at ../csu/libc-start.c:392
#47 0x000063168dbeaf25 in _start ()
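Reading the two traces: rank 0 is waiting in an MPI_Barrier reached from PetscViewerDestroy via H5Fclose, while rank 1 is waiting in an MPI_Bcast reached from h5py's H5Fflush via H5FD_mpio_truncate, so the two ranks appear to have diverged onto different collective HDF5 code paths.

If the divergence is tied to how the netgen mesh is constructed on each rank, one variant worth testing is to generate the mesh on rank 0 only and give the other ranks an empty mesh, the pattern I believe the Firedrake netgen demo uses. A sketch follows; I have not verified that it avoids the hang, and the empty-mesh constructor netgen.meshing.Mesh(dim=2) is my best guess at the right call:

from firedrake import *
from netgen.geom2d import SplineGeometry
import netgen.meshing as ngm

if COMM_WORLD.rank == 0:
    # Build the geometry and generate the mesh on rank 0 only.
    geo = SplineGeometry()
    geo.AddRectangle((0, 0), (1, 1))
    ngmesh = geo.GenerateMesh(maxh=1.0)
else:
    # Other ranks pass an empty 2D netgen mesh (assumed constructor).
    ngmesh = ngm.Mesh(dim=2)
mesh = Mesh(ngmesh)  # Firedrake distributes the rank-0 mesh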
Environment:
firedrake-status
Additional Info
N/A