Segmentation Fault #1090

Open
@axsk

I keep running into segmentation faults.

They most likely occur during calls to the OpenMM Python API.
Here is the error output:

Stacktrace
[13736] signal 11 (1): Segmentation fault                                                                                                                                                                                                                  
in expression starting at none:0                                                                                                                                                                                                                           
_PyInterpreterState_GET at /usr/local/src/conda/python-3.12.8/Include/internal/pycore_pystate.h:133 [inlined]                                                                                                                                              
get_state at /usr/local/src/conda/python-3.12.8/Objects/obmalloc.c:866 [inlined]                                                                                                                                                                           
_PyObject_Free at /usr/local/src/conda/python-3.12.8/Objects/obmalloc.c:1850 [inlined]
PyObject_Free at /usr/local/src/conda/python-3.12.8/Objects/obmalloc.c:830
_buffer_info_free at /data/numerik/people/bzfsikor/conda/envs/conda_jl/lib/python3.12/site-packages/numpy/core/_multiarray_umath.cpython-312-x86_64-linux-gnu.so (unknown line)
array_dealloc at /data/numerik/people/bzfsikor/conda/envs/conda_jl/lib/python3.12/site-packages/numpy/core/_multiarray_umath.cpython-312-x86_64-linux-gnu.so (unknown line)
pydecref_ at /data/numerik/people/bzfsikor/software/julia_depot/packages/PyCall/1gn3u/src/PyCall.jl:118
pydecref at /data/numerik/people/bzfsikor/software/julia_depot/packages/PyCall/1gn3u/src/PyCall.jl:123
jfptr_pydecref_4550 at /data/numerik/people/bzfsikor/software/julia_depot/compiled/v1.11/PyCall/GkzkC_ddiUX.so (unknown line) 
run_finalizer at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/gc.c:299
jl_gc_run_finalizers_in_list at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/gc.c:389
run_finalizers at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/gc.c:435
enable_finalizers at ./gcutils.jl:161 [inlined]
unlock at ./lock.jl:178 [inlined]
macro expansion at ./lock.jl:275 [inlined]
#282 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:2851
jfptr_YY.282_9263 at /data/numerik/people/bzfsikor/software/julia_depot/compiled/v1.11/REPL/u0gqU_dovaC.so (unknown line)
jl_apply at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
start_task at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/task.c:1202
Allocations: 775171170 (Pool: 775164164; Big: 7006); GC: 5349
fish: Job 1, 'env JULIA_HISTORY=./.history.jl…' terminated by signal SIGSEGV (Address boundary error)

I have no idea how to investigate this further.

Title changed from "SegmentationFault" to "Segmentation Fault" on Jan 24, 2025
axsk commented on Jan 24, 2025

Here is another one, which I run into more often. It puzzles me especially because it somehow involves CUDA as well.

Stacktrace
[89762] signal 11 (1): Segmentation fault
in expression starting at REPL[98]:1
_PyInterpreterState_GET at /usr/local/src/conda/python-3.12.8/Include/internal/pycore_pystate.h:133 [inlined]
get_gc_state at /usr/local/src/conda/python-3.12.8/Modules/gcmodule.c:134 [inlined]
PyObject_GC_Del at /usr/local/src/conda/python-3.12.8/Modules/gcmodule.c:2421
pydecref_ at /data/numerik/people/bzfsikor/software/julia_depot/packages/PyCall/1gn3u/src/PyCall.jl:118
pydecref at /data/numerik/people/bzfsikor/software/julia_depot/packages/PyCall/1gn3u/src/PyCall.jl:123
jfptr_pydecref_4550 at /data/numerik/people/bzfsikor/software/julia_depot/compiled/v1.11/PyCall/GkzkC_ddiUX.so (unknown line)
run_finalizer at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/gc.c:299
jl_gc_run_finalizers_in_list at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/gc.c:389
run_finalizers at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/gc.c:435
enable_finalizers at ./gcutils.jl:161 [inlined]
unlock at ./locks-mt.jl:68 [inlined]
popfirst! at ./task.jl:751
trypoptask at ./task.jl:992
jfptr_trypoptask_66779.1 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
get_next_task at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/scheduler.c:377 [inlined]
ijl_task_get_next at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/scheduler.c:438
poptask at ./task.jl:1012
wait at ./task.jl:1021
#wait#731 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /data/numerik/people/bzfsikor/software/julia_depot/packages/CUDA/1kIOw/lib/cudadrv/synchronization.jl:53
synchronization_worker at /data/numerik/people/bzfsikor/software/julia_depot/packages/CUDA/1kIOw/lib/cudadrv/synchronization.jl:119
unknown function (ip: 0x7f8cd97c33b5)
jlcapi_synchronization_worker_13623 at /data/numerik/people/bzfsikor/software/julia_depot/compiled/v1.11/CUDA/oWw5k_ddiUX.so (unknown line)
unknown function (ip: 0x7f8f3d3961c3)
unknown function (ip: 0x7f8f3d41685b)
Allocations: 1455439833 (Pool: 1455431030; Big: 8803); GC: 4054
fish: Job 1, 'env JULIA_HISTORY=./.history.jl…' terminated by signal SIGSEGV (Address boundary error)

axsk commented on Jan 24, 2025

I switched to a single-threaded instance and have not observed the issue since.
However, I don't make any (explicit) use of multi-threading anywhere in my code.
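
Even without explicit threading in user code, the traces above show tasks from the REPL (`LineEdit.jl`) and from CUDA.jl (`synchronization_worker`) in the crashing call chain, so it may help to confirm how many threads the session was actually started with. A minimal sketch, assuming Julia 1.9 or later (where thread pools can be queried by name); the variable names are only illustrative:

```julia
# Report the thread configuration of the running Julia session.
# Assumes Julia >= 1.9, where Threads.nthreads accepts a pool name.
n_default     = Threads.nthreads(:default)      # worker threads for ordinary tasks
n_interactive = Threads.nthreads(:interactive)  # threads reserved for interactive tasks
env_setting   = get(ENV, "JULIA_NUM_THREADS", "(unset)")

println("default threads     = ", n_default)
println("interactive threads = ", n_interactive)
println("JULIA_NUM_THREADS   = ", env_setting)
println("current thread id   = ", Threads.threadid())
```

If the first number is greater than 1, library tasks and finalizers are not guaranteed to run on the thread that started the session.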

axsk commented on Mar 4, 2025

Whereas the examples above happened randomly in my training loop, I can now reproduce the problem by tab-completing a PyObject's fields:

Stacktrace
julia> a[1]                                                                                                                                                                                                                                                             
PyObject <Atom 0 (H1) of chain 0 residue 0 (ACE)>                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                        
julia> a[1].                                                                                                                                                                                                                                                            
[12951] signal 11 (1): Segmentation fault                                                                                                                                                                                                                               
in expression starting at none:0                                                                                                                                                                                                                                        
_PyInterpreterState_GET at /usr/local/src/conda/python-3.12.8/Include/internal/pycore_pystate.h:133 [inlined]
_PyType_Lookup at /usr/local/src/conda/python-3.12.8/Objects/typeobject.c:4729 [inlined]                                                                                                                                                                                
_PyObject_LookupSpecial at /usr/local/src/conda/python-3.12.8/Objects/typeobject.c:2167                                                                                                                                                                                 
_dir_object at /usr/local/src/conda/python-3.12.8/Objects/object.c:1758 [inlined]                                                                                                                                                                                       
PyObject_Dir at /usr/local/src/conda/python-3.12.8/Objects/object.c:1790                                                                                                                                                                                                
macro expansion at /data/numerik/people/bzfsikor/software/julia_depot/packages/PyCall/1gn3u/src/exception.jl:108 [inlined]                                                                                                                                              
propertynames at /data/numerik/people/bzfsikor/software/julia_depot/packages/PyCall/1gn3u/src/PyCall.jl:327                                                                                                                                                             
propertynames at ./reflection.jl:2612                                                                                                                                                                                                                                   
unknown function (ip: 0x7f671ed24e4b)                                                                                                                                                                                                                                   
complete_symbol at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPLCompletions.jl:208                                                                                                                              
#complete_identifiers!#57 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPLCompletions.jl:1179                                                                                                                   
complete_identifiers! at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPLCompletions.jl:1079 [inlined]                                                                                                             
completions at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPLCompletions.jl:1436                                                                                                                                 
#complete_line#85 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPL.jl:637                                                                                                                                       
complete_line at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPL.jl:634                                                                                                                                           
unknown function (ip: 0x7f68d8d8d97d)                                                                                                                                                                                                                                   
check_for_hint at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:387                                                                                                                                      
#143 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:2527                                                                                                                                               
jl_apply at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]                                                                                                                                                                 
jl_f__call_latest at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/builtins.c:875                                                                                                                                                                
#invokelatest#2 at ./essentials.jl:1055 [inlined]                                                                                                                                                                                                                       
invokelatest at ./essentials.jl:1052 [inlined]                                                                                                                                                                                                                          
#30 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:1711                                                                                                                                                
jfptr_YY.30_8684 at /data/numerik/people/bzfsikor/software/julia_depot/compiled/v1.11/REPL/u0gqU_dovaC.so (unknown line)
#254 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:2614
jl_apply at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
#30 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:1711
jfptr_YY.30_8724 at /data/numerik/people/bzfsikor/software/julia_depot/compiled/v1.11/REPL/u0gqU_dovaC.so (unknown line)
macro expansion at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:2861 [inlined]
macro expansion at ./lock.jl:273 [inlined]
#282 at /home/htc/bzfsikor/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:2851
jfptr_YY.282_9263 at /data/numerik/people/bzfsikor/software/julia_depot/compiled/v1.11/REPL/u0gqU_dovaC.so (unknown line)
jl_apply at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
start_task at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/task.c:1202
Allocations: 66606445 (Pool: 66604355; Big: 2090); GC: 104
fish: Job 1, 'env JULIA_HISTORY=./.history.jl…' terminated by signal SIGSEGV (Address boundary error)

Again, this only happens with multiple threads; when I start Julia with --threads=1, it works fine.

bhawkins commented on Apr 8, 2025

Here's a simple script that reliably triggers a similar segfault for me:

using PyCall
const math = pyimport("math")

println("nthreads = ", Threads.nthreads())

function foo(n, niter=1)
    x = zeros(n)
    for iter = 1:niter
        Threads.@threads for i = 1:n
            x[i] += rand()
        end
    end
    return x
end

foo(50_000, 25_000)

Output on Linux with Julia 1.11.4 and Python 3.12.9:
$ julia -t 8 pycall_segfault.jl 
nthreads = 8

[4600] signal 11 (1): Segmentation fault
in expression starting at /data/bhawkins/PRH-4/NISAR_L0_PR_RRSD_055_071_D_137S_20241015T075909_20241015T075935_P00406_F_J_001/pycall_segfault.jl:16
_PyInterpreterState_GET at /usr/local/src/conda/python-3.12.9/Include/internal/pycore_pystate.h:133 [inlined]
notify_code_watchers at /usr/local/src/conda/python-3.12.9/Objects/codeobject.c:32 [inlined]
code_dealloc at /usr/local/src/conda/python-3.12.9/Objects/codeobject.c:1705
pydecref_ at /home/jovyan/.julia/packages/PyCall/1gn3u/src/PyCall.jl:118
pydecref at /home/jovyan/.julia/packages/PyCall/1gn3u/src/PyCall.jl:123
jfptr_pydecref_4441 at /home/jovyan/.julia/compiled/v1.11/PyCall/GkzkC_FmVRe.so (unknown line)
run_finalizer at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/gc.c:299
jl_gc_run_finalizers_in_list at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/gc.c:389
run_finalizers at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/gc.c:435
jl_mutex_unlock at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/julia_locks.h:80 [inlined]
ijl_process_events at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/jl_uv.c:398
ijl_task_get_next at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/scheduler.c:610
poptask at ./task.jl:1012
wait at ./task.jl:1021
task_done_hook at ./task.jl:694
jfptr_task_done_hook_66658.1 at /home/jovyan/.julia/juliaup/julia-1.11.4+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_finish_task at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/task.c:319
start_task at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/task.c:1213
Allocations: 664847 (Pool: 664810; Big: 37); GC: 1
Segmentation fault (core dumped)

Output on macOS with Julia 1.11.3 and Python 3.13.2:
$ julia -t 8 pycall_segfault.jl
nthreads = 8

[60363] signal 11 (2): Segmentation fault: 11
in expression starting at /Users/bhawkins/Downloads/pycall_segfault.jl:16
code_dealloc at /opt/homebrew/Cellar/python@3.13/3.13.2/Frameworks/Python.framework/Versions/3.13/Python (unknown line)
pydecref_ at /Users/bhawkins/.julia/packages/PyCall/1gn3u/src/PyCall.jl:118
pydecref at /Users/bhawkins/.julia/packages/PyCall/1gn3u/src/PyCall.jl:123
jfptr_pydecref_4460 at /Users/bhawkins/.julia/compiled/v1.11/PyCall/GkzkC_UERWi.dylib (unknown line)
run_finalizer at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/gc.c:299
jl_gc_run_finalizers_in_list at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/gc.c:389
run_finalizers at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/gc.c:435
jl_mutex_unlock at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/./julia_locks.h:80 [inlined]
ijl_task_get_next at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/scheduler.c:526
poptask at ./task.jl:1012
wait at ./task.jl:1021
task_done_hook at ./task.jl:694
jfptr_task_done_hook_66909.1 at /Users/bhawkins/.julia/juliaup/julia-1.11.3+0.aarch64.apple.darwin14/lib/julia/sys.dylib (unknown line)
jl_apply at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/./julia.h:2157 [inlined]
jl_finish_task at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/task.c:319
start_task at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XC9YQX9HH2.0/build/default-honeycrisp-XC9YQX9HH2-0/julialang/julia-release-1-dot-11/src/task.c:1213
Allocations: 673857 (Pool: 673823; Big: 34); GC: 1

Segmentation fault

On my Mac this crashes about 90% of the time. If I comment out the pyimport line, the probability of crashing goes down somewhat, to maybe 70%. If I comment out both of the first two lines, it doesn't crash. I'm not sure how important the function body is, but it doesn't seem to crash if the parallel loop contains just sleep(1).

bhawkins commented on Apr 8, 2025

Using the pylock() suggestion here makes the crash go away. My example isn't actually calling any Python inside the loop, but I guess Julia's GC can still run PyCall's finalizers (pydecref) anyway.
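
The linked `pylock()` suggestion is not reproduced in this thread, so the following is only a sketch of what such a helper might look like, not PyCall API. It assumes the helper holds a global lock and pauses Julia's GC while the wrapped function runs, so that no collection should schedule `pydecref` finalizers from a worker thread during that time. The names `PYLOCK` and `threaded_fill` and the body of `pylock` are assumptions made for illustration.

```julia
# Hypothetical helper; NOT part of PyCall. The name follows the `pylock()`
# mentioned above, but the implementation here is an assumption.
const PYLOCK = ReentrantLock()

function pylock(f::Function)
    lock(PYLOCK) do
        prev = GC.enable(false)   # pause GC; returns the previous GC state
        try
            return f()
        finally
            GC.enable(prev)       # restore whatever state was active before
        end
    end
end

# A reduced stand-in for the threaded loop in the script above.
function threaded_fill(n)
    x = zeros(n)
    Threads.@threads for i in 1:n
        x[i] += rand()
    end
    return x
end

# Run the threaded region with GC paused.
x = pylock() do
    threaded_fill(1_000)
end
println("filled ", length(x), " entries")
```

Pausing the GC trades memory growth for safety, so a wrapper like this is probably best kept around short regions rather than a whole training loop.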

          Segmentation Fault · Issue #1090 · JuliaPy/PyCall.jl