Int32/64 addressing errors and julia crashing #857

@evelyne-ringoot

Questionnaire

  1. Does ROCm work for you outside of Julia, e.g. C/C++/Python? yes

  2. Post output of rocminfo.

ROCk module version 6.16.6 is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.18
Runtime Ext Version:     1.14
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
XNACK enabled:           NO
DMAbuf Support:          YES
VMM Support:             YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    INTEL(R) XEON(R) PLATINUM 8568Y+   
  Uuid:                    CPU-XX                             
  Marketing Name:          INTEL(R) XEON(R) PLATINUM 8568Y+   
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   0                                  
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            20                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    247409228(0xebf2a4c) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    247409228(0xebf2a4c) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    247409228(0xebf2a4c) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 4                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    247409228(0xebf2a4c) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx942                             
  Uuid:                    GPU-72d9a6340dabceda               
  Marketing Name:          AMD Instinct MI300X VF             
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      4096(0x1000) KB                    
    L3:                      262144(0x40000) KB                 
  Chip ID:                 29877(0x74b5)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          128(0x80)                          
  Max Clock Freq. (MHz):   2100                               
  BDFID:                   33536                              
  Internal Node ID:        1                                  
  Compute Unit:            304                                
  SIMDs per CU:            4                                  
  Shader Engines:          32                                 
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    2048(0x800)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        2147483647(0x7fffffff)             
    y                        65535(0xffff)                      
    z                        65535(0xffff)                      
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 189                                
  SDMA engine uCode::      25                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    200998912(0xbfb0000) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    200998912(0xbfb0000) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    200998912(0xbfb0000) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 4                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        2147483647(0x7fffffff)             
        y                        65535(0xffff)                      
        z                        65535(0xffff)                      
      FBarrier Max Size:       32                                 
    ISA 2                    
      Name:                    amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        2147483647(0x7fffffff)             
        y                        65535(0xffff)                      
        z                        65535(0xffff)                      
      FBarrier Max Size:       32                                 
*** Done ***         
  3. Post output of AMDGPU.versioninfo() if possible.
julia> AMDGPU.versioninfo()
[ Info: AMDGPU versioninfo
┌───────────┬──────────────────┬───────────┬──────────────────────────────────────────────────────────────────────
│ Available │ Name             │ Version   │ Path                                                                ⋯
├───────────┼──────────────────┼───────────┼──────────────────────────────────────────────────────────────────────
│     +     │ LLD              │ -         │ /opt/rocm-7.1.0/lib/llvm/bin/ld.lld                                 ⋯
│     +     │ Device Libraries │ -         │ /root/.julia/artifacts/b46ab46ef568406312e5f500efb677511199c2f9/amd ⋯
│     +     │ HIP              │ 7.1.25424 │ /opt/rocm-7.1.0/lib/libamdhip64.so                                  ⋯
│     +     │ rocBLAS          │ 5.1.0     │ /opt/rocm-7.1.0/lib/librocblas.so                                   ⋯
│     +     │ rocSOLVER        │ 3.31.0    │ /opt/rocm-7.1.0/lib/librocsolver.so                                 ⋯
│     +     │ rocSPARSE        │ 4.1.0     │ /opt/rocm-7.1.0/lib/librocsparse.so                                 ⋯
│     +     │ rocRAND          │ 2.10.5    │ /opt/rocm-7.1.0/lib/librocrand.so                                   ⋯
│     +     │ rocFFT           │ 1.0.35    │ /opt/rocm-7.1.0/lib/librocfft.so                                    ⋯
│     +     │ MIOpen           │ 3.5.1     │ /opt/rocm-7.1.0/lib/libMIOpen.so                                    ⋯
└───────────┴──────────────────┴───────────┴──────────────────────────────────────────────────────────────────────
                                                                                                  1 column omitted

[ Info: AMDGPU devices
┌────┬────────────────────────┬────────────────────────┬───────────┬─────────────┬───────────────┐
│ Id │                   Name │               GCN arch │ Wavefront │      Memory │ Shared Memory │
├────┼────────────────────────┼────────────────────────┼───────────┼─────────────┼───────────────┤
│  1 │ AMD Instinct MI300X VF │ gfx942:sramecc+:xnack- │        64 │ 191.688 GiB │    64.000 KiB │
└────┴────────────────────────┴────────────────────────┴───────────┴─────────────┴───────────────┘

Reproducing the bug

  1. Describe what's not working.
    It appears that some ROCm functions are not implemented for Int64 indexing: arrays with more than typemax(Int32) elements produce a confusing InexactError. I am not sure whether this should be documented, whether the error message should be changed, or whether a default Julia function should be called in those cases. In other cases, Julia crashes outright when it hits the issue, which is more inconvenient.
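    For reference, the arithmetic behind the errors in the MWE can be checked without a GPU. This is only a sketch in plain Julia; the variable names are illustrative and nothing here calls into AMDGPU.jl. It shows that a 50000×50000 array has more elements than an Int32 can represent, so any conversion of the element count to Int32 throws.

```julia
# Element count of the n×n array used in the MWE.
n = 50_000
len = n * n

# A signed 32-bit integer tops out at typemax(Int32) = 2147483647.
fits_int32 = len <= typemax(Int32)

# Converting the count to Int32 is checked, so it throws instead of wrapping.
threw_inexact = try
    Int32(len)
    false
catch err
    err isa InexactError
end

println("elements:        ", len)
println("typemax(Int32):  ", typemax(Int32))
println("fits in Int32:   ", fits_int32)
println("InexactError:    ", threw_inexact)
```

    The same check, `length(A) <= typemax(Int32)`, could serve as a guard before calling a wrapper that takes a 32-bit size argument.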

  2. Provide MWE to reproduce it (if possible).

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.12.2 (2025-11-20)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org release
|__/                   |

julia> using AMDGPU, LinearAlgebra
julia> n=50000
50000

julia> a=AMDGPU.rand(n,n)
ERROR: InexactError: trunc(Int32, 2500000000)
Stacktrace:
  [1] throw_inexacterror(func::Symbol, to::Type, val::Int64)
    @ Core ./boot.jl:815
  [2] checked_trunc_sint
    @ ./boot.jl:829 [inlined]
  [3] toInt32
    @ ./boot.jl:866 [inlined]
  [4] Int32
    @ ./boot.jl:956 [inlined]
  [5] convert
    @ ./number.jl:7 [inlined]
  [6] cconvert
    @ ./essentials.jl:687 [inlined]
  [7] macro expansion
    @ ~/.julia/packages/AMDGPU/FyUG3/src/utils.jl:122 [inlined]
  [8] rocrand_generate_uniform
    @ ~/.julia/packages/AMDGPU/FyUG3/src/rand/librocrand.jl:28 [inlined]
  [9] rand!
    @ ~/.julia/packages/AMDGPU/FyUG3/src/rand/random.jl:51 [inlined]
 [10] rand!(A::ROCArray{Float32, 2, AMDGPU.Runtime.Mem.HIPBuffer})
    @ AMDGPU ~/.julia/packages/AMDGPU/FyUG3/src/random.jl:39
 [11] rand(dim1::Int64, dims::Int64)
    @ AMDGPU ~/.julia/packages/AMDGPU/FyUG3/src/random.jl:67
 [12] top-level scope
    @ REPL[4]:1

julia> a=AMDGPU.zeros(n,n).+1
50000×50000 ROCArray{Float32, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

julia> norm(a)
ERROR: InexactError: trunc(Int32, 2500000000)
Stacktrace:
  [1] throw_inexacterror(func::Symbol, to::Type, val::Int64)
    @ Core ./boot.jl:815
  [2] checked_trunc_sint
    @ ./boot.jl:829 [inlined]
  [3] toInt32
    @ ./boot.jl:866 [inlined]
  [4] Int32
    @ ./boot.jl:956 [inlined]
  [5] convert
    @ ./number.jl:7 [inlined]
  [6] cconvert
    @ ./essentials.jl:687 [inlined]
  [7] macro expansion
    @ ~/.julia/packages/AMDGPU/FyUG3/src/utils.jl:122 [inlined]
  [8] rocblas_snrm2(handle::Ptr{…}, n::Int64, x::ROCArray{…}, incx::Int64, result::Base.RefValue{…})
    @ AMDGPU.rocBLAS ~/.julia/packages/AMDGPU/FyUG3/src/blas/librocblas.jl:2355
  [9] nrm2
    @ ~/.julia/packages/AMDGPU/FyUG3/src/blas/wrappers.jl:86 [inlined]
 [10] norm(x::ROCArray{Float32, 2, AMDGPU.Runtime.Mem.HIPBuffer})
    @ AMDGPU.rocBLAS ~/.julia/packages/AMDGPU/FyUG3/src/blas/highlevel.jl:49
 [11] top-level scope
    @ REPL[14]:1
Some type information was truncated. Use `show(err)` to see complete types.

julia> svdvals(a)

[3636] signal 11 (2): Segmentation fault
in expression starting at REPL[6]:1
slange_64_ at /root/julia-1.12.2/bin/../lib/julia/libopenblas64_.so (unknown line)
sgesdd_64_ at /root/julia-1.12.2/bin/../lib/julia/libopenblas64_.so (unknown line)
gesdd! at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/usr/share/julia/stdlib/v1.12/LinearAlgebra/src/lapack.jl:1706
svdvals! at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/usr/share/julia/stdlib/v1.12/LinearAlgebra/src/svd.jl:218 [inlined]
svdvals at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/usr/share/julia/stdlib/v1.12/LinearAlgebra/src/svd.jl:243
unknown function (ip: 0x7af09efe9242) at (unknown file)
jl_apply at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/src/julia.h:2391 [inlined]
do_call at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/src/interpreter.c:123
eval_value at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/src/interpreter.c:243
eval_stmt_value at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/src/interpreter.c:194 [inlined]
eval_body at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/src/interpreter.c:707
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/src/interpreter.c:898
jl_toplevel_eval_flex at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/src/toplevel.c:1035
__repl_entry_eval_expanded_with_loc at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/usr/share/julia/stdlib/v1.12/REPL/src/REPL.jl:301
jl_apply at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/src/julia.h:2391 [inlined]
jl_f_invokelatest at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/src/builtins.c:881
toplevel_eval_with_hooks at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/usr/share/julia/stdlib/v1.12/REPL/src/REPL.jl:308
toplevel_eval_with_hooks at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/usr/share/julia/stdlib/v1.12/REPL/src/REPL.jl:312
toplevel_eval_with_hooks at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/usr/share/julia/stdlib/v1.12/REPL/src/REPL.jl:305 [inlined]
eval_user_input at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/usr/share/julia/stdlib/v1.12/REPL/src/REPL.jl:330
repl_backend_loop at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/usr/share/julia/stdlib/v1.12/REPL/src/REPL.jl:452
#start_repl_backend#41 at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/usr/share/julia/stdlib/v1.12/REPL/src/REPL.jl:427
start_repl_backend at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/usr/share/julia/stdlib/v1.12/REPL/src/REPL.jl:424 [inlined]
#run_repl#50 at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/usr/share/julia/stdlib/v1.12/REPL/src/REPL.jl:653
run_repl at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/usr/share/julia/stdlib/v1.12/REPL/src/REPL.jl:639
jfptr_run_repl_19709.1 at /root/julia-1.12.2/share/julia/compiled/v1.12/REPL/u0gqU_DlLf6.so (unknown line)
run_std_repl at ./client.jl:478
jfptr_run_std_repl_36841.1 at /root/julia-1.12.2/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/src/julia.h:2391 [inlined]
jl_f_invokelatest at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/src/builtins.c:881
run_main_repl at ./client.jl:499
repl_main at ./client.jl:586 [inlined]
_start at ./client.jl:561
jfptr__start_47004.1 at /root/julia-1.12.2/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/src/julia.h:2391 [inlined]
true_main at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/src/jlapi.c:971
jl_repl_entrypoint at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/src/jlapi.c:1139
main at /cache/build/builder-amdci5-2/julialang/julia-release-1-dot-12/cli/loader_exe.c:58
unknown function (ip: 0x7af0dde2a1c9) at /lib/x86_64-linux-gnu/libc.so.6
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8) at /workspace/srcdir/glibc-2.17/csu/../sysdeps/x86_64/start.S
Allocations: 52830844 (Pool: 52830469; Big: 375); GC: 31
Segmentation fault (core dumped)
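
One possible user-side workaround for the `norm` case is to split the data into chunks whose lengths each fit in an Int32 and combine the partial norms. The sketch below is an assumption, not AMDGPU.jl's behavior: `chunked_norm` is a hypothetical helper, it is shown on a CPU `Array` with a small chunk size, and whether views of a `ROCArray` would dispatch to the rocBLAS path is untested.

```julia
using LinearAlgebra

# Hypothetical helper (not part of AMDGPU.jl): 2-norm computed over chunks
# so that no single call sees more than `chunk` elements.
function chunked_norm(x::AbstractArray; chunk::Integer = typemax(Int32))
    v = vec(x)              # flatten to 1-D (shares memory for Array)
    step = Int(chunk)       # work in Int64 to avoid overflow in lo + step
    acc = 0.0               # running norm, accumulated in Float64
    for lo in 1:step:length(v)
        hi = min(lo + step - 1, length(v))
        # hypot(a, b) = sqrt(a^2 + b^2) combines two partial 2-norms.
        acc = hypot(acc, norm(view(v, lo:hi)))
    end
    return acc
end

# Small CPU example with an artificially small chunk size.
x = ones(Float32, 10, 10)
result = chunked_norm(x; chunk = 7)
println(result ≈ norm(x))
```

Accumulating in Float64 means the result is a Float64 even for Float32 input; that is a design choice of this sketch, not a requirement.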


Labels: bug (Something isn't working)
