[Bug] Segmentation fault #1133

niehen6174 · 2024-11-05T02:53:28Z

Your current environment information

libibverbs not available, ibv_fork_init skipped
Collecting environment information...
PyTorch version: 2.1.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OneFlow version: path: ['/opt/conda/lib/python3.10/site-packages/oneflow'], version: 0.9.1.dev20241019+cu118, git_commit: d23c061, cmake_build_type: Release, rdma: True, mlir: True, enterprise: False
Nexfort version: none
OneDiff version: 1.2.1.dev15+g241fe57d
OneDiffX version: none

GCC version: (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22)
Clang version: Could not collect
CMake version: version 3.30.4
Libc version: glibc-2.28

Python version: 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.241-1-tlinux4-0017.7-x86_64-with-glibc2.28
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA L40
Nvidia driver version: 525.125.06
cuDNN version: Probably one of the following:
/usr/lib64/libcudnn.so.8.9.7
/usr/lib64/libcudnn_adv_infer.so.8.9.7
/usr/lib64/libcudnn_adv_train.so.8.9.7
/usr/lib64/libcudnn_cnn_infer.so.8.9.7
/usr/lib64/libcudnn_cnn_train.so.8.9.7
/usr/lib64/libcudnn_ops_infer.so.8.9.7
/usr/lib64/libcudnn_ops_train.so.8.9.7
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 2
Core(s) per socket: 96
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 25
Model: 17
Model name: AMD EPYC 9K84 96-Core Processor
Stepping: 0
CPU MHz: 2600.034
BogoMIPS: 5200.06
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 32768K
NUMA node0 CPU(s): 0-191
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid amd_dcm tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core invpcid_single ibpb vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 avx512_bf16 clzero xsaveerptr wbnoinvd arat avx512vbmi umip avx512_vbmi2 vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm

Versions of relevant libraries:
[pip3] diffusers==0.30.3
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] onnx==1.17.0
[pip3] onnxruntime==1.19.2
[pip3] onnxruntime-gpu==1.18.0
[pip3] open-clip-torch==2.20.0
[pip3] torch==2.1.1
[pip3] torchaudio==2.1.1
[pip3] torchsde==0.2.6
[pip3] torchvision==0.16.1
[pip3] transformers==4.44.2
[pip3] triton==2.1.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.8.0 h6a678d5_0
[conda] mkl 2023.1.0 h213fc3f_46344
[conda] mkl-service 2.4.0 py310h5eee18b_1
[conda] mkl_fft 1.3.10 py310h5eee18b_0
[conda] mkl_random 1.2.7 py310h1128e8f_0
[conda] numpy 1.26.4 py310h5f9d8c6_0
[conda] numpy-base 1.26.4 py310hb5e798b_0
[conda] open-clip-torch 2.20.0 pypi_0 pypi
[conda] pytorch-cuda 12.1 ha16c6d3_5 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch 2.1.1 pypi_0 pypi
[conda] torchaudio 2.1.1 py310_cu121 pytorch
[conda] torchsde 0.2.6 pypi_0 pypi
[conda] torchvision 0.16.1 pypi_0 pypi
[conda] triton 2.1.0 pypi_0 pypi

🐛 Describe the bug

I hit a segmentation fault while using OneDiff in ComfyUI, and the crash produced no additional error information. Any help would be appreciated.

I have seen two variants of the segmentation fault:

Stack trace (most recent call last) in thread 256218: 
Segmentation fault (Signal sent by the kernel [(nil)]) 

Stack trace (most recent call last) in thread 417043: 
Segmentation fault (Address not mapped to object [(nil)])

From my testing, the crash is not tied to any single workflow, nor to which workflows were executed before it. I currently have no leads and have not been able to reproduce it since; it appears to occur at random.
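
Because the crash reports only the signal and no Python-level context, one way to gather more detail on the next occurrence might be Python's standard `faulthandler` module, which dumps the Python traceback of every thread when a fatal signal such as SIGSEGV arrives. The following is a minimal sketch, not a confirmed fix. It assumes the helper is called at the very top of the ComfyUI entry script; the function name `enable_crash_tracebacks` and the log path are hypothetical. Whether this coexists with the signal handler that prints the "Stack trace (most recent call last)" lines above has not been tested here.

```python
import faulthandler


def enable_crash_tracebacks(log_path="segfault_traceback.log"):
    """Dump Python tracebacks of all threads to log_path on a fatal signal.

    Hypothetical helper for illustration. The returned file object must
    stay open for the lifetime of the process, because faulthandler
    writes to its file descriptor at crash time.
    """
    # Line-buffered append mode, so earlier crash logs are preserved.
    log_file = open(log_path, "a", buffering=1)
    # Installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    # Keep a reference so the file is not garbage-collected and closed.
    _crash_log = enable_crash_tracebacks()
    print("faulthandler enabled:", faulthandler.is_enabled())
```

An alternative that needs no code change is to start the interpreter with `python -X faulthandler` or to set the environment variable `PYTHONFAULTHANDLER=1`, both of which write the traceback to stderr.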

niehen6174 · 2024-11-05T02:59:44Z

This error most likely originates from OneFlow. I have seen similar reports in other issues (#393, #1080), but none of them include a solution.

@strint Could you please take a look at this, or do you have any suggestions for a solution?

niehen6174 added the Request-bug Something isn't working label Nov 5, 2024
