I hit a segmentation fault while using OneDiff in ComfyUI, and no additional error information was printed. I am looking for help diagnosing it.
I have seen two kinds of segmentation fault:
Stack trace (most recent call last) in thread 256218:
Segmentation fault (Signal sent by the kernel [(nil)])
Stack trace (most recent call last) in thread 417043:
Segmentation fault (Address not mapped to object [(nil)])
After testing, the error does not appear to be tied to any single workflow, nor to the workflows executed before it. I currently have no leads and have not been able to reproduce it since; it seems to occur quite randomly.
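Since the crash prints nothing beyond the signal, one way to capture more context the next time it happens is Python's standard `faulthandler` module, which dumps the Python-level traceback of every thread when a fatal signal such as SIGSEGV arrives. The snippet below is only a sketch: the helper name `enable_crash_traces` and the log file name are hypothetical, and it assumes the code runs early in the ComfyUI process, before OneDiff is loaded. It only shows which Python frames were active, not the native OneFlow frames.

```python
import faulthandler
import os
import tempfile


def enable_crash_traces(log_path):
    """Write Python tracebacks of all threads to log_path on fatal signals.

    Hypothetical helper for illustration; not part of OneDiff or ComfyUI.
    """
    # Line-buffered append so traces from earlier crashes are preserved.
    log_file = open(log_path, "a", buffering=1)
    # faulthandler writes to the file descriptor directly, so the file
    # object must stay open (and referenced) for the life of the process.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Assumed location for the trace log; any writable path works.
log_path = os.path.join(tempfile.gettempdir(), "comfyui_segfault_trace.log")
crash_log = enable_crash_traces(log_path)
```

The same effect is available without code changes by setting the environment variable `PYTHONFAULTHANDLER=1` or starting the interpreter with `python -X faulthandler`, in which case the traceback goes to stderr.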
This error likely originates from OneFlow. I have seen similar reports in other issues (#393, #1080), but none of them provides a solution.
@strint Could you please take a look at this, or do you have any suggestions for a solution?
Your current environment information
libibverbs not available, ibv_fork_init skipped
Collecting environment information...
PyTorch version: 2.1.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OneFlow version: path: ['/opt/conda/lib/python3.10/site-packages/oneflow'], version: 0.9.1.dev20241019+cu118, git_commit: d23c061, cmake_build_type: Release, rdma: True, mlir: True, enterprise: False
Nexfort version: none
OneDiff version: 1.2.1.dev15+g241fe57d
OneDiffX version: none
GCC version: (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22)
Clang version: Could not collect
CMake version: version 3.30.4
Libc version: glibc-2.28
Python version: 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.241-1-tlinux4-0017.7-x86_64-with-glibc2.28
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA L40
Nvidia driver version: 525.125.06
cuDNN version: Probably one of the following:
/usr/lib64/libcudnn.so.8.9.7
/usr/lib64/libcudnn_adv_infer.so.8.9.7
/usr/lib64/libcudnn_adv_train.so.8.9.7
/usr/lib64/libcudnn_cnn_infer.so.8.9.7
/usr/lib64/libcudnn_cnn_train.so.8.9.7
/usr/lib64/libcudnn_ops_infer.so.8.9.7
/usr/lib64/libcudnn_ops_train.so.8.9.7
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 2
Core(s) per socket: 96
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 25
Model: 17
Model name: AMD EPYC 9K84 96-Core Processor
Stepping: 0
CPU MHz: 2600.034
BogoMIPS: 5200.06
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 32768K
NUMA node0 CPU(s): 0-191
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid amd_dcm tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core invpcid_single ibpb vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 avx512_bf16 clzero xsaveerptr wbnoinvd arat avx512vbmi umip avx512_vbmi2 vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm
Versions of relevant libraries:
[pip3] diffusers==0.30.3
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] onnx==1.17.0
[pip3] onnxruntime==1.19.2
[pip3] onnxruntime-gpu==1.18.0
[pip3] open-clip-torch==2.20.0
[pip3] torch==2.1.1
[pip3] torchaudio==2.1.1
[pip3] torchsde==0.2.6
[pip3] torchvision==0.16.1
[pip3] transformers==4.44.2
[pip3] triton==2.1.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.8.0 h6a678d5_0
[conda] mkl 2023.1.0 h213fc3f_46344
[conda] mkl-service 2.4.0 py310h5eee18b_1
[conda] mkl_fft 1.3.10 py310h5eee18b_0
[conda] mkl_random 1.2.7 py310h1128e8f_0
[conda] numpy 1.26.4 py310h5f9d8c6_0
[conda] numpy-base 1.26.4 py310hb5e798b_0
[conda] open-clip-torch 2.20.0 pypi_0 pypi
[conda] pytorch-cuda 12.1 ha16c6d3_5 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch 2.1.1 pypi_0 pypi
[conda] torchaudio 2.1.1 py310_cu121 pytorch
[conda] torchsde 0.2.6 pypi_0 pypi
[conda] torchvision 0.16.1 pypi_0 pypi
[conda] triton 2.1.0 pypi_0 pypi