Description
I'm an artist trying to learn to use these tools, specifically for making Gaussian splats, and I'm getting help from Claude to make it all happen.
I can't get hloc to work with MPS, only with CPU.
Sharing the report I've had Claude write up below.
hloc crashes with SIGSEGV on macOS Apple Silicon (MPS) due to OpenMP threading conflict
Environment
- macOS on Apple Silicon (MPS backend available)
- Python 3.11 in a clean conda environment
- PyTorch 2.5.1 from the pytorch conda channel (also reproduced on 2.9.1)
- hloc installed from source with pip install -e .
Problem
hloc crashes with segmentation faults when running extract_features or match_features on Apple Silicon with MPS backend enabled. The crash occurs despite MPS being available and model inference working correctly in isolation.
Error Output
OMP: Error #179: Function pthread_mutex_init failed:
OMP: System error #22: Invalid argument
*** SIGSEGV (@0x580) received by PID ... (TID 0x16eabb000) stack trace: ***

Also preceded by (recoverable with KMP_DUPLICATE_LIB_OK=TRUE):

OMP: Error #15: Initializing libomp.dylib, but found libomp.dylib already initialized.

Steps to Reproduce
conda activate hloc # clean environment with pytorch from conda
cd ~/Hierarchical-Localization
KMP_DUPLICATE_LIB_OK=TRUE python -m hloc.extract_features \
--image_dir ./images \
--export_dir ./outputs \
--conf superpoint_max

Diagnostic Steps Taken
1. Confirmed MPS works in isolation
import torch
print(torch.backends.mps.is_available()) # True
from hloc.extractors.superpoint import SuperPoint
model = SuperPoint({'max_keypoints': 4096, 'nms_radius': 3}).to('mps') # OK
dummy = torch.randn(1, 1, 480, 640, device='mps')
with torch.no_grad():
    out = model({'image': dummy})  # OK

All passes — MPS inference works fine outside hloc's pipeline.
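As a further check (a hypothetical diagnostic, not something the report ran), the same MPS inference can be wrapped in a minimal DataLoader loop to see whether iteration itself, rather than a single forward pass, triggers the OpenMP clash:

# Hypothetical diagnostic: same SuperPoint config as above, but driven by a
# bare-bones DataLoader loop to mimic hloc's extraction pattern.
import torch
from torch.utils.data import DataLoader, TensorDataset
from hloc.extractors.superpoint import SuperPoint

model = SuperPoint({'max_keypoints': 4096, 'nms_radius': 3}).eval().to('mps')
images = torch.randn(8, 1, 480, 640)              # fake grayscale frames, kept on CPU
loader = DataLoader(TensorDataset(images), num_workers=0, pin_memory=False)

with torch.no_grad():
    for (batch,) in loader:
        out = model({'image': batch.to('mps')})   # per-batch CPU -> MPS transfer, like hloc
print('loop finished without crashing')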
2. Patched DataLoader settings
Modified hloc/extract_features.py (line 262-263) and hloc/match_features.py (line 241-242):
# Changed from:
loader = torch.utils.data.DataLoader(
    dataset, num_workers=1, shuffle=False, pin_memory=True
)

# To:
loader = torch.utils.data.DataLoader(
    dataset, num_workers=0, shuffle=False, pin_memory=False
)

Result: Still crashes.
3. Created clean conda environment
conda create -n hloc python=3.11
conda activate hloc
conda install pytorch torchvision -c pytorch # PyTorch 2.5.1
pip install -e .

Result: Still crashes with identical error.
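One more check worth attaching to this step (a hedged sketch, not yet run): confirm which threading backend and OpenMP runtime the freshly installed PyTorch build reports, to rule out a mixed conda/pip install inside the new environment:

# Hedged sketch: print the build/threading info of the PyTorch that the clean
# environment actually resolves to.
import torch

print(torch.__version__)                      # expected: 2.5.1 in this env
print(torch.backends.mps.is_available())      # expected: True on Apple Silicon
print(torch.__config__.parallel_info())       # ATen threading backend and OpenMP details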
4. Set OpenMP environment variables
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export KMP_DUPLICATE_LIB_OK=TRUE

Result: Still crashes.
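Since OMP Error #15 specifically means two copies of the OpenMP runtime were loaded into the process, a quick follow-up (a sketch assuming macOS with lsof on the PATH, not something the report ran) is to list every libomp mapped into the interpreter after the relevant imports:

# Hedged diagnostic sketch: list every OpenMP runtime mapped into this
# process; more than one entry matches the "libomp.dylib already
# initialized" error.
import os
import subprocess

import torch  # noqa: F401 -- import first so its OpenMP runtime gets loaded
import hloc   # noqa: F401

lsof = subprocess.run(['lsof', '-p', str(os.getpid())],
                      capture_output=True, text=True).stdout
omp_libs = sorted({line.split()[-1] for line in lsof.splitlines()
                   if 'libomp' in line or 'libiomp' in line})
print('\n'.join(omp_libs) if omp_libs else 'no OpenMP runtime found')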
Working Workaround
Force CPU execution:
import torch
import os
os.environ['CUDA_VISIBLE_DEVICES'] = ''
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'
torch.set_default_device('cpu')
from pathlib import Path
from hloc import extract_features
extract_features.main(
    conf=extract_features.confs['superpoint_max'],
    image_dir=Path('./images'),
    export_dir=Path('./outputs'),
)

Works reliably but sacrifices the ~4-6x speedup MPS would provide.
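Until the underlying crash is fixed, a small helper along these lines (hypothetical, not part of hloc; the HLOC_FORCE_CPU variable is made up for this sketch) keeps the fallback logic in one place:

# Hypothetical device-selection helper: prefer MPS when it is usable, fall
# back to CPU otherwise or when the (made-up) HLOC_FORCE_CPU flag is set.
import os
import torch

def select_device() -> torch.device:
    if torch.backends.mps.is_available() and not os.environ.get('HLOC_FORCE_CPU'):
        return torch.device('mps')
    return torch.device('cpu')

device = select_device()
print(f'extracting features on {device}')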
Analysis
The crash occurs in a spawned thread (TID 0x16eabb000), not the main thread. Since:
- MPS model inference works in isolation ✓
- num_workers=0 still crashes (no DataLoader worker threads)
- Clean conda environment still crashes
- Multiple PyTorch versions (2.5.1, 2.9.1) both crash
The issue appears to be in how PyTorch's MPS backend interacts with OpenMP when running within hloc's pipeline structure—possibly related to device context management across function calls or tensor movement between CPU and MPS during the data loading/inference loop.
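One way to narrow this down further (a suggested diagnostic, not yet run) is to enable faulthandler before calling into hloc, so the Python tracebacks of all threads are dumped when the SIGSEGV arrives and the crashing call site becomes visible:

# Hedged sketch: faulthandler prints every thread's Python traceback on
# SIGSEGV, which should show whether the crash is in the DataLoader loop,
# the model call, or outside Python entirely (pure OpenMP runtime code).
import faulthandler
faulthandler.enable(all_threads=True)

from pathlib import Path
from hloc import extract_features

extract_features.main(
    conf=extract_features.confs['superpoint_max'],
    image_dir=Path('./images'),
    export_dir=Path('./outputs'),
)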
Suggested Fix
Consider adding MPS-aware configuration:
import torch

if torch.backends.mps.is_available():
    num_workers = 0
    pin_memory = False
else:
    num_workers = 5
    pin_memory = True

loader = torch.utils.data.DataLoader(
    dataset, num_workers=num_workers, shuffle=False, pin_memory=pin_memory
)

Or expose num_workers and pin_memory as user-configurable options in the conf dict (a rough sketch follows).
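A possible shape for that second option (the 'loader' key below is hypothetical, not an existing hloc conf field):

# Hypothetical sketch: let callers override DataLoader settings through the
# conf dict instead of patching extract_features.py.
from hloc import extract_features

conf = dict(extract_features.confs['superpoint_max'])
conf['loader'] = {'num_workers': 0, 'pin_memory': False}   # MPS-safe values

# extract_features.main would then build its loader roughly as:
#   loader_conf = conf.get('loader', {'num_workers': 1, 'pin_memory': True})
#   loader = torch.utils.data.DataLoader(dataset, shuffle=False, **loader_conf)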