hloc crashes on M1 Macbook (MPS) due to OpenMP threading conflict (in base and in Conda) #491

@jared-krauss

Description

I'm an artist trying to learn to use these tools, specifically for making Gaussian Splats. I'm getting help from Claude to make it all happen.

I can't get hloc to work with MPS, only on the CPU.

Sharing the report I've had Claude write up below.

hloc crashes with SIGSEGV on macOS Apple Silicon (MPS) due to OpenMP threading conflict

Environment

Component | Version
-- | --
OS | macOS (Apple Silicon M1, 16GB RAM)
Python | 3.11 (clean conda environment)
PyTorch | 2.5.1 (also tested with 2.9.1)
hloc | 1.5 (installed from source)
kornia | 0.8.2
pycolmap | 3.13.0

Problem

hloc crashes with segmentation faults when running extract_features or match_features on Apple Silicon with MPS backend enabled. The crash occurs despite MPS being available and model inference working correctly in isolation.

Error Output

```
OMP: Error #179: Function pthread_mutex_init failed:
OMP: System error #22: Invalid argument
*** SIGSEGV (@0x580) received by PID ... (TID 0x16eabb000) stack trace: ***
```

Also preceded by (recoverable with KMP_DUPLICATE_LIB_OK=TRUE):

```
OMP: Error #15: Initializing libomp.dylib, but found libomp.dylib already initialized.
```
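For that Error #15 workaround to take effect, `KMP_DUPLICATE_LIB_OK` needs to be in the environment before the conflicting OpenMP runtime is loaded. A minimal sketch, assuming the flag is set from Python rather than the shell:

```python
import os

# Must be set before any library that loads an OpenMP runtime
# (torch, numpy/MKL, pycolmap) is imported; otherwise the Error #15
# duplicate-initialization check has already fired.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

import torch  # imported only after the flag is in place
print(torch.__version__)
```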

Steps to Reproduce

```bash
conda activate hloc  # clean environment with pytorch from conda
cd ~/Hierarchical-Localization

KMP_DUPLICATE_LIB_OK=TRUE python -m hloc.extract_features \
    --image_dir ./images \
    --export_dir ./outputs \
    --conf superpoint_max
```

Diagnostic Steps Taken

1. Confirmed MPS works in isolation

```python
import torch
print(torch.backends.mps.is_available())  # True

from hloc.extractors.superpoint import SuperPoint
model = SuperPoint({'max_keypoints': 4096, 'nms_radius': 3}).to('mps')  # OK

dummy = torch.randn(1, 1, 480, 640, device='mps')
with torch.no_grad():
    out = model({'image': dummy})  # OK
```

All of this passes: MPS inference works fine outside hloc's pipeline.

2. Patched DataLoader settings

Modified hloc/extract_features.py (lines 262-263) and hloc/match_features.py (lines 241-242):

```python
# Changed from:
loader = torch.utils.data.DataLoader(
    dataset, num_workers=1, shuffle=False, pin_memory=True
)

# To:
loader = torch.utils.data.DataLoader(
    dataset, num_workers=0, shuffle=False, pin_memory=False
)
```

Result: Still crashes.

3. Created clean conda environment

```bash
conda create -n hloc python=3.11
conda activate hloc
conda install pytorch torchvision -c pytorch  # PyTorch 2.5.1
pip install -e .
```

Result: Still crashes with identical error.

4. Set OpenMP environment variables

```bash
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export KMP_DUPLICATE_LIB_OK=TRUE
```

Result: Still crashes.

Working Workaround

Force CPU execution:

```python
import os

# Set before torch loads its OpenMP runtime.
os.environ['CUDA_VISIBLE_DEVICES'] = ''
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'

import torch
torch.set_default_device('cpu')

from pathlib import Path
from hloc import extract_features

extract_features.main(
    conf=extract_features.confs['superpoint_max'],
    image_dir=Path('./images'),
    export_dir=Path('./outputs'),
)
```

This works reliably but sacrifices the ~4-6x speedup MPS would provide.

Analysis

The crash occurs in a spawned thread (TID 0x16eabb000), not the main thread. Since:

  • MPS model inference works in isolation ✓
  • num_workers=0 still crashes (no DataLoader worker threads)
  • Clean conda environment still crashes
  • Multiple PyTorch versions (2.5.1, 2.9.1) both crash

The issue appears to be in how PyTorch's MPS backend interacts with OpenMP when running within hloc's pipeline structure—possibly related to device context management across function calls or tensor movement between CPU and MPS during the data loading/inference loop.
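Since the error messages point at two OpenMP runtimes being initialized in the same process, one way to probe that hypothesis is to list every libomp/libiomp copy bundled inside the active conda environment. A rough diagnostic sketch (the glob patterns are assumptions; adjust them to your install layout):

```python
import glob
import os
import sys

# Search the active environment (sys.prefix) for bundled OpenMP runtimes;
# PyTorch wheels, MKL, and conda's llvm-openmp can each ship their own copy.
patterns = ["**/libomp*.dylib", "**/libiomp*.dylib"]
hits = set()
for pattern in patterns:
    hits.update(glob.glob(os.path.join(sys.prefix, pattern), recursive=True))

for path in sorted(hits):
    print(path)
# More than one distinct copy printed here would be consistent with the
# duplicate-OpenMP-runtime theory behind OMP Error #15 / #179.
```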

Suggested Fix

Consider adding MPS-aware configuration:

```python
import torch

if torch.backends.mps.is_available():
    num_workers = 0
    pin_memory = False
else:
    num_workers = 5
    pin_memory = True

loader = torch.utils.data.DataLoader(
    dataset, num_workers=num_workers, shuffle=False, pin_memory=pin_memory
)
```

Or expose num_workers and pin_memory as user-configurable options in the conf dict.
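A rough sketch of that second option, assuming a hypothetical "loader" key in the conf dict (hloc's built-in confs do not currently define one):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical conf with an optional "loader" section overriding the defaults.
conf = {
    "model": {"name": "superpoint", "max_keypoints": 4096},
    "loader": {"num_workers": 0, "pin_memory": False},  # MPS-friendly values
}

loader_defaults = {"num_workers": 1, "pin_memory": True}
loader_conf = {**loader_defaults, **conf.get("loader", {})}

dataset = TensorDataset(torch.randn(8, 1, 480, 640))  # stand-in dataset
loader = DataLoader(
    dataset,
    num_workers=loader_conf["num_workers"],
    shuffle=False,
    pin_memory=loader_conf["pin_memory"],
)
```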
