Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RuntimeError: Default process group has not been initialized, please make sure to call init_process_group. #48

Open
caozhe19961226 opened this issue Feb 12, 2023 · 1 comment

Comments

@caozhe19961226
Copy link

Traceback (most recent call last):
File "tools/train.py", line 244, in
main()
File "tools/train.py", line 233, in main
train_model(
File "/root/sim-Opera/./opera/apis/train.py", line 245, in train_model
runner.run(data_loaders, cfg.workflow)
File "/root/miniconda3/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
epoch_runner(data_loaders[i], **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 53, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 31, in run_iter
outputs = self.model.train_step(data_batch, self.optimizer,
File "/root/miniconda3/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 77, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/root/mmdetection/mmdet/models/detectors/base.py", line 248, in train_step
losses = self(**data)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
return old_func(*args, **kwargs)
File "/root/mmdetection/mmdet/models/detectors/base.py", line 172, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/root/sim-Opera/./opera/models/detectors/inspose.py", line 52, in forward_train
feat = self.extract_feat(img)
File "/root/mmdetection/mmdet/models/detectors/single_stage.py", line 43, in extract_feat
x = self.backbone(img)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/sim-Opera/./opera/models/backbones/efficientformer2.py", line 659, in forward
x = self.patch_embed(x)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 732, in forward
world_size = torch.distributed.get_world_size(process_group)
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 845, in get_world_size
return _get_group_size(group)
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 306, in _get_group_size
default_pg = _get_default_group()
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 410, in _get_default_group
raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

@y-arjun-y
Copy link

This is most likely due to the use of SyncBN instead of BatchNorm (source), try setting that to False in the main.py file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants