For the detection task, can we test on a single GPU? Is it enough to run a shell command like 'bash dist_test.sh configs/retinanet_alt_gvt_s_fpn_1x_coco_pvt_setting.py checkpoint_file 1 --eval mAP', or do we also need to change the learning rate and the number of dataloader workers?
I'm a beginner with the mmdet framework, please help.
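For a single GPU I guessed the overrides below would be needed. This is only a rough sketch assuming the standard mmdet 2.x config fields and the published 8-GPU x 2 images-per-GPU PVT setting, so the exact values may be wrong for this repo:

```python
# Hypothetical single-GPU overrides (guessed values; standard mmdet 2.x
# config fields assumed) -- not the repo's official setting.

data = dict(
    samples_per_gpu=2,   # images per GPU -> total batch size 1 GPU x 2 = 2
    workers_per_gpu=2,   # dataloader worker processes per GPU
)

# Only relevant for training, not for testing: if the published schedule
# assumes 8 GPUs x 2 images (total batch 16), the linear scaling rule would
# shrink the learning rate by 8x for a single GPU.
optimizer = dict(type='AdamW', lr=0.0001 / 8, weight_decay=0.0001)
```

My understanding is also that the learning rate should not matter for testing at all, and that passing 1 as the GPU count to dist_test.sh should already work on a single GPU (with '--eval bbox' for COCO-style metrics rather than 'mAP'), but please correct me if that is wrong.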
This is the error output I get when running the command above:
/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(
The module torch.distributed.launch is deprecated and going to be removed in future. Migrate to torch.distributed.run
WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases.
Please read local_rank from os.environ('LOCAL_RANK') instead.
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : ./test.py
min_nodes : 1
max_nodes : 1
nproc_per_node : 1
run_id : none
rdzv_backend : static
rdzv_endpoint : 127.0.0.1:29500
rdzv_configs : {'rank': 0, 'timeout': 900}
max_restarts : 3
monitor_interval : 5
log_dir : None
metrics_cfg : {}
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_o5bp99y9/none_u2fqutod
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=29500
group_rank=0
group_world_size=1
local_ranks=[0]
role_ranks=[0]
global_ranks=[0]
role_world_sizes=[1]
global_world_sizes=[1]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_o5bp99y9/none_u2fqutod/attempt_0/0/error.json
loading annotations into memory...
Done (t=0.52s)
creating index...
index created!
Traceback (most recent call last):
  File "./test.py", line 213, in <module>
    main()
  File "./test.py", line 166, in main
    model = build_detector(cfg.model, train_cfg=None, test_cfg=cfg.test_cfg)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/builder.py", line 67, in build_detector
    return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/builder.py", line 32, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmcv/utils/registry.py", line 171, in build_from_cfg
    return obj_cls(**args)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/detectors/retinanet.py", line 16, in __init__
    super(RetinaNet, self).__init__(backbone, neck, bbox_head, train_cfg,
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/detectors/single_stage.py", line 25, in __init__
    self.backbone = build_backbone(backbone)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/builder.py", line 37, in build_backbone
    return build(cfg, BACKBONES)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmdet/models/builder.py", line 32, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/home/user/miniconda3/envs/twins/lib/python3.8/site-packages/mmcv/utils/registry.py", line 171, in build_from_cfg
    return obj_cls(**args)
  File "/home/user/project/Twins/detection/gvt.py", line 482, in __init__
    super(alt_gvt_small, self).__init__(
  File "/home/user/project/Twins/detection/gvt.py", line 419, in __init__
    super(ALTGVT, self).__init__(img_size, patch_size, in_chans, num_classes, embed_dims, num_heads,
  File "/home/user/project/Twins/detection/gvt.py", line 408, in __init__
    super(PCPVT, self).__init__(img_size, patch_size, in_chans, num_classes, embed_dims, num_heads,
  File "/home/user/project/Twins/detection/gvt.py", line 343, in __init__
    super(CPVTV2, self).__init__(img_size, patch_size, in_chans, num_classes, embed_dims, num_heads, mlp_ratios,
  File "/home/user/project/Twins/detection/gvt.py", line 234, in __init__
    _block = nn.ModuleList([block_cls(
  File "/home/user/project/Twins/detection/gvt.py", line 234, in <listcomp>
    _block = nn.ModuleList([block_cls(
  File "/home/user/project/Twins/detection/gvt.py", line 164, in __init__
    super(GroupBlock, self).__init__(dim, num_heads, mlp_ratio, qkv_bias, qk_scale, drop, attn_drop,
TypeError: __init__() takes from 3 to 10 positional arguments but 11 were given
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 11449) of binary: /home/user/miniconda3/envs/twins/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=1
master_addr=127.0.0.1
master_port=29500
group_rank=0
group_world_size=1
local_ranks=[0]
role_ranks=[0]
global_ranks=[0]
role_world_sizes=[1]
global_world_sizes=[1]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_o5bp99y9/none_u2fqutod/attempt_1/0/error.json
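From the traceback, the failure seems to be that GroupBlock.__init__ in gvt.py forwards one more positional argument to its parent class than the parent's __init__ accepts. My guess is a version mismatch between an installed dependency (e.g. timm, which the block classes build on) and the version the repo was written against, so that the parent block's signature no longer has a slot for one of the forwarded arguments. A hypothetical minimal sketch of that kind of mismatch (class names and signatures are made up for illustration, not the real gvt.py or timm code):

```python
# Illustrative reproduction of:
#   TypeError: __init__() takes from 3 to 10 positional arguments but 11 were given

class ParentBlock:
    # 9 parameters after `self` -> at most 10 positional arguments in total;
    # note there is no `qk_scale` slot in this (hypothetical) version.
    def __init__(self, dim, num_heads, mlp_ratio=4.0, qkv_bias=False,
                 drop=0.0, attn_drop=0.0, drop_path=0.0,
                 act_layer=None, norm_layer=None):
        self.dim, self.num_heads = dim, num_heads


class GroupBlockLike(ParentBlock):
    def __init__(self, dim, num_heads, mlp_ratio=4.0, qkv_bias=False,
                 qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0,
                 act_layer=None, norm_layer=None, ws=1):
        # Broken: forwarding 10 values positionally (11 counting `self`),
        # one more than the parent accepts -> the TypeError above.
        # super().__init__(dim, num_heads, mlp_ratio, qkv_bias, qk_scale,
        #                  drop, attn_drop, drop_path, act_layer, norm_layer)

        # Works: pass only the arguments the parent actually declares,
        # by keyword, and keep the extras (qk_scale, ws) in the subclass.
        super().__init__(dim, num_heads, mlp_ratio=mlp_ratio,
                         qkv_bias=qkv_bias, drop=drop, attn_drop=attn_drop,
                         drop_path=drop_path, act_layer=act_layer,
                         norm_layer=norm_layer)
        self.ws = ws


block = GroupBlockLike(dim=64, num_heads=2)  # constructs without the error
```

If that is the cause here, I assume installing the exact dependency versions pinned by the repo (or adjusting the super().__init__ call to match the installed signature) would let the backbone build, but I would appreciate confirmation, and also an answer about single-GPU training/testing.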