runtime error:permission denied #13099

Open
1 task done
mdh31 opened this issue Jun 18, 2024 · 2 comments
Labels
question Further information is requested

Comments


mdh31 commented Jun 18, 2024

Search before asking

Question

Hi,
I want to run training on two GPUs:
01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)

My command line:
$ python -m torch.distributed.run --nproc_per_node 2 --master_port 1 segment/train.py --data ./data/gnrDataset_polygon.yaml --weights ~/mdh_share/yolov5s-seg.pt --img 640 --device 0,1

error log:
WARNING:__main__:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:{
"message": {
"message": "RuntimeError: Permission denied",
"extraInfo": {
"py_callstack": "Traceback (most recent call last):\n File \"/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py\", line 345, in wrapper\n return f(*args, **kwargs)\n File \"/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/run.py\", line 719, in main\n run(args)\n File \"/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/run.py\", line 710, in run\n elastic_launch(\n File \"/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/launcher/api.py\", line 131, in __call__\n return launch_agent(self._config, self._entrypoint, list(args))\n File \"/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/launcher/api.py\", line 252, in launch_agent\n result = agent.run()\n File \"/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/metrics/api.py\", line 125, in wrapper\n result = f(*args, **kwargs)\n File \"/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/agent/server/api.py\", line 709, in run\n result = self._invoke_run(role)\n File \"/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/agent/server/api.py\", line 837, in _invoke_run\n self._initialize_workers(self._worker_group)\n File \"/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/metrics/api.py\", line 125, in wrapper\n result = f(*args, **kwargs)\n File \"/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/agent/server/api.py\", line 678, in _initialize_workers\n self._rendezvous(worker_group)\n File \"/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/metrics/api.py\", line 125, in wrapper\n result = f(*args, **kwargs)\n File \"/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/agent/server/api.py\", line 538, in _rendezvous\n store, group_rank, group_world_size = spec.rdzv_handler.next_rendezvous()\n File \"/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/rendezvous/static_tcp_rendezvous.py\", line 55, in next_rendezvous\n self._store = TCPStore( # type: ignore[call-arg]\nRuntimeError: Permission denied\n",
"timestamp": "1718680548"
}
}
}
Traceback (most recent call last):
  File "/home/wise/anaconda3/envs/yolov5/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/wise/anaconda3/envs/yolov5/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/run.py", line 723, in <module>
    main()
  File "/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/run.py", line 719, in main
    run(args)
  File "/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/run.py", line 710, in run
    elastic_launch(
  File "/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 252, in launch_agent
    result = agent.run()
  File "/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper
    result = f(*args, **kwargs)
  File "/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run
    result = self._invoke_run(role)
  File "/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/agent/server/api.py", line 837, in _invoke_run
    self._initialize_workers(self._worker_group)
  File "/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper
    result = f(*args, **kwargs)
  File "/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/agent/server/api.py", line 678, in _initialize_workers
    self._rendezvous(worker_group)
  File "/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper
    result = f(*args, **kwargs)
  File "/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/agent/server/api.py", line 538, in _rendezvous
    store, group_rank, group_world_size = spec.rdzv_handler.next_rendezvous()
  File "/home/wise/anaconda3/envs/yolov5/lib/python3.9/site-packages/torch/distributed/elastic/rendezvous/static_tcp_rendezvous.py", line 55, in next_rendezvous
    self._store = TCPStore( # type: ignore[call-arg]
RuntimeError: Permission denied

Thanks a lot.

Additional

No response

mdh31 added the question label Jun 18, 2024
Contributor

👋 Hello @mdh31, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

@glenn-jocher
Member

@mdh31 hello,

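Thank you for the detailed report. The traceback ends in the TCPStore constructor in static_tcp_rendezvous.py, which is where the launcher opens the rendezvous port, so "RuntimeError: Permission denied" most likely means the process is not allowed to bind the port you requested. On Linux, ports below 1024 are privileged and normally require root to bind, and your command uses --master_port 1.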
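To check whether the port itself is the problem without starting a training run, a small probe like the following may help. It is only a sketch: the helper name try_bind and the choice of 127.0.0.1 as the bind address are assumptions for illustration, and it uses the standard socket module rather than anything from torch.

```python
import errno
import socket


def try_bind(port: int, host: str = "127.0.0.1") -> str:
    """Try to bind a TCP socket to (host, port) and report what happened.

    try_bind is a hypothetical helper for diagnosis, not part of torch.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except PermissionError:
            # EACCES: typical for ports below 1024 when not running as root.
            return "permission denied"
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                # Another process already holds this port.
                return "in use"
            return f"error: {exc.strerror}"
    return "ok"


if __name__ == "__main__":
    # Compare the port from the failing command with an unprivileged one.
    for port in (1, 12345):
        print(port, try_bind(port))
```

If the probe reports "permission denied" for port 1 but "ok" for a higher port, the launcher's error is almost certainly the privileged-port restriction rather than a YOLOv5 problem.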

Here are a few steps to help troubleshoot and resolve this issue:

  1. Check Port Permissions: --master_port 1 is in the privileged range (below 1024), which an ordinary user normally cannot bind. Choose an unused port between 1024 and 65535, for example:

    python -m torch.distributed.run --nproc_per_node 2 --master_port 12345 segment/train.py --data ./data/gnrDataset_polygon.yaml --weights ~/mdh_share/yolov5s-seg.pt --img 640 --device 0,1
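  2. Run with sudo (last resort): If changing the port does not help on a Unix-based system, you can try running the command with sudo to rule out other permission problems. Be aware that sudo may resolve python to a different interpreter than your conda environment: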

    sudo python -m torch.distributed.run --nproc_per_node 2 --master_port 12345 segment/train.py --data ./data/gnrDataset_polygon.yaml --weights ~/mdh_share/yolov5s-seg.pt --img 640 --device 0,1
  3. Update Packages: Ensure you are using the latest versions of torch and yolov5. You can update them using:

    pip install --upgrade torch
    cd yolov5
    git pull
    pip install -r requirements.txt
  4. Environment Variables: The OMP_NUM_THREADS warning in your log is informational and unrelated to the permission error, but you can set the variable explicitly before running your command to silence it:

    export OMP_NUM_THREADS=1
    python -m torch.distributed.run --nproc_per_node 2 --master_port 12345 segment/train.py --data ./data/gnrDataset_polygon.yaml --weights ~/mdh_share/yolov5s-seg.pt --img 640 --device 0,1
  5. Minimum Reproducible Example: If the issue persists, please provide a minimum reproducible code example so we can investigate the problem more effectively. You can find more details in our Minimum Reproducible Example guide.
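
If you would rather not guess a port number for step 1, one option is to ask the operating system for a free one and build the launch command from it. This is a minimal sketch under stated assumptions: find_free_port and build_launch_command are hypothetical helper names, the bind address 127.0.0.1 is assumed, and only the launcher flags already shown in this thread (--nproc_per_node, --master_port) are used.

```python
import socket
import sys
from typing import List


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0.

    Hypothetical helper; the port is released when the socket closes,
    so another process could still claim it before the launcher starts.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def build_launch_command(port: int, nproc: int, script_args: List[str]) -> List[str]:
    """Assemble the torch.distributed.run command line for the given port."""
    return [
        sys.executable, "-m", "torch.distributed.run",
        "--nproc_per_node", str(nproc),
        "--master_port", str(port),
        *script_args,
    ]


if __name__ == "__main__":
    port = find_free_port()
    cmd = build_launch_command(port, 2, ["segment/train.py", "--device", "0,1"])
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(cmd))
```

Using sys.executable keeps the launcher on the same interpreter as the active conda environment. The small window between picking the port and starting the launcher is usually harmless on a single workstation.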

If you have tried all the above steps and the issue still exists, please let us know with any additional error logs or details. This will help us assist you better.

Thank you for your patience, and we look forward to helping you resolve this issue.
