I am currently working on training an OCR classification model in parallel across two machines and would appreciate some guidance on my setup. Below are the details of my configuration:
I have two computers:
Machine 1:
Windows 11
GPU: RTX4090
Public IP: 212.109.144.125
Port open: 6004
Machine 2:
Windows 11
GPU: RTX3090
Public IP: 122.109.144.229
I installed PaddlePaddle using the following command on both machines: python -m pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
However, when I start the training, it seems that the two machines are not able to establish a connection and work together as expected. I am wondering if there might be an issue with my setup or the configuration of the training commands.
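One way to narrow this down is to check whether Machine 2 can open a TCP connection to the port on Machine 1 at all. The following is a minimal sketch that uses only the Python standard library. The helper name `can_connect` and the 3-second timeout are illustrative choices, not part of PaddlePaddle.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Running `can_connect("212.109.144.125", 6004)` on Machine 2 while the training process on Machine 1 is listening would show whether the port is reachable. A `False` result may mean that a firewall or router is blocking the port, or simply that nothing is listening on it at that moment.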
Could anyone help me identify what might be wrong or suggest how to fix this?
Thank you in advance for your assistance!
The training commands I used were (https://www.paddlepaddle.org.cn/documentation/docs/en/api/paddle/distributed/launch_en.html):
On Machine 1:
On Machine 2:
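The commands themselves are not shown here. For reference only, a two-node launch of the kind described in the linked documentation could be assembled as in the sketch below. It is a hypothetical illustration, not the commands that were actually run. The `--master`, `--nnodes`, and `--rank` flags are taken from the `paddle.distributed.launch` documentation as I understand it and should be checked against the installed version. The script name `train.py` and the helper `build_launch_cmd` are placeholders.

```python
from typing import List, Sequence


def build_launch_cmd(
    master_ip: str,
    master_port: int,
    nnodes: int,
    rank: int,
    script: str,
    script_args: Sequence[str] = (),
) -> List[str]:
    """Build an argv list for a multi-node paddle.distributed.launch run (assumed flags)."""
    return [
        "python", "-m", "paddle.distributed.launch",
        f"--master={master_ip}:{master_port}",  # address of the rendezvous node
        f"--nnodes={nnodes}",                   # total number of machines
        f"--rank={rank}",                       # index of this machine
        script,                                 # placeholder training script
        *script_args,
    ]


# Both machines point at the same master address; only the rank differs.
cmd_machine_1 = build_launch_cmd("212.109.144.125", 6004, nnodes=2, rank=0, script="train.py")
cmd_machine_2 = build_launch_cmd("212.109.144.125", 6004, nnodes=2, rank=1, script="train.py")
```

Each list can be joined with spaces to run in a shell, or passed to `subprocess.run`. The key point is that every node must use the same `--master` address and port, and that address must be reachable from the other machine.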