This guide assumes you are already set up and are using HPG for development and training.
In AI, development is writing code that, well, runs. The goal is correctly loading your data, feeding it through the model, computing your loss function, etc. If we can get it to run for one iteration, then we've passed this step.
Model training, on the other hand, is the computationally expensive stage of running through many (often hundreds of) epochs to progressively train a model. This is the part where we need a lot of GPUs.
- Use notebooks to test chunks of code (data loading, for example)
- When feeling good, assemble the code together and run the whole thing for 1 epoch
  - If using PyTorch Lightning, consider setting `fast_dev_run=True` in the Trainer object
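
The "get it to run once" check described above can also be done without any framework. The following is a minimal sketch, not a prescribed workflow: `smoke_test`, `toy_batches`, and `toy_step` are hypothetical names introduced only for illustration, and the toy data and loss stand in for your real data loader and training step.

```python
import math
from itertools import islice
from typing import Callable, Iterable


def smoke_test(batches: Iterable, step: Callable, max_batches: int = 1) -> int:
    """Run `step` on at most `max_batches` batches and return how many ran."""
    ran = 0
    for batch in islice(batches, max_batches):
        loss = step(batch)
        # A NaN loss on the very first batches usually points to a bug.
        if math.isnan(loss):
            raise ValueError(f"loss is NaN on batch {ran}")
        ran += 1
    if ran == 0:
        raise ValueError("the data source yielded no batches")
    return ran


# Hypothetical stand-in for a real data loader: ten batches of ten numbers.
def toy_batches():
    for start in range(0, 100, 10):
        yield [float(x) for x in range(start, start + 10)]


# Hypothetical stand-in for a real training step: returns a mean squared error.
def toy_step(batch):
    predictions = [0.5 * x for x in batch]
    return sum((p - x) ** 2 for p, x in zip(predictions, batch)) / len(batch)


batches_run = smoke_test(toy_batches(), toy_step)
print(f"smoke test ran {batches_run} batch(es) without errors")
```

Raising `max_batches` slightly (say, to 3) is a cheap way to also catch bugs that only show up after the first iteration, before committing GPU time to a full run.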
Here is UF HPG's official guide on using VSCode for remote development from HPG.
- SSH in to HPG (e.g. `ssh [email protected]`)
  - (MacOS and Linux) You can create a shortcut so it's just `ssh hpg` if you have your SSH config set up. See this guide.
- (optional) Switch to a login node of your choice so you always land on the same one (e.g. `ssh login7`)
- (optional but usually needed) Begin an interactive session (i.e. `bash interactive.sh` or `./interactive.sh`)
  - This is if you need GPUs (or other computing resources beyond what is provided on login nodes).
  - See this for more on when to use login nodes vs using an interactive session
- (optional) Start tmux (`tmux`)
  - This is (fun) wizardry. Not needed but pretty cool. See Sasank for more.
- Start a VSCode tunnel (`bash vscode_tunnel.sh` or `./vscode_tunnel.sh`)
- Open VSCode on your local computer and go to the "Remote Explorer". This is a button on the left side bar that looks like a computer screen with a circle at the bottom right.
- In the menu that appears in the left side bar, you should see your session under REMOTES/Tunnels. Click on the arrow on the right side of your session to connect.
- Congrats! You're now connected to HiPerGator.
- Make sure you are using your environment.
  - If you are using a "notebook" (something.ipynb), check your "kernel" at the top right to make sure it's using the conda environment you want.
  - If you are running a Python script directly, make sure you have the right conda environment loaded.
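
One quick way to confirm which environment is actually in use is to ask Python itself, either in a notebook cell or at the top of a script. This is a small sketch using only the standard library; `describe_environment` is a hypothetical helper name. It assumes a conda setup, where `conda activate` sets the `CONDA_DEFAULT_ENV` variable. A notebook kernel may not have that variable set, so the interpreter path is the more reliable signal there.

```python
import os
import sys


def describe_environment() -> dict:
    """Collect details that identify the Python environment in use."""
    return {
        # Path of the running interpreter; it normally sits inside the
        # environment's directory.
        "python_executable": sys.executable,
        # Root directory of the environment the interpreter belongs to.
        "python_prefix": sys.prefix,
        # Set by `conda activate`; None if no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If the printed interpreter path does not point inside the environment you expect, switch the notebook kernel or activate the right environment before running anything expensive.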