RicercarG/NYU-Greene-HPC-Cheatsheet

NYU-Greene-HPC-Cheatsheet

Written by Yuanhe Guo ([email protected])

A beginner's guide to running Python on NYU Greene HPC.

Each time I set up a new environment on NYU Greene HPC, I have to go through the official documentation and search for the commands I need. There are also quite a few common issues whose solutions are not covered by the official documentation. So I decided to write a cheatsheet for myself and for others who may need it.

I wrapped some of the more complex commands in bash scripts, so that getting Python to run on NYU Greene HPC is as simple as ordering a burger in a fast food restaurant.

Check the wiki page to get started.

Acknowledgement

Important updates (planning to move this section to Releases)

  • [2025.8.1] Added custom CUDA version support. The Singularity image is built from scratch from Docker Hub. Note that this image does not include cuDNN, so you will need to install cuDNN yourself. The easiest way is to go to the NVIDIA cuDNN Archive, download the version you want, scp it to Greene, and install it.

  • [2025.5.29] Added a [y]/n prompt before a new environment is created. A new folder is created only if the user enters "y" or "yes". This prevents unnecessary folders from being created when the user makes a typo while activating an existing environment.

  • [2024.6.13] The file structure has changed significantly, and you can now clone the entire repo. If you have used this cheatsheet before, please first remove chsdevice.sh and chslauncher.sh from your HPC account, then follow the updated wiki instructions to set up your environment. New features are as follows:

    • The new file structure is more organized and easier to maintain.
    • A new run_setup.sh script helps you set up your ~/.bashrc file automatically.
    • Added options for H100 GPUs and CUDA 12.1.
  • [2024.4.7] Fixed a bug in how Lazy Launcher handles conda installation failures. Check [[ this part of troubleshooting|Trouble Shooting#conda-environment-installation-failed ]] for details.
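
The confirmation behavior described in the [2025.5.29] entry can be sketched roughly as follows. This is a minimal illustration, not the code used in this repo's scripts: the function names `is_affirmative` and `create_env_if_confirmed`, the prompt wording, and the case-insensitive matching are all assumptions made for the example.

```shell
# Hypothetical helper (not from this repo): succeeds only for an explicit
# "y" or "yes". Case-insensitive matching is an assumption of this sketch.
is_affirmative() {
    answer=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$answer" in
        y|yes) return 0 ;;
        *)     return 1 ;;
    esac
}

# Hypothetical wrapper (not from this repo): create the environment folder
# only after the user confirms; any other reply leaves the disk untouched.
create_env_if_confirmed() {
    env_dir="$1"
    # An existing environment needs no confirmation.
    if [ -d "$env_dir" ]; then
        return 0
    fi
    printf 'Environment "%s" does not exist. Create it? [y]/n ' "$env_dir"
    read -r reply
    if is_affirmative "$reply"; then
        mkdir -p "$env_dir"
    else
        echo "Aborted; no folder was created."
        return 1
    fi
}
```

With this shape, a mistyped environment name produces a prompt instead of a stray folder, and answering anything other than "y" or "yes" returns a non-zero status that the calling script can check.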

Topics covered in this cheatsheet

Please feel free to open an issue if you have any questions or suggestions.

  • Prereq
    • Apply for NYU Greene HPC access
    • Basic Linux commands
    • Vim
    • VS Code
  • Quick Starting Pack
    • Connect to HPC
    • Request CPU/GPU Sessions
    • Interactive sessions for conda
    • Jupyter Notebook
    • Batch jobs
  • Manual Setup
    • Official guide index
  • Trouble Shooting
    • How can I quit Python/Singularity/runtime?
    • How can I jump back when kicked off by accident?
    • Disk quota exceeded
    • Could not log in to the server through VS Code
    • Out of Memory Error
    • Could not open singularity environment
    • Some Linux commands could not be executed
  • Advanced Topics (Useful Tricks)
    • Setup bashrc
    • Setup ssh key pairs
    • Collection of useful Linux commands
    • Sharing files with Other HPC Users
    • Sending files to/from HPC
    • SSH Tunneling on GPU Nodes
    • AWS S3 Connection
    • Access through iPad
    • Using in-node memory for faster training (move to Advanced Topics)
    • Distributed training on multi-node using RDZV, srun -W and torchrun
    • Submitting Topology-aware GPU jobs for NCCL-heavy training
    • Use SLURM job arrays to sweep hyperparameters and random seeds
