All the code for the hands-on exercises can be found in this repository.
To request an account on Zaratan, please join slack at the link above, and fill this Google form.
We have pre-built the dependencies required for this tutorial on Zaratan. They will be activated automatically when you run the bash scripts.
Model weights and the training dataset have been downloaded to `/scratch/zt1/project/isc/shared/`.
```shell
CONFIG_FILE=configs/single_gpu.json sbatch train_single.sh
```
Open `configs/single_gpu.json` and change `precision` to `bf16-mixed`, then run:

```shell
CONFIG_FILE=configs/single_gpu.json sbatch train_single.sh
```
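For reference, the change amounts to editing a single value in the config file. A minimal sketch of the relevant fragment, assuming `precision` is a top-level key (only the `precision` value comes from this tutorial; leave the rest of the real file untouched):

```json
{
  "precision": "bf16-mixed"
}
```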
```shell
CONFIG_FILE=configs/ddp.json sbatch train_multi.sh
```

```shell
CONFIG_FILE=configs/fsdp.json sbatch train_multi.sh
```

```shell
CONFIG_FILE=configs/axonn.json sbatch train_multi.sh
```
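All of these launches share one pattern: `CONFIG_FILE` is set in the submission environment, and `sbatch` propagates the caller's environment into the job by default, so the batch script can read it there. A minimal sketch of how a script might pick it up (the fallback path and `echo` are illustrative, not the actual contents of `train_multi.sh`):

```shell
# Read the config path passed via the environment; fall back to a
# default if the caller did not set CONFIG_FILE (fallback is illustrative).
CONFIG="${CONFIG_FILE:-configs/ddp.json}"
echo "launching training with config: ${CONFIG}"
```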
Add more prompts to `data/inference/prompts.txt` if you want. Then run:

```shell
CONFIG_FILE=configs/inference_yalis.json sbatch infer_single.sh
```
Open `infer.sh` and change `YALIS_DISABLE_COMPILE` from `1` to `0`. Then run:

```shell
CONFIG_FILE=configs/inference_yalis.json sbatch infer_single.sh
```
Open `infer.sh` and change `YALIS_DISABLE_DECODE_CUDAGRAPHS` from `1` to `0` (make sure torch compile is also enabled). Then run:

```shell
CONFIG_FILE=configs/inference_yalis.json sbatch infer_single.sh
```
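For reference, the two switches above are ordinary environment variables that `infer.sh` sets before launching inference. With both optimizations enabled, the relevant lines would look like this (a sketch of just these two lines, not the full script):

```shell
# 0 enables each feature, 1 disables it.
export YALIS_DISABLE_COMPILE=0           # enable torch compile
export YALIS_DISABLE_DECODE_CUDAGRAPHS=0 # enable CUDA graphs for decoding
```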
```shell
CONFIG_FILE=configs/inference_yalis.json sbatch infer_multi.sh
```
Query the vLLM server we set up as follows:
```shell
# Usage: ./llm_request.sh <server_ip> "<prompt>" [max_tokens]
./llm_request.sh <vLLM Server IP> "San Francisco is a" 64
```
Change the prompt and the max-tokens argument to experiment with the command.
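To see what such a request looks like on the wire, here is a hypothetical sketch of a script like `llm_request.sh`, assuming the server exposes vLLM's OpenAI-compatible completions endpoint on port 8000 (the port, endpoint path, and `build_payload` helper are assumptions, not the actual script):

```shell
# Build the JSON body for a completions request; max_tokens defaults to 64.
build_payload() {
  printf '{"prompt": "%s", "max_tokens": %s}' "$1" "${2:-64}"
}

# Send the request to the server's OpenAI-compatible completions endpoint.
llm_request() {
  local server_ip="$1" prompt="$2" max_tokens="$3"
  curl -s "http://${server_ip}:8000/v1/completions" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$prompt" "$max_tokens")"
}
```

Invoked as `llm_request <vLLM Server IP> "San Francisco is a" 64`, this mirrors the usage line above.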