Commit 58e862e
Add gif
1 parent f231c6b commit 58e862e
File tree

4 files changed: +23 -10 lines changed

README.md

Lines changed: 21 additions & 7 deletions
@@ -1,15 +1,26 @@
-# LLaMA
+# LLaMA
 
 This repository is intended as a minimal, hackable and readable example to load [LLaMA](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/) ([arXiv](https://arxiv.org/abs/2302.13971v1)) models and run inference.
 In order to download the checkpoints and tokenizer, fill this [google form](https://forms.gle/jk851eBVbX1m5TAv5)
 
+## Inference with mpirun
+
+This fork supports launching a LLaMA inference job with multiple instances (one or more GPUs on each instance) using `mpirun`. You can find more details [here](deployment/README.md).
+
+Example: launching an interactive 65B LLaMA inference job across eight 1xA10 Lambda Cloud instances
+
+![Launching 65B LLaMA inference across eight A10 Cloud instances](deployment/pics/newton-einstein-8xA10.gif)
+
 ## Setup
 
 In a conda env with pytorch / cuda available, run
+
 ```
 pip install -r requirements.txt
 ```
+
 Then in this repository
+
 ```
 pip install -e .
 ```
@@ -22,18 +33,19 @@ Edit the `download.sh` script with the signed url provided in the email to download
 ## Inference
 
 The provided `example.py` can be run on a single or multi-gpu node with `torchrun` and will output completions for two pre-defined prompts. Using `TARGET_FOLDER` as defined in `download.sh`:
+
 ```
 torchrun --nproc_per_node MP example.py --ckpt_dir $TARGET_FOLDER/model_size --tokenizer_path $TARGET_FOLDER/tokenizer.model
 ```
 
 Different models require different MP values:
 
-| Model | MP |
-|--------|----|
-| 7B | 1 |
-| 13B | 2 |
-| 33B | 4 |
-| 65B | 8 |
+| Model | MP  |
+| ----- | --- |
+| 7B    | 1   |
+| 13B   | 2   |
+| 33B   | 4   |
+| 65B   | 8   |
 
 ## FAQ
 
@@ -56,7 +68,9 @@ LLaMA: Open and Efficient Foundation Language Models -- https://arxiv.org/abs/2302.13971
 ```
 
 ## Model Card
+
 See [MODEL_CARD.md](MODEL_CARD.md)
 
 ## License
+
 See the [LICENSE](LICENSE) file.
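The MP column in the table above maps directly to `--nproc_per_node`. As a minimal sketch (the `mp_for_model` helper is hypothetical, not part of the repo), that mapping can be encoded in shell so the launch command picks the right value automatically:

```shell
#!/bin/sh
# Hypothetical helper: map a LLaMA model size to its MP (model parallel)
# value, mirroring the table in the README.
mp_for_model() {
    case "$1" in
        7B)  echo 1 ;;
        13B) echo 2 ;;
        33B) echo 4 ;;
        65B) echo 8 ;;
        *)   echo "unknown model size: $1" >&2; return 1 ;;
    esac
}

MODEL_SIZE=65B
MP=$(mp_for_model "$MODEL_SIZE")
# Print the resulting launch command (TARGET_FOLDER comes from download.sh).
echo "torchrun --nproc_per_node $MP example.py" \
     "--ckpt_dir \$TARGET_FOLDER/$MODEL_SIZE --tokenizer_path \$TARGET_FOLDER/tokenizer.model"
```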

deployment/README.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ Despite being more memory efficient than previous language foundation models, LLaMA
 
 Don't worry, this tutorial explains how to use `mpirun` to launch a LLaMA inference job across multiple cloud instances (one or more GPUs on each instance). Here are some key updates in addition to the [original llama repo](https://github.com/facebookresearch/llama) and [shawwn's fork](https://github.com/shawwn/llama):
 
-- A script to easily set up a "cluster" of cloud instances that is ready to run LLaMA inference (all models from 7B to 65B).
+- A [script](./setup_nodes.sh) to easily set up a "cluster" of cloud instances that is ready to run LLaMA inference (all models from 7B to 65B).
 - `mpirun` compatible, so you can launch the job directly from the head node without needing to type the `torchrun` command on the worker nodes.
 - Interactive inference mode across multiple nodes.
 - `eos_w`: controls how "lengthy" the results are likely to be by scaling the probability of `eos_token`.
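For the `mpirun` launch itself, the general Open MPI pattern is one rank per GPU, with a hostfile listing the instances. The sketch below only builds the hostfile and prints the command it would run; the IPs and the entrypoint arguments are placeholders, so check deployment/README.md for the fork's actual invocation:

```shell
#!/bin/sh
# Placeholder IPs for eight 1xA10 instances; substitute your own.
INSTANCE_IPS="10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.5 10.0.0.6 10.0.0.7 10.0.0.8"

# Build an Open MPI hostfile: one slot (one GPU) per instance.
: > hostfile
for ip in $INSTANCE_IPS; do
    echo "$ip slots=1" >> hostfile
done

# Total rank count must equal the model's MP value (8 for 65B).
# Printed rather than executed here, since it needs a live cluster.
echo "mpirun -np 8 --hostfile hostfile" \
     "python example.py --ckpt_dir \$TARGET_FOLDER/65B --tokenizer_path \$TARGET_FOLDER/tokenizer.model"
```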
deployment/pics/newton-einstein-8xA10.gif (7.98 MB binary file, not shown)

deployment/setup_nodes.sh

Lines changed: 1 addition & 2 deletions
@@ -41,12 +41,11 @@ for IP in ${WORKER_IP[*]}; do
4141
ssh -i $LAMBDA_CLOUD_KEY ubuntu@$HEAD_IP "echo '/home/ubuntu/shared ${IP}(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports"
4242
done
4343
ssh -i $LAMBDA_CLOUD_KEY ubuntu@$HEAD_IP "sudo systemctl restart nfs-kernel-server"
44-
# echo "NFS set up on the head node"
44+
echo "NFS set up on the head node"
4545

4646
for IP in ${WORKER_IP[*]}; do
4747
ssh -i $LAMBDA_CLOUD_KEY ubuntu@$IP "sudo mount ${HEAD_IP}:/home/ubuntu/shared /home/ubuntu/shared"
4848
done
49-
5049
echo "NFS set up on the worker nodes"
5150

5251
echo "Clone repos into NFS ------------------------------"
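The loop in the hunk above appends one NFS export entry per worker IP. A local sketch of the lines it writes to `/etc/exports` (the IPs are placeholders, and the real script pipes each line through `ssh` and `sudo tee` on the head node):

```shell
#!/bin/sh
# Placeholder worker IPs; the real script reads these from WORKER_IP.
WORKER_IPS="10.0.0.2 10.0.0.3 10.0.0.4"

# Build the export entries locally instead of appending them to /etc/exports.
EXPORTS=""
for ip in $WORKER_IPS; do
    EXPORTS="${EXPORTS}/home/ubuntu/shared ${ip}(rw,sync,no_subtree_check)
"
done
printf '%s' "$EXPORTS"
```

After the `nfs-kernel-server` restart, running `showmount -e <head-ip>` from a worker is a quick way to confirm the exports are visible before mounting.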
