
Commit e6145a0

Add FAQ.md // add command line options
1 parent 76066b1 commit e6145a0

File tree: FAQ.md, README.md, example.py

3 files changed (+113, −10 lines)


FAQ.md

Lines changed: 51 additions & 0 deletions
# FAQ

## <a name="1"></a>1. The download.sh script doesn't work on default bash in MacOS X

Please see the answers in these issues:

- https://github.com/facebookresearch/llama/issues/41#issuecomment-1451290160
- https://github.com/facebookresearch/llama/issues/53#issue-1606582963

## <a name="2"></a>2. Generations are bad!

Keep in mind that these models are not finetuned for question answering. As such, they should be prompted so that the expected answer is the natural continuation of the prompt.

Here are a few examples of prompts (from [issue#69](https://github.com/facebookresearch/llama/issues/69)) geared towards finetuned models, and how to modify them to get the expected results:

- Do not prompt with "What is the meaning of life? Be concise and do not repeat yourself." but with "I believe the meaning of life is"
- Do not prompt with "Explain the theory of relativity." but with "Simply put, the theory of relativity states that"
- Do not prompt with "Ten easy steps to build a website..." but with "Building a website can be done in 10 simple steps:\n"

To prompt the models directly with questions or instructions, you can either:

- Prompt them with few-shot examples so that the model understands the task you have in mind, or
- Finetune the models on datasets of instructions to make them more robust to input prompts.

We've updated `example.py` with more sample prompts. Overall, always keep in mind that models are very sensitive to prompts (particularly when they have not been finetuned).
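
As an illustration of this advice, here is a minimal sketch (not taken from the repository) that assumes it runs inside `main()` in `example.py`, where `generator` has already been created by `load()`; the few-shot prompt is made up for the example:

```python
# Continuation-style prompts: phrase the input so the answer is its natural continuation.
prompts = [
    # Instead of "Explain the theory of relativity.":
    "Simply put, the theory of relativity states that ",
    # A small made-up few-shot prompt that demonstrates the task before the final query:
    "France => Paris\nItaly => Rome\nGermany =>",
]

# Same call that example.py uses to decode the prompts.
results = generator.generate(prompts, max_gen_len=64, temperature=0.8, top_p=0.95)
for result in results:
    print(result)
```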

## <a name="3"></a>3. CUDA Out of memory errors

The `example.py` file pre-allocates a cache according to these settings:

```python
model_args: ModelArgs = ModelArgs(max_seq_len=max_seq_len, max_batch_size=max_batch_size, **params)
```

Accounting for 14GB of memory for the model weights (7B model), this leaves 16GB available for the decoding cache, which stores `2 * 2 * n_layers * max_batch_size * max_seq_len * n_heads * head_dim` bytes.

With the previous defaults (`max_seq_len=1024`, `max_batch_size=32`), this cache was about 17GB (2 * 2 * 32 * 32 * 1024 * 32 * 128) for the 7B model.

We've added command-line options to `example.py` and changed the default `max_seq_len` to 512, which should allow decoding on 30GB GPUs.

Feel free to lower these settings according to your hardware.
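
The back-of-the-envelope sketch below is not from the repository; it simply restates the formula above (assuming 2 bytes per fp16 element and the 7B model's shape of 32 layers, 32 heads, head dimension 128) so you can estimate the cache for your own settings:

```python
def kv_cache_bytes(max_batch_size, max_seq_len, n_layers=32, n_heads=32, head_dim=128, bytes_per_elem=2):
    # 2 tensors (keys and values) * bytes per element * layers * batch * sequence * heads * head_dim
    return 2 * bytes_per_elem * n_layers * max_batch_size * max_seq_len * n_heads * head_dim

print(kv_cache_bytes(32, 1024) / 1e9)  # ~17.2 GB with the old default max_seq_len=1024
print(kv_cache_bytes(32, 512) / 1e9)   # ~8.6 GB with the new default max_seq_len=512
```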

## <a name="4"></a>4. Other languages

The model was trained primarily on English, but also on a few other languages with Latin or Cyrillic alphabets.

For instance, LLaMA was trained on Wikipedia for the following 20 languages: bg, ca, cs, da, de, en, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk.

LLaMA's tokenizer splits unseen characters into UTF-8 bytes; as a result, it might also be able to process other languages like Chinese or Japanese, even though they use different characters.
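
To see this byte fallback in action, here is a small sketch, assuming the `sentencepiece` package and the released `tokenizer.model` (the path below is a placeholder):

```python
import sentencepiece as spm

# Load the released tokenizer (adjust the path to where you downloaded it).
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

# Characters that are not in the vocabulary are decomposed into UTF-8 byte
# pieces such as '<0xE7>' instead of being mapped to a single token.
print(sp.encode("祝你一天过得愉快", out_type=str))
```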

Although the fraction of these languages in the training data was negligible, LLaMA still showcases some abilities in Chinese-English translation:

```
Prompt = "J'aime le chocolat = I like chocolate\n祝你一天过得愉快 ="
Output = "I wish you a nice day"
```

README.md

Lines changed: 5 additions & 0 deletions
```diff
@@ -32,6 +32,11 @@ Different models require different MP values:
 | 33B | 4 |
 | 65B | 8 |
 
+### FAQ
+- [1. The download.sh script doesn't work on default bash in MacOS X](FAQ.md#1)
+- [2. Generations are bad!](FAQ.md#2)
+- [3. CUDA Out of memory errors](FAQ.md#3)
+- [4. Other languages](FAQ.md#4)
 
 ### Model Card
 See [MODEL_CARD.md](MODEL_CARD.md)
```

example.py

Lines changed: 57 additions & 10 deletions
```diff
@@ -29,19 +29,28 @@ def setup_model_parallel() -> Tuple[int, int]:
     return local_rank, world_size
 
 
-def load(ckpt_dir: str, tokenizer_path: str, local_rank: int, world_size: int) -> LLaMA:
+def load(
+    ckpt_dir: str,
+    tokenizer_path: str,
+    local_rank: int,
+    world_size: int,
+    max_seq_len: int,
+    max_batch_size: int,
+) -> LLaMA:
     start_time = time.time()
     checkpoints = sorted(Path(ckpt_dir).glob("*.pth"))
-    assert (
-        world_size == len(checkpoints)
+    assert world_size == len(
+        checkpoints
     ), f"Loading a checkpoint for MP={len(checkpoints)} but world size is {world_size}"
     ckpt_path = checkpoints[local_rank]
     print("Loading")
     checkpoint = torch.load(ckpt_path, map_location="cpu")
     with open(Path(ckpt_dir) / "params.json", "r") as f:
         params = json.loads(f.read())
 
-    model_args: ModelArgs = ModelArgs(max_seq_len=1024, max_batch_size=32, **params)
+    model_args: ModelArgs = ModelArgs(
+        max_seq_len=max_seq_len, max_batch_size=max_batch_size, **params
+    )
     tokenizer = Tokenizer(model_path=tokenizer_path)
     model_args.vocab_size = tokenizer.n_words
     torch.set_default_tensor_type(torch.cuda.HalfTensor)
@@ -54,14 +63,52 @@ def load(ckpt_dir: str, tokenizer_path: str, local_rank: int, world_size: int) -
     return generator
 
 
-def main(ckpt_dir: str, tokenizer_path: str, temperature: float = 0.8, top_p: float = 0.95):
+def main(
+    ckpt_dir: str,
+    tokenizer_path: str,
+    temperature: float = 0.8,
+    top_p: float = 0.95,
+    max_seq_len: int = 512,
+    max_batch_size: int = 32,
+):
     local_rank, world_size = setup_model_parallel()
     if local_rank > 0:
-        sys.stdout = open(os.devnull, 'w')
-
-    generator = load(ckpt_dir, tokenizer_path, local_rank, world_size)
-    prompts = ["The capital of Germany is the city of", "Here is my sonnet in the style of Shakespeare about an artificial intelligence:"]
-    results = generator.generate(prompts, max_gen_len=256, temperature=temperature, top_p=top_p)
+        sys.stdout = open(os.devnull, "w")
+
+    generator = load(
+        ckpt_dir, tokenizer_path, local_rank, world_size, max_seq_len, max_batch_size
+    )
+
+    prompts = [
+        # For these prompts, the expected answer is the natural continuation of the prompt
+        "I believe the meaning of life is",
+        "Simply put, the theory of relativity states that ",
+        "Building a website can be done in 10 simple steps:\n",
+        # Few shot prompts: https://huggingface.co/blog/few-shot-learning-gpt-neo-and-inference-api
+        """Tweet: "I hate it when my phone battery dies."
+Sentiment: Negative
+###
+Tweet: "My day has been 👍"
+Sentiment: Positive
+###
+Tweet: "This is the link to the article"
+Sentiment: Neutral
+###
+Tweet: "This new music video was incredibile"
+Sentiment:""",
+        """Translate English to French:
+
+sea otter => loutre de mer
+
+peppermint => menthe poivrée
+
+plush girafe => girafe peluche
+
+cheese =>""",
+    ]
+    results = generator.generate(
+        prompts, max_gen_len=256, temperature=temperature, top_p=top_p
+    )
 
     for result in results:
         print(result)
```
