Commit 90f256c

Update perf_infer_gpu_one.md: fix a typo (huggingface#35441)

1 parent 5c75087

File tree

1 file changed: +1 −1 lines changed


docs/source/en/perf_infer_gpu_one.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -462,7 +462,7 @@ generated_ids = model.generate(**inputs)
 outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
 ```
 
-To load a model in 4-bit for inference with multiple GPUs, you can control how much GPU RAM you want to allocate to each GPU. For example, to distribute 1GB of memory to the first GPU and 2GB of memory to the second GPU:
+To load a model in 8-bit for inference with multiple GPUs, you can control how much GPU RAM you want to allocate to each GPU. For example, to distribute 1GB of memory to the first GPU and 2GB of memory to the second GPU:
 
 ```py
 max_memory_mapping = {0: "1GB", 1: "2GB"}
````
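The corrected doc line describes capping per-GPU memory when sharding an 8-bit model across devices. As a minimal sketch of the idea (the helper function and the GPU cap values are illustrative, not part of the commit), the mapping from the diff can be built programmatically and then handed to `from_pretrained`:

```python
# Sketch: build a per-GPU memory-cap mapping like the one in the diff.
# build_max_memory is a hypothetical helper, not a transformers API.

def build_max_memory(caps_gb):
    """Map GPU index -> memory cap string, e.g. [1, 2] -> {0: "1GB", 1: "2GB"}."""
    return {i: f"{gb}GB" for i, gb in enumerate(caps_gb)}

max_memory_mapping = build_max_memory([1, 2])
print(max_memory_mapping)  # {0: '1GB', 1: '2GB'}

# In the docs page being patched, such a mapping is passed to
# AutoModelForCausalLM.from_pretrained alongside device_map="auto" and
# 8-bit loading, roughly (requires GPUs and bitsandbytes, so commented out):
# model = AutoModelForCausalLM.from_pretrained(
#     model_name, device_map="auto", load_in_8bit=True,
#     max_memory=max_memory_mapping,
# )
```

Accelerate then places layers on each GPU without exceeding the stated cap, spilling the remainder to the next device.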
