llama.cpp/main/main.1: 38 additions & 0 deletions
@@ -353,6 +353,44 @@ Force system to keep model in RAM rather than swapping or compressing.
 Do not memory-map model (slower load but may reduce pageouts if not using mlock).
 .It Fl Fl numa
 Attempt optimizations that help on some NUMA systems. If run without this previously, it is recommended to drop the system page cache before using this. See https://github.com/ggerganov/llama.cpp/issues/1437.
+.It Fl Fl nocompile
+Never compile GPU support at runtime.
+.Pp
+If
+.Pa ~/.llamafile/ggml-cuda.dll
+already exists on the file system (or .so for UNIX and .dylib for
+MacOS), then it'll be linked as-is without question. Otherwise,
+.Nm
+will fall back to CPU inference.
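
As a rough illustration of the link-or-fall-back behavior documented above, here is a minimal C sketch, assuming a POSIX system where the prebuilt module lives at ~/.llamafile/ggml-cuda.so; the helper name load_prebuilt_gpu_module and its diagnostics are invented for this example and are not llamafile's actual code.

```c
// Hedged sketch (not llamafile's real code): if a prebuilt GPU module
// already exists under ~/.llamafile/, link it as-is; otherwise signal
// the caller to use CPU inference. Build with: cc demo.c -ldl
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: returns a dlopen handle, or NULL for CPU fallback.
static void *load_prebuilt_gpu_module(void) {
    const char *home = getenv("HOME");
    if (home == NULL)
        return NULL; // no home directory, nothing to probe
    char path[1024];
    snprintf(path, sizeof(path), "%s/.llamafile/ggml-cuda.so", home);
    // Link the existing module as-is; a NULL handle means CPU fallback.
    void *handle = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
    if (handle == NULL)
        fprintf(stderr, "no prebuilt GPU module; using CPU inference\n");
    return handle;
}

int main(void) {
    void *gpu = load_prebuilt_gpu_module();
    puts(gpu != NULL ? "GPU module linked" : "CPU inference");
    if (gpu != NULL)
        dlclose(gpu);
    return 0;
}
```
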
+.It Fl Fl gpu Ar GPU
+Specifies which brand of GPU should be used. Valid choices are:
+.Pp
+.Bl -dash
+.It
+.Ar AUTO :
+Use any GPU if possible, otherwise fall back to CPU inference (default)
+.It
+.Ar AMD :
+Use AMD GPU. The AMD ROCm SDK must be installed and the HIP_PATH
+environment variable must be defined. If an AMD GPU could not be used
+for any reason, then a fatal error will be raised.
+.It
+.Ar APPLE :
+Use Apple Metal GPU. This is only available on MacOS ARM64. If Metal
+could not be used for any reason, then a fatal error will be raised.
+.It
+.Ar NVIDIA :
+Use NVIDIA GPU. If an NVIDIA GPU could not be used for any reason, a
+fatal error will be raised. On Windows, NVIDIA GPU support will use our
+tinyBLAS library, since it works on stock Windows installs. If both MSVC
+and CUDA are installed beforehand, and
+.Nm
+is run for the first time on the x64 command prompt, then llamafile will
+use NVIDIA's faster cuBLAS library instead. On Linux and other systems,
+the CUDA SDK must always be installed, so that native GPU support can be
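
To summarize the four choices above, here is a hedged C sketch of how the documented --gpu values might map onto an internal selection with the stated default and fatal-error behavior; the names gpu_brand and parse_gpu_flag are invented for illustration and are not llamafile's API.

```c
// Illustrative sketch only: map the documented --gpu values onto an enum.
// Any unrecognized value is treated as fatal, as the man page specifies.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum gpu_brand { GPU_AUTO, GPU_AMD, GPU_APPLE, GPU_NVIDIA };

static enum gpu_brand parse_gpu_flag(const char *arg) {
    if (strcmp(arg, "AUTO") == 0)   return GPU_AUTO;   // any GPU, else CPU (default)
    if (strcmp(arg, "AMD") == 0)    return GPU_AMD;    // needs ROCm SDK and HIP_PATH
    if (strcmp(arg, "APPLE") == 0)  return GPU_APPLE;  // Metal, MacOS ARM64 only
    if (strcmp(arg, "NVIDIA") == 0) return GPU_NVIDIA; // tinyBLAS, or cuBLAS if installed
    fprintf(stderr, "--gpu: invalid choice: %s\n", arg);
    exit(1);
}

int main(int argc, char **argv) {
    enum gpu_brand brand = parse_gpu_flag(argc > 1 ? argv[1] : "AUTO");
    printf("selected brand %d\n", (int)brand);
    return 0;
}
```

Run as, e.g., ./a.out NVIDIA it prints the selected enum value; an unrecognized value exits with an error, mirroring the fatal-error behavior documented for the flag.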
0 commit comments