Make the default of ngl be -1 #707

Merged: 1 commit, Feb 5, 2025
docs/ramalama.1.md (3 changes: 2 additions & 1 deletion)
@@ -115,7 +115,8 @@ pass --group-add keep-groups to podman (default: False)
Needed to access the gpu on some systems, but has an impact on security, use with caution.

#### **--ngl**
-number of gpu layers (default: 999)
+number of gpu layers, 0 means CPU inferencing, 999 means use max layers (default: -1)
+The default of -1 means use whatever is automatically deemed appropriate (0 or 999)

#### **--nocontainer**
do not run RamaLama in the default container (default: False)
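The tri-state semantics documented above (-1 auto, 0 CPU-only, 999 maximum offload) can be restated as a short sketch. This is hypothetical code, not taken from RamaLama; the helper name resolve_ngl and the llama.cpp-style --n-gpu-layers flag are assumptions used only for illustration.

# Hypothetical helper, for illustration only; not RamaLama's actual code.
def resolve_ngl(ngl: int, gpu_available: bool) -> int:
    """Map the documented ngl semantics to a concrete layer count.

    -1  -> auto: 999 (all layers) when a GPU is usable, otherwise 0
     0  -> CPU inferencing only
    999 -> offload as many layers as possible
    other positive values pass through unchanged
    """
    if ngl < 0:
        return 999 if gpu_available else 0
    return ngl

# Example: feed the resolved value to a llama.cpp-style backend option.
backend_args = ["--n-gpu-layers", str(resolve_ngl(-1, gpu_available=True))]

With this shape, an explicit 0 or 999 from the user is never second-guessed; only the -1 default triggers auto-selection.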
docs/ramalama.conf (3 changes: 2 additions & 1 deletion)
@@ -50,8 +50,9 @@
#keep_groups = false

# Default number of layers offloaded to the gpu
+# -1 means use whatever is automatically deemed appropriate (0 or 999)
#
-#ngl = 999
+#ngl = -1
Review comment (Member): Explain -1 above
# Specify default port for services to listen on
#
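For context on the commented-out #ngl line above, here is a minimal sketch of how a loader could pick up the value and fall back to the new default of -1. It assumes ramalama.conf parses as TOML with a [ramalama] table and uses an example path; the real loader may differ.

import tomllib  # Python 3.11+

def load_ngl(path: str = "ramalama.conf") -> int:
    """Return the configured ngl value, falling back to the new default of -1."""
    try:
        with open(path, "rb") as f:
            table = tomllib.load(f).get("ramalama", {})
    except FileNotFoundError:
        table = {}
    # A commented-out "#ngl = -1" is invisible to the parser, so the
    # built-in fallback of -1 (auto) applies.
    return table.get("ngl", -1)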
docs/ramalama.conf.5.md (5 changes: 3 additions & 2 deletions)
@@ -92,9 +92,10 @@ RAMALAMA_IMAGE environment variable overrides this field.
Pass `--group-add keep-groups` to podman, when using podman.
In some cases this is needed to access the gpu from a rootless container

-**ngl**=999
+**ngl**=-1
Review comment (Member): Please explain the meaning of -1.
Default number of layers to offload to the gpu
+number of gpu layers, 0 means CPU inferencing, 999 means use max layers (default: -1)
+The default of -1 means use whatever is automatically deemed appropriate (0 or 999)

**port**="8080"

ramalama/cli.py (4 changes: 2 additions & 2 deletions)
@@ -196,8 +196,8 @@ def configure_arguments(parser):
"--ngl",
dest="ngl",
type=int,
-default=config.get("ngl", 999),
-help="Number of layers to offload to the gpu, if available",
+default=config.get("ngl", -1),
+help="Number of layers to offload to the gpu, if available"
)
parser.add_argument(
"--keep-groups",
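The cli.py hunk above only changes the fallback passed to config.get. A small, self-contained sketch of the resulting precedence (an explicit --ngl beats the conf file, which beats the built-in -1); config here is a plain dict standing in for the parsed configuration:

import argparse

config = {}  # stand-in for the parsed ramalama.conf; empty when ngl is unset

parser = argparse.ArgumentParser()
parser.add_argument(
    "--ngl",
    dest="ngl",
    type=int,
    default=config.get("ngl", -1),
    help="Number of layers to offload to the gpu, if available",
)

print(parser.parse_args([]).ngl)              # -1: auto-select 0 or 999
print(parser.parse_args(["--ngl", "0"]).ngl)  # 0: force CPU inferencing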
ramalama/model.py (2 changes: 1 addition & 1 deletion)
@@ -195,7 +195,7 @@ def setup_container(self, args):
def gpu_args(self, args, runner=False):
gpu_args = []
if (
-args.gpu
+args.gpu > 0
or os.getenv("HIP_VISIBLE_DEVICES")
or os.getenv("ASAHI_VISIBLE_DEVICES")
or os.getenv("CUDA_VISIBLE_DEVICES")
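The model.py change tightens the gate on the GPU path: the old bare truthiness test would treat a negative value as true, while the comparison against 0 does not, presumably so that an automatic negative default no longer counts as an explicit GPU request. A hedged, standalone restatement of that check (not RamaLama's full gpu_args implementation) might look like:

import os

def gpu_requested(gpu: int) -> bool:
    """Mirror the condition in the diff: a positive gpu value or any
    vendor visibility variable enables the GPU argument path."""
    return bool(
        gpu > 0
        or os.getenv("HIP_VISIBLE_DEVICES")
        or os.getenv("ASAHI_VISIBLE_DEVICES")
        or os.getenv("CUDA_VISIBLE_DEVICES")
    )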