-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed gpu detection for cuda rocm etc using env vars #490
base: main
Are you sure you want to change the base?
Conversation
Reviewer's Guide by SourceryThe PR enhances GPU detection and configuration by refactoring the code to use environment variables for GPU state management and adding CUDA support alongside existing ROCm and Asahi implementations. The changes improve GPU detection by performing a more comprehensive check across different GPU types and implementing a comparison mechanism to select the most suitable GPU when multiple types are available. Sequence diagram for GPU detection and configurationsequenceDiagram
participant User
participant System
participant GPU_Detector
participant Environment
User->>System: Run command with --gpu flag
System->>GPU_Detector: Call get_gpu()
GPU_Detector->>System: Return GPU type and number
System->>Environment: Set environment variable for GPU
System->>User: Run container with GPU support
Updated class diagram for GPU detectionclassDiagram
class Model {
+_image(args)
+setup_container(args)
+run_container(args, shortnames)
+gpu_args()
}
class GPU_Detector {
+get_gpu() : (String, Integer)
}
class Environment {
+os.getenv(var)
+os.environ[var] = value
}
Model --> GPU_Detector : uses
Model --> Environment : uses
note for GPU_Detector "Enhanced to detect CUDA, ROCm, and Asahi GPUs"
note for Environment "Uses environment variables for GPU state management"
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @bmahabirbu - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider moving the hardcoded docker.io image reference to a configuration file or using the project's official repositories instead of a personal Docker Hub account
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
ramalama/model.py
Outdated
elif nvidia_gpu_mib: | ||
return "CUDA_VISIBLE_DEVICES", nvidia_gpu_num | ||
|
||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Catching all exceptions masks potential important errors
Consider catching specific exceptions (like subprocess.CalledProcessError) and logging or handling them appropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with soucery-ai here. We should specific errors, for example this will blow up if the command is not installed or has permission errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your feedback, we will generate more comments like this in the future according to the following instructions:
- Include comments that identify potential bug risks, such as catching all exceptions, which can mask important errors.
- Suggest specific improvements, like catching specific exceptions and handling them appropriately.
- Ensure comments are clear and provide actionable advice.
ramalama/model.py
Outdated
@@ -384,25 +392,52 @@ | |||
|
|||
def get_gpu(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (complexity): Consider restructuring get_gpu() into distinct phases with early returns and clearer control flow
The get_gpu()
function's complexity can be reduced by restructuring it with clear phases and early returns. Here's a simplified version that maintains all functionality:
def get_gpu():
# Check Asahi first - simplest case
if os.path.exists('/etc/os-release'):
with open('/etc/os-release', 'r') as file:
if "asahi" in file.read().lower():
return "ASAHI_VISIBLE_DEVICES", 1
# Check AMD GPUs
amd_gpu_num = -1
amd_gpu_mib = 0
for i, fp in enumerate(sorted(glob.glob('/sys/bus/pci/devices/*/mem_info_vram_total'))):
with open(fp, 'r') as file:
bytes = int(file.read())
if bytes > 1073741824 and bytes > (amd_gpu_mib * 1048576):
amd_gpu_mib = bytes / 1048576
amd_gpu_num = i
# Check NVIDIA GPUs
nvidia_gpu_num = -1
nvidia_gpu_mib = 0
try:
output = run_cmd(['nvidia-smi', '--query-gpu=index,memory.total', '--format=csv,noheader,nounits'])
gpus = output.stdout.strip().split('\n')
if gpus:
gpu = max(gpus, key=lambda x: int(x.split(',')[1]))
nvidia_gpu_mib = int(gpu.split(',')[1])
nvidia_gpu_num = int(gpu.split(',')[0])
except Exception:
pass
# Return the GPU with more memory
if amd_gpu_num >= 0 and nvidia_gpu_num >= 0:
return ("HIP_VISIBLE_DEVICES", amd_gpu_num) if amd_gpu_mib > nvidia_gpu_mib else ("CUDA_VISIBLE_DEVICES", nvidia_gpu_num)
if amd_gpu_num >= 0:
return "HIP_VISIBLE_DEVICES", amd_gpu_num
if nvidia_gpu_num >= 0:
return "CUDA_VISIBLE_DEVICES", nvidia_gpu_num
return None, None
This restructuring:
- Separates detection into clear phases
- Simplifies the control flow with early returns
- Makes the comparison logic more explicit
- Reduces nesting depth
- Maintains all existing functionality
a7a8309
to
69e8acb
Compare
@@ -190,14 +198,14 @@ def cleanup(): | |||
return True | |||
|
|||
def gpu_args(self): | |||
gpu_type, gpu_num = get_gpu() | |||
gpu_args = [] | |||
if sys.platform == "darwin": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be checking here that the app is not going to be run in a container?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think when this code was written we ran the python script both in and outside the container, so if you were on macOS at this point, you could assume you were actually gonna run on macOS native
I would do:
If Darwin:
["-ngl", "99"]
for all cases. On macOS llama.cpp ignores -ngl 99 as it turns on acceleration by default.
|
||
# Check for NVIDIA GPUs (CUDA case) | ||
try: | ||
command = ['nvidia-smi', '--query-gpu=index,memory.total', '--format=csv,noheader,nounits'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to check if nvidia-smi is installed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is installed on Linux and Windows with the drivers themselves. I'm not an expert, but I am under the impression it will be present everywhere modern CUDA is usable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok well as long as we catch the failure and don't segfault, I will be happy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The final calculation at the end seems overlay complex, if amd_gpu_bytes and nvidia_gpu_mib are set to zero intially. If both are less than 1G VRAM use CPU. Otherwise just use the one with more VRAM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't use Nvidia GPUs either with < 1G VRAM. When the VRAM is that small it's not really worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! I'll rework the function It should be more modular to use other gpu backends (like vulkan, intel arc) down the line. I was too focused on the edge case if a system were to have both amd and nvidia GPU above 1gb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think technically we could only have the kernel module installed outside the container and nvidia-smi installed in the container, the only thing actually required outside of the container is kernelspace stuff, aka kernel modules.
But I think it's ok like this. The detection wont be perfect ever. There will always be corner cases where one must manually specify their container somehow, I was thinking the env vars could be useful for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I did add some more modularity for now.
I'm thinking down the road I can query the vulkan SDK for info for all gpu architectures and select from there
69e8acb
to
2d5c1a5
Compare
I also reverted run_cmd args text=True as it was breaking ramalama list functionality and appropriately updated get_gpu() |
Want to get @ericcurtin approval. |
I'm the FNG and don't want to tip any canoes, but could I entreat your good graces to entertain a couple of questions?
I'm not volunteering to rework the code, but is it worse to call get_gpu twice at the expense of a few microseconds than it is to saddle the user seeking "boring AI" via a project whose mission objective is to inspect "your system for GPU support, falling back to CPU support if no GPUs are present"? As a user, I would prefer to see command-line arguments removed instead of added. Like, maybe defaulting to a particular model and possibly even eliding the run command. If the Ramalama command-line is complex enough that I have to create novel aliases to run the scaffolding whose primary task (AFAICT) is to replace the need for novel aliases... maybe I'm missing something. I'll drop a link to this discussion, too: #478 (comment) , because it would not feel great to have this patch break my ability to use the software. Happy holidays, folks. BR, |
@FNGarvin I agree that the end user should not be dealing with unnecessary additional args. The initial implication of that design choice was to build off of an existing framework. Thank you for reeling back the intent for the design. It's a simple fix and has little to do with performance as get_gpu() still can be called once. Env vars are set with the data from get_gpu() for additional calls that need the info. Additionally, the idea to omit "run" is an interesting one. Something to consider as we move forward. Having args such as run and serve IMO makes people who use podman/docker feel at home with Ramalam. But it would be nice to have a default model. In that discussion I mentioned how to get podman to work with We could go back to having separate commands for docker and podman but I'd like to first test the ability to share the nvidia libraries to the container runtimes manually to consolidate the args and adhere to the idea that we can switch between docker and podman seamlessly. Happy holidays! |
234c4af
to
e60a429
Compare
Signed-off-by: Brian <[email protected]>
@bmahabirbu needs a rebase. @ericcurtin What should we do with this one? |
Not sure, was waiting until this build goes green and conflicts are fixed to look at this again. At this point it might be easier to rewrite this, copying out the bits we need. |
I have added gpu detection functionality for cuda and fixed configurations for all gpu's using llama-cli. While working on this I also fixed how get_gpu was called. Now only calling that function once and using env vars to save states.
I have added my containerfile in docker hub as default for cuda (for testing purposes). @rhatdan is this ok to have for now or should I remove it?
@FNGarvin thanks for getting things started!
Edit:
Added separate command args for GPU support using
conman_args += ["--gpus", "all"]
for docker vsconman_args += ["--device", "nvidia.com/gpu=all"]
for podman.Eric I know you're on vacation so please don't take a look at this until you're back!
Summary by Sourcery
Improve GPU detection and configuration by using environment variables for CUDA, ROCm, and Asahi Linux. Refactor the GPU detection logic to streamline the process and add a default container image for CUDA on Docker Hub for testing.
Bug Fixes:
Enhancements:
Deployment: