
ROCm containers fail on multi-gpu AMD systems #525

Open

abn opened this issue Dec 26, 2024 · 2 comments
abn (Contributor) commented Dec 26, 2024

When attempting to run a model, the command fails.

$ ramalama --debug run granite-code
run_cmd:  podman run --rm -i --label RAMALAMA --security-opt=label=disable --name ramalama_rDnJbnsIEU -t --device /dev/dri --device /dev/kfd -e HIP_VISIBLE_DEVICES=1 --mount=type=bind,src=/home/abn/.local/share/ramalama/models/ollama/granite-code:latest,destination=/mnt/models/model.file,rw=false quay.io/ramalama/rocm:latest /bin/sh -c llama-cli -m /mnt/models/model.file --in-prefix '' --in-suffix '' -c 2048 --temp 0.8 -p 'You are a helpful assistant' -cnv
Error: Command '['podman', 'run', '--rm', '-i', '--label', 'RAMALAMA', '--security-opt=label=disable', '--name', 'ramalama_rDnJbnsIEU', '-t', '--device', '/dev/dri', '--device', '/dev/kfd', '-e', 'HIP_VISIBLE_DEVICES=1', '--mount=type=bind,src=/home/abn/.local/share/ramalama/models/ollama/granite-code:latest,destination=/mnt/models/model.file,rw=false', 'quay.io/ramalama/rocm:latest', '/bin/sh', '-c', "llama-cli -m /mnt/models/model.file --in-prefix '' --in-suffix '' -c 2048 --temp 0.8 -p 'You are a helpful assistant' -cnv"]' returned non-zero exit status 139.

Running the podman command directly gives the following (exit status 139 corresponds to a segmentation fault inside the container).

$ podman run --rm -i --label RAMALAMA --security-opt=label=disable --name ramalama_rDnJbnsIEU -t --device /dev/dri --device /dev/kfd -e HIP_VISIBLE_DEVICES=1 --mount=type=bind,src=/home/abn/.local/share/ramalama/models/ollama/granite-code:latest,destination=/mnt/models/model.file,rw=false quay.io/ramalama/rocm:latest /bin/sh -c llama-cli -m /mnt/models/model.file --in-prefix '' --in-suffix '' -c 2048 --temp 0.8 -p 'You are a helpful assistant' -cnv

rocBLAS error: Cannot read /opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1103
 List of available TensileLibrary Files : 
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat"
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1012.dat"
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat"

The error encountered here looks similar to the one in #497. However, in my case the desired outcome is that either the right GPU is selected or the HSA_OVERRIDE_GFX_VERSION env var is set, rather than forcing the run onto the CPU.

The root cause in my setup seems to be related to the presence of multiple GPUs on the machine, although I am not certain.

$ rocminfo
...
==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 7840HS w/ Radeon 780M Graphics 
...
*******                  
Agent 2                  
*******                  
  Name:                    gfx1102                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 7600M XT        
...
*******                  
Agent 3                  
*******                  
  Name:                    gfx1103                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon 780M
...

Expected Outcome

  1. When a model is executed on a multi-GPU AMD system, the correct HIP device is selected (perhaps via some preferential ordering) and specified in the container. Alternatively, allowing the device to be specified via a flag (--gpu <num>) or respecting existing environment variables would be sufficient (Fixed gpu detection for cuda rocm etc using env vars #490 might resolve this); see the example after this list.
  2. (Maybe?) When a model is executed on a HIP device whose GFX version differs from the highest available version on the system, specify HSA_OVERRIDE_GFX_VERSION when executing the container.
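As an illustration of the first point, the intent is that something like the following would just work, with RamaLama forwarding the variables into the container instead of hard-coding HIP_VISIBLE_DEVICES=1 (the --gpu flag is only a proposal here and does not exist yet):

$ HIP_VISIBLE_DEVICES=0 ramalama run granite-code
$ HSA_OVERRIDE_GFX_VERSION=11.0.2 ramalama run granite-code
$ ramalama --gpu 0 run granite-code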

I am happy to contribute code if a direction for the fix is provided.

Workarounds

In my local environment, I had to do one of the following to work around the issue. Both had to be done directly in the podman command, as I could not figure out how to configure this via RamaLama.

  1. Set HIP_VISIBLE_DEVICES=0 to select the AMD Radeon RX 7600M XT.
  2. Set HSA_OVERRIDE_GFX_VERSION=11.0.2 so that it works with HIP_VISIBLE_DEVICES=1, which selects the iGPU and is what RamaLama chooses to pass to podman.

Both variants are shown applied to the podman command below.
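Applied to the podman command from above (all other arguments unchanged, abbreviated here with ...):

Workaround 1:
$ podman run ... -e HIP_VISIBLE_DEVICES=0 ... quay.io/ramalama/rocm:latest /bin/sh -c 'llama-cli ...'

Workaround 2:
$ podman run ... -e HIP_VISIBLE_DEVICES=1 -e HSA_OVERRIDE_GFX_VERSION=11.0.2 ... quay.io/ramalama/rocm:latest /bin/sh -c 'llama-cli ...'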
ericcurtin (Collaborator) commented:

Open to ideas, but by default I propose that RamaLama should just try to use the GPU with the most VRAM (and just a single GPU). I think heuristics more complex than that are not worth it.

For multi-GPU setups or any other non-default way of running models, there should be a way to set that up, via a flag, env var, etc.

We should support the various environment variables people already use with llama.cpp in the AI community, like HIP_VISIBLE_DEVICES, HSA_OVERRIDE_GFX_VERSION, etc. No point in reinventing the wheel.
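For reference, one way to see the per-GPU VRAM that such a "most VRAM" default would compare is ROCm's own tooling on the host, assuming rocm-smi is installed:

$ rocm-smi --showmeminfo vram

On the reporter's system this is exactly the comparison that goes wrong, since the APU's VRAM carve-out and the dGPU's VRAM are nearly equal (see the next comment).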

abn (Contributor, Author) commented Dec 28, 2024

From my perspective, I agree that defaulting to the GPU with the most VRAM is great, as it is likely what most users would expect anyway.

Edit: It seems this is already done in the code. In my case, I have around 8 GB allocated to the APU as VRAM while the discrete GPU reports 7.98 GB, which caused RamaLama to choose the APU instead of the GPU.

That said, something I have not fully understood about my original error is why the command failed when the iGPU (HIP_VISIBLE_DEVICES=1) supports 11.0.3. It feels like this type of mismatch could be common on multi-GPU systems, for example when using an AMD CPU with integrated graphics alongside a dedicated AMD GPU.

Edit: Upon investigation this looks more like a ROCm/PyTorch issue.

And yes, it would be great if RamaLama simply passed all HIP_* and HSA_* variables defined in the environment through to the container.

Edit: Proposed #526 for this change.
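To make the passthrough idea concrete, the intended behaviour is roughly the following: any HIP_*/HSA_* variables set on the host end up as -e flags on the generated podman command (illustrative sketch only; see #526 for the actual change):

$ export HIP_VISIBLE_DEVICES=0
$ export HSA_OVERRIDE_GFX_VERSION=11.0.2
$ ramalama run granite-code

which would be expected to produce a podman command containing -e HIP_VISIBLE_DEVICES=0 -e HSA_OVERRIDE_GFX_VERSION=11.0.2.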

abn added commits to abn/ramalama referencing this issue on Dec 28, 2024