[Bug]: --gpu-device-id commandline argument doesn't work with HIP/ROCm (AMD) backend #3734

Open
4 of 5 tasks
Bratzmeister opened this issue Nov 10, 2024 · 2 comments
Labels
bug (Something isn't working) · triage (This needs an (initial) review)

Comments


Bratzmeister commented Nov 10, 2024

Checklist

  • The issue has not been resolved by following the troubleshooting guide
  • The issue exists on a clean installation of Fooocus
  • The issue exists in the current version of Fooocus
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

When starting Fooocus with e.g. --gpu-device-id 1, the log states that the device is set accordingly, but device 0 is used anyway.

Steps to reproduce the problem

  1. Start entry_with_update.py (or another launch option) with the parameter --gpu-device-id x
  2. Read the startup log and see that it still reports cuda:0
  3. Start inference and observe that it actually uses cuda:0, not cuda:1

What should have happened?

Fooocus should use the device specified with --gpu-device-id

What browsers do you use to access Fooocus?

Mozilla Firefox, Google Chrome

Where are you running Fooocus?

Locally

What operating system are you using?

Gentoo Linux

Console logs

Fooocus:

Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py', '--gpu-device-id', '1', '--disable-in-browser']
Python 3.10.15 (main, Sep 20 2024, 14:16:53) [GCC 13.3.1 20240614]
Fooocus version: 2.5.5
Set device to: 1
[Cleanup] Attempting to delete content of temp dir /tmp/fooocus
[Cleanup] Cleanup successful
Total VRAM 20464 MB, total RAM 96213 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 AMD Radeon RX 7900 XT : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
Running on local URL:  http://127.0.0.1:7865

rocm-smi -i:

GPU[0]		: Device Name: 		Navi 31 [Radeon RX 7900 XT/7900 XTX/7900M]
GPU[0]		: Device ID: 		0x744c
GPU[0]		: Device Rev: 		0xcc
GPU[0]		: Subsystem ID: 	NITRO+ RX 7900 XTX Vapor-X
GPU[0]		: GUID: 		56961
GPU[1]		: Device Name: 		Navi 31 [Radeon RX 7900 XT/7900 XTX/7900M]
GPU[1]		: Device ID: 		0x744c
GPU[1]		: Device Rev: 		0xcc
GPU[1]		: Subsystem ID: 	0x5317
GPU[1]		: GUID: 		28574

Torch (confirming that switching the device is possible):

Python 3.10.15 (main, Sep 20 2024, 14:16:53) [GCC 13.3.1 20240614] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.get_device_properties(0)
_CudaDeviceProperties(name='AMD Radeon RX 7900 XT', major=11, minor=0, gcnArchName='gfx1100', total_memory=20464MB, multi_processor_count=42)
>>> torch.cuda.get_device_properties(1)
_CudaDeviceProperties(name='AMD Radeon RX 7900 XT', major=11, minor=0, gcnArchName='gfx1100', total_memory=20464MB, multi_processor_count=42)
>>> torch.cuda.set_device(1)
>>> torch.cuda.current_device()
1

Additional information

I have two discrete GPUs (device IDs 0 and 1). Integrated graphics are disabled in the BIOS and not exposed to the system or used in any way.

Bratzmeister added the bug (Something isn't working) and triage (This needs an (initial) review) labels on Nov 10, 2024
Bratzmeister (Author) commented:

Okay, this seems to be an AMD/HIP-specific issue. I found the root cause to be the environment variable set by the argument here:

os.environ['CUDA_VISIBLE_DEVICES'] = str(args.gpu_device_id)

When using the HIP/ROCm backend, it needs to be HIP_VISIBLE_DEVICES instead of CUDA_VISIBLE_DEVICES.
Proof below:

(fooocus_env) axt@weilichskann ~/zeugs/AI/Fooocus $ CUDA_VISIBLE_DEVICES=1 python 
Python 3.10.15 (main, Sep 20 2024, 14:16:53) [GCC 13.3.1 20240614] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.device_count()
2
(fooocus_env) axt@weilichskann ~/zeugs/AI/Fooocus $ HIP_VISIBLE_DEVICES=1 python 
Python 3.10.15 (main, Sep 20 2024, 14:16:53) [GCC 13.3.1 20240614] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.device_count()
1
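
A minimal sketch of what a backend-aware version of that line could look like (just my assumption about the fix, not tested against Fooocus' startup order; torch.version.hip is only set on ROCm builds, and the variable has to be exported before anything initializes the GPU runtime):

import os
import torch

gpu_device_id = 1  # stand-in for args.gpu_device_id from the Fooocus argument parser

# Assumption: this runs before any torch.cuda call initializes the GPU runtime,
# otherwise the *_VISIBLE_DEVICES variables are ignored.
if torch.version.hip is not None:  # a version string on ROCm/HIP builds, None on CUDA builds
    os.environ['HIP_VISIBLE_DEVICES'] = str(gpu_device_id)
else:
    os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_device_id)

Alternatively, one could simply set both variables, since each backend only reads its own.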

So I locally changed the code to reflect that, but sadly it causes a segfault.

Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py', '--disable-in-browser', '--gpu-device-id', '1']
Python 3.10.15 (main, Sep 20 2024, 14:16:53) [GCC 13.3.1 20240614]
Fooocus version: 2.5.5
Set device to: 1
[Cleanup] Attempting to delete content of temp dir /tmp/fooocus
[Cleanup] Cleanup successful
Total VRAM 20464 MB, total RAM 96213 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 AMD Radeon RX 7900 XT : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
IMPORTANT: You are using gradio version 3.41.2, however version 4.44.1 is available, please upgrade.
--------
extra {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: /mnt/MeerTor/models/image/checkpoints/SDXL 1.0/realism/sdxlYamersRealistic5_v5Rundiffusion.safetensors
VAE loaded: None
Request to load LoRAs [('sd_xl_offset_example-lora_1.0.safetensors', 0.1)] for model [/mnt/MeerTor/models/image/checkpoints/SDXL 1.0/realism/sdxlYamersRealistic5_v5Rundiffusion.safetensors].
Loaded LoRA [/mnt/MeerTor/models/image/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/mnt/MeerTor/models/image/checkpoints/SDXL 1.0/realism/sdxlYamersRealistic5_v5Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
./fooocuspocus.sh: line 6: 3734030 Segmentation fault      (core dumped) python3 entry_with_update.py --disable-in-browser --gpu-device-id 1

How do I proceed from here? Any ideas on debugging this further? I'm not very versed in Fooocus and its development.
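
One generic thing I could try on my side (plain Python stdlib, nothing Fooocus-specific) is enabling faulthandler as early as possible in entry_with_update.py, so the segfault at least dumps the Python stack that led into the native crash:

import faulthandler

# Print the Python traceback of all threads when the process receives a fatal
# signal such as SIGSEGV; has to be installed before the crashing code runs.
faulthandler.enable(all_threads=True)

That might at least show whether the crash happens during model loading on the second GPU or somewhere else.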

Bratzmeister changed the title from "[Bug]: --gpu-device-id commandline argument doesn't work (or I'm interpreting it wrongly)" to "[Bug]: --gpu-device-id commandline argument doesn't work with HIP/ROCm (AMD) backend" on Nov 11, 2024
Bratzmeister (Author) commented:

ComfyUI suffers from the same issue. comfyanonymous/ComfyUI#5585
