[Bug]: --gpu-device-id commandline argument doesn't work with HIP/ROCm (AMD) backend #3734

Open
4 of 5 tasks
Bratzmeister opened this issue Nov 10, 2024 · 2 comments
Labels
bug (Something isn't working) · triage (This needs an (initial) review)

Comments


Bratzmeister commented Nov 10, 2024

Checklist

  • The issue has not been resolved by following the troubleshooting guide
  • The issue exists on a clean installation of Fooocus
  • The issue exists in the current version of Fooocus
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

When starting Fooocus with e.g. --gpu-device-id 1, the log states that the device is set accordingly, but device 0 is used anyway.

Steps to reproduce the problem

  1. Start entry_with_update.py (or another launch option) with the parameter --gpu-device-id x
  2. Read the startup log and see that it still reports cuda:0
  3. Start inference and observe that it actually uses cuda:0, not cuda:1

What should have happened?

Fooocus should use the device specified with --gpu-device-id

What browsers do you use to access Fooocus?

Mozilla Firefox, Google Chrome

Where are you running Fooocus?

Locally

What operating system are you using?

Gentoo Linux

Console logs

Fooocus:

Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py', '--gpu-device-id', '1', '--disable-in-browser']
Python 3.10.15 (main, Sep 20 2024, 14:16:53) [GCC 13.3.1 20240614]
Fooocus version: 2.5.5
Set device to: 1
[Cleanup] Attempting to delete content of temp dir /tmp/fooocus
[Cleanup] Cleanup successful
Total VRAM 20464 MB, total RAM 96213 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 AMD Radeon RX 7900 XT : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
Running on local URL:  http://127.0.0.1:7865

rocm-smi -i:

GPU[0]		: Device Name: 		Navi 31 [Radeon RX 7900 XT/7900 XTX/7900M]
GPU[0]		: Device ID: 		0x744c
GPU[0]		: Device Rev: 		0xcc
GPU[0]		: Subsystem ID: 	NITRO+ RX 7900 XTX Vapor-X
GPU[0]		: GUID: 		56961
GPU[1]		: Device Name: 		Navi 31 [Radeon RX 7900 XT/7900 XTX/7900M]
GPU[1]		: Device ID: 		0x744c
GPU[1]		: Device Rev: 		0xcc
GPU[1]		: Subsystem ID: 	0x5317
GPU[1]		: GUID: 		28574

Torch (confirming that switching the device is possible):

Python 3.10.15 (main, Sep 20 2024, 14:16:53) [GCC 13.3.1 20240614] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.get_device_properties(0)
_CudaDeviceProperties(name='AMD Radeon RX 7900 XT', major=11, minor=0, gcnArchName='gfx1100', total_memory=20464MB, multi_processor_count=42)
>>> torch.cuda.get_device_properties(1)
_CudaDeviceProperties(name='AMD Radeon RX 7900 XT', major=11, minor=0, gcnArchName='gfx1100', total_memory=20464MB, multi_processor_count=42)
>>> torch.cuda.set_device(1)
>>> torch.cuda.current_device()
1

Additional information

I have two discrete GPUs (device IDs 0 and 1). Integrated graphics are disabled in the BIOS and not exposed to the system or used in any way.

Bratzmeister added the bug (Something isn't working) and triage (This needs an (initial) review) labels on Nov 10, 2024
Bratzmeister (Author) commented:

Okay, this seems to be an AMD/HIP-specific issue. I found the root cause to be the environment variable set by the argument here:

os.environ['CUDA_VISIBLE_DEVICES'] = str(args.gpu_device_id)

When using the HIP/ROCm backend, it needs to be HIP_VISIBLE_DEVICES instead of CUDA_VISIBLE_DEVICES.
Proof below:

(fooocus_env) axt@weilichskann ~/zeugs/AI/Fooocus $ CUDA_VISIBLE_DEVICES=1 python 
Python 3.10.15 (main, Sep 20 2024, 14:16:53) [GCC 13.3.1 20240614] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.device_count()
2
(fooocus_env) axt@weilichskann ~/zeugs/AI/Fooocus $ HIP_VISIBLE_DEVICES=1 python 
Python 3.10.15 (main, Sep 20 2024, 14:16:53) [GCC 13.3.1 20240614] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.device_count()
1
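
A minimal sketch of what a backend-aware version of that line could look like (just my assumption about the fix, not tested against Fooocus' startup order; torch.version.hip is only set on ROCm builds, and the variable has to be exported before anything initializes the GPU runtime):

import os
import torch

gpu_device_id = 1  # stand-in for args.gpu_device_id from the Fooocus argument parser

# Assumption: this runs before any torch.cuda call initializes the GPU runtime,
# otherwise the *_VISIBLE_DEVICES variables are ignored.
if torch.version.hip is not None:  # a version string on ROCm/HIP builds, None on CUDA builds
    os.environ['HIP_VISIBLE_DEVICES'] = str(gpu_device_id)
else:
    os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_device_id)

Alternatively, one could simply set both variables, since each backend only reads its own.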

So I locally changed the code to reflect that, but sadly it causes a segfault.

Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py', '--disable-in-browser', '--gpu-device-id', '1']
Python 3.10.15 (main, Sep 20 2024, 14:16:53) [GCC 13.3.1 20240614]
Fooocus version: 2.5.5
Set device to: 1
[Cleanup] Attempting to delete content of temp dir /tmp/fooocus
[Cleanup] Cleanup successful
Total VRAM 20464 MB, total RAM 96213 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 AMD Radeon RX 7900 XT : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
IMPORTANT: You are using gradio version 3.41.2, however version 4.44.1 is available, please upgrade.
--------
extra {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: /mnt/MeerTor/models/image/checkpoints/SDXL 1.0/realism/sdxlYamersRealistic5_v5Rundiffusion.safetensors
VAE loaded: None
Request to load LoRAs [('sd_xl_offset_example-lora_1.0.safetensors', 0.1)] for model [/mnt/MeerTor/models/image/checkpoints/SDXL 1.0/realism/sdxlYamersRealistic5_v5Rundiffusion.safetensors].
Loaded LoRA [/mnt/MeerTor/models/image/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/mnt/MeerTor/models/image/checkpoints/SDXL 1.0/realism/sdxlYamersRealistic5_v5Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
./fooocuspocus.sh: line 6: 3734030 Segmentation fault      (core dumped) python3 entry_with_update.py --disable-in-browser --gpu-device-id 1

How do I proceed from here? Any ideas on debugging this further? I'm not very versed in Fooocus and its development.
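
One generic thing I could try on my side (plain Python stdlib, nothing Fooocus-specific) is enabling faulthandler as early as possible in entry_with_update.py, so the segfault at least dumps the Python stack that led into the native crash:

import faulthandler

# Print the Python traceback of all threads when the process receives a fatal
# signal such as SIGSEGV; has to be installed before the crashing code runs.
faulthandler.enable(all_threads=True)

That might at least show whether the crash happens during model loading on the second GPU or somewhere else.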

Bratzmeister changed the title from "[Bug]: --gpu-device-id commandline argument doesn't work (or I'm interpreting it wrongly)" to "[Bug]: --gpu-device-id commandline argument doesn't work with HIP/ROCm (AMD) backend" on Nov 11, 2024
Bratzmeister (Author) commented:

ComfyUI suffers from the same issue. comfyanonymous/ComfyUI#5585
